Blog for the Open Source JavaHotel project

Thursday, 26 October 2017

HDP, BigInsights, Kafka, Kerberos

I spent several hours resolving a nasty problem that came up after enabling Kerberos security. Suddenly the kafka-topics command-line utility refused to cooperate:
[2017-10-26 23:31:17,424] WARN Could not login: the client is being asked for a password, but the Zookeeper client code does not currently support obtaining a password from the user. Make sure that the client is configured to use a ticket cache (using the JAAS configuration setting 'useTicketCache=true)' and restart the client. If you still get this message after that, the TGT in the ticket cache has expired and must be manually refreshed. To do so, first determine if you are using a password or a keytab. If the former, run kinit in a Unix shell in the environment of the user who is running this Zookeeper client using the command 'kinit <princ>' (where <princ> is the name of the client's Kerberos principal). If the latter, do 'kinit -k -t <keytab> <princ>' (where <princ> is the name of the Kerberos principal, and <keytab> is the location of the keytab file). After manually refreshing your cache, restart this client. If you continue to see this message after manually refreshing your cache, ensure that your KDC host's clock is in sync with this host's clock. (org.apache.zookeeper.client.ZooKeeperSaslClient)
[2017-10-26 23:31:17,426] WARN SASL configuration failed: javax.security.auth.login.LoginException: No password provided Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn)
Exception in thread "main" org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
 at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:946)
 at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:923)
 at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1230)
 at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:156)
 at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:130)
 at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:76)
 at kafka.utils.ZkUtils$.apply(ZkUtils.scala:58)
 at kafka.admin.TopicCommand$.main(TopicCommand.scala:53)
 at kafka.admin.TopicCommand.main(TopicCommand.scala)

The reason is quite simple. To communicate with the underlying ZooKeeper, Kafka uses /etc/security/keytabs/kafka.service.keytab. By default, this file has permission 400, so only the kafka user can read it.
The solution is to change the permission to 440. This loosens security a little, but the file is still protected. The user trying to create a Kafka topic should belong to the hadoop group.
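To make the difference between 400 and 440 concrete, here is a minimal Scala sketch that checks and sets the group-read bit using the standard java.nio API. It works on a temporary stand-in file rather than the real keytab, needs a POSIX file system, and the object and method names are my own, for illustration only.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.{PosixFilePermission, PosixFilePermissions}

object KeytabPermissions {
  // True when the file's group may read it (mode 440 rather than 400).
  def groupCanRead(file: Path): Boolean =
    Files.getPosixFilePermissions(file).contains(PosixFilePermission.GROUP_READ)

  // Same effect as mode 440: owner and group may read, nobody may write.
  def makeGroupReadable(file: Path): Unit =
    Files.setPosixFilePermissions(file, PosixFilePermissions.fromString("r--r-----"))

  def main(args: Array[String]): Unit = {
    // Temporary stand-in; changing the real keytab requires root or the kafka user.
    val keytab = Files.createTempFile("kafka.service", ".keytab")
    Files.setPosixFilePermissions(keytab, PosixFilePermissions.fromString("r--------"))
    println(s"mode 400: group can read = ${groupCanRead(keytab)}")
    makeGroupReadable(keytab)
    println(s"mode 440: group can read = ${groupCanRead(keytab)}")
    Files.delete(keytab)
  }
}
```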

Monday, 9 October 2017

Hadoop and SQL engines

Every Hadoop distribution comes with several SQL engines, so I decided to create a simple test to compare them by running the same queries against the same data set. So far I have been working with two Hadoop distributions, BigInsights 4.x and HDP 2.6.2 with Big SQL 5.0.1. The second is now the successor of BigInsights.
I was comparing the following SQL engines:

  • MySQL, embedded
  • Hive against data in different formats: text files, Parquet and ORC
  • Big SQL on Hive tables, Parquet and ORC
  • Spark SQL
  • Phoenix, SQL engine for HBase.
This is not a benchmark; the purpose is not to prove the superiority of one SQL engine over another. I also haven't done any tuning or reconfiguration to speed things up. The goal is just to run a simple check after installation and have some numbers at hand.
The test description and several results are here.
Although I do not claim any ultimate authority here, I can provide several conclusions.
  • Big SQL is the winner, particularly compared to Hive. Very important: Big SQL runs on the same physical data; the only difference is the computational model. It even beats MySQL. But, of course, MySQL will get the upper hand for OLTP requests.
  • Hive behaves much better paired with Tez. On the other hand, the execution time is very unstable and can change drastically from one run to the next.
  • Spark SQL is out of competition, since it is hard to beat in-memory execution.
  • Phoenix SQL is at the end of the race, but the execution time is very stable.


Saturday, 30 September 2017

Visualize car data with Brunel and Scala

IBM Data Science Experience includes a sample called "Visualize car data with Brunel". But this sample notebook is written in Python (PySpark), so I ported it to Scala, producing the same result with Scala syntax. I added some comments to explain the code.
The result is published here.
To run it:

  • Download Cars+.ipynb notebook
  • Upload to Jupyter with Apache Toree-Scala (Spark) kernel enabled
  • Enjoy

Tuesday, 26 September 2017

Next version of Civilization The Board Game

Introduction
I deployed the next version of my computer implementation of Civilization The Board Game. The implementation consists of three parts:
New features
I implemented a new, more user-friendly interface.
Single user, training game.

Two players game, real melee battle.
The opponent's deck is inactive and shown for information only.
More user-friendly features
If a figure (scout or army) is in the corner between two hidden tiles, a dialog comes up to select the one to be revealed.
Figures can be stacked. If a player decides to move stacked figures, a dialog pops up to select the figures to move.
After an action is selected in the left panel, the squares where the action can be performed are highlighted.
Next steps
Implement
  • Spend trade to rush the production
  • Send production from scout to city
  • Buying units
  • Battles (?)

Saturday, 23 September 2017

New version of Civilization the Board Game

Introduction
I deployed the next version of my computer implementation of Civilization The Board Game. The implementation consists of three parts:
New features
  • Game progress saved
  • Game resume
  • Two players game
Game progress saved
The game is saved continuously in a Redis key/value database. This happens automatically and transparently in the background; there is no "Save" button. The server side of the game is stateless: it is only a computational engine, and the Redis datastore acts as a memory cache. Redis is accessed through an interface, and I'm going to prepare an HBase version as a warm-up. Statelessness is a very important feature: in an imagined future when thousands of players are swarming in, load balancing and traffic redirection can be applied.
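The idea of reaching the datastore only through an interface can be sketched as below. This is not the engine's actual code: the trait GameStore, its two methods and the in-memory class are hypothetical names, and the map stands in for Redis (or, later, HBase) so that the example runs on its own.

```scala
import scala.collection.mutable

// Hypothetical storage interface: the engine sees only these two methods.
trait GameStore {
  def save(token: String, gameJson: String): Unit
  def load(token: String): Option[String]
}

// In-memory stand-in; a Redis- or HBase-backed class would implement the same trait.
class InMemoryGameStore extends GameStore {
  private val games = mutable.Map.empty[String, String]
  def save(token: String, gameJson: String): Unit = games.update(token, gameJson)
  def load(token: String): Option[String] = games.get(token)
}

object GameStoreDemo {
  def main(args: Array[String]): Unit = {
    // Swapping the backend means changing only this line.
    val store: GameStore = new InMemoryGameStore
    store.save("token-1", """{"phase":"StartOfTurn"}""")
    println(store.load("token-1"))
    println(store.load("unknown"))
  }
}
```

Because the caller depends only on the trait, the engine code does not need to change when the backend does.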

Game resume
A game can be left and resumed at any time. Just select a game and the player is pushed immediately into the middle of the battle.

A two-player game can also be resumed, but the second player has to join to continue the game.
Two players game
Two options for creating a new game are available: "Training" and "Two players game".

Training is a single-player game, just to get one's hand in. "Two players game" is a more serious matter: test yourself in hand-to-hand battle. Select the opponent civilization you want to fight against and wait for the challenger.

Joining the game
Select a game from the waiting list, and the challenger and the player who threw down the gauntlet are both moved into the middle of the duel.

Leaving the game
The player can leave the game by clicking the close icon in the upper right corner of the board.
Leaving the game properly unlocks it and removes it from the waiting list. But because the game is saved automatically while playing, even after the browser is closed abruptly the game can be resumed at exactly the same stage. A game is removed from the waiting list after 24 hours of inactivity.
Next steps
Make the map and the player deck more user-friendly.
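The 24-hour rule could look roughly like the following sketch. The WaitingGame record and the purgeInactive function are hypothetical; how the real server tracks activity is not described here.

```scala
import java.time.{Duration, Instant}

object WaitingListSketch {
  // Hypothetical record for a game waiting for its second player.
  final case class WaitingGame(token: String, lastActivity: Instant)

  val maxInactivity: Duration = Duration.ofHours(24)

  // Keep only the games whose last activity is less than 24 hours old.
  def purgeInactive(games: Seq[WaitingGame], now: Instant): Seq[WaitingGame] =
    games.filter(g => Duration.between(g.lastActivity, now).compareTo(maxInactivity) < 0)

  def main(args: Array[String]): Unit = {
    val now = Instant.parse("2017-09-23T12:00:00Z")
    val games = Seq(
      WaitingGame("fresh", now.minus(Duration.ofHours(2))),
      WaitingGame("stale", now.minus(Duration.ofHours(30)))
    )
    println(purgeInactive(games, now).map(_.token)) // List(fresh)
  }
}
```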

Saturday, 9 September 2017

DB2, UTL_ENCODE, BASE64_DECODE, BASE64_ENCODE

I implemented two methods from the Oracle UTL_ENCODE package. They are implemented as a Java UDF and a DB2 module. On the Java side, it simply uses the JVM Base64 package. Preparing the DB2 signature and the test according to this article took more time.
Full source code is here.
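The linked source is the real implementation. As a quick illustration of the JVM side only, here is a minimal sketch, written in Scala rather than Java, of two byte-array functions built on java.util.Base64. The function names are my own, and the DB2 registration is left out.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

object Base64Sketch {
  // Bytes in, Base64-encoded bytes out.
  def base64Encode(raw: Array[Byte]): Array[Byte] = Base64.getEncoder.encode(raw)

  // Base64-encoded bytes in, original bytes out.
  def base64Decode(encoded: Array[Byte]): Array[Byte] = Base64.getDecoder.decode(encoded)

  def main(args: Array[String]): Unit = {
    val encoded = base64Encode("Hello".getBytes(UTF_8))
    println(new String(encoded, UTF_8))               // SGVsbG8=
    println(new String(base64Decode(encoded), UTF_8)) // Hello
  }
}
```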

Sunday, 27 August 2017

Civilization The Board Game

Introduction
For some time now, I have been a fan of Civilization The Board Game. I find it more engaging, dynamic and enthralling than the computer game. It is like comparing real melee combat with a bureaucratic war waged from behind an office desk.
Then an idea struck me: move the game to the computer screen. That avoids the stuff piling up on the table, lets me train and test ideas without laying out the board, and allows players in remote locations to fight each other.
For the time being, my idea ended up in two projects.
CivilizationEngine here
Civilization UI here
Demo version on Heroku: https://civilizationboardgame.herokuapp.com/ (wait a moment until the dyno is activated; it runs on the free quota).
Each project comes with its own build.xml file for building the target artifact.
General design principles
The solution consists of two separate projects: Civilization Engine and Civilization UI. I decided that all game logic and state are managed by the back-end engine. The UI, as the name suggests, focuses only on displaying the board and letting the user execute a command. The command is sent to the server, the server changes the game state, and the UI receives the current game state and updates the screen.
The game is nothing more than moving from one game state to another. Every change is triggered by a command. At any moment, it is possible to restore the current game state by setting up the initial board and replaying all commands up to that point.
Data is transmitted between engine and UI in JSON format.
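The "state plus command gives next state" idea maps directly onto a left fold. The sketch below uses heavily simplified stand-in types (GameState, three commands), not the engine's real classes, just to show how replaying the command log from the initial board restores the current state.

```scala
object ReplaySketch {
  // Simplified stand-in for the real game state.
  final case class GameState(scouts: Int, armies: Int, phase: Int)

  // A tiny, hypothetical subset of commands.
  sealed trait Command
  case object BuyScout extends Command
  case object BuyArmy extends Command
  case object EndOfPhase extends Command

  // One step: a command turns the current state into the next one.
  def execute(state: GameState, command: Command): GameState = command match {
    case BuyScout   => state.copy(scouts = state.scouts + 1)
    case BuyArmy    => state.copy(armies = state.armies + 1)
    case EndOfPhase => state.copy(phase = state.phase + 1)
  }

  // Replaying the whole log from the initial board yields the current state.
  def replay(initial: GameState, log: Seq[Command]): GameState =
    log.foldLeft(initial)(execute)

  def main(args: Array[String]): Unit = {
    val log = Seq(BuyScout, BuyArmy, EndOfPhase, BuyScout)
    println(replay(GameState(0, 0, 0), log)) // GameState(2,1,1)
  }
}
```

Replaying a prefix of the log gives the state at that earlier point, which is why keeping the command history is enough to reconstruct any moment of the game.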
Civilization Engine
Civilization Engine is an IntelliJ IDEA Scala project and can be imported directly from GitHub.
Why Scala? I found it very appropriate here. Most of the operations involve traversing lists, lookups, filtering and mapping, and Scala is an excellent tool for that. Had I used Java, the code would probably be twice as long, even with Java 8 streams.
I'm very fond of this method (full source):
  def itemizeForSetSity(b: GameBoard, civ: Civilization.T): Seq[P] =
    getFigures(b, civ).filter(_.s.figures.numberofScouts > 0).map(_.p).filter(p => SetCityAction.verifySetCity(b, civ, p, Command.SETCITY).isEmpty)
It yields all points where a new city can be set.
  • Find all figures on the board belonging to a civilization
  • Single out squares with at least one scout
  • Map squares to points
  • Verify if the point is eligible for city setting using SetCityAction.verifySetCity
All of it in a single line.
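For readers who want to run the four steps, here is a self-contained sketch of the same filter, map, filter pipeline. The types P, Square and MapSquare and the edge rule inside verifySetCity are simplified assumptions; the real SetCityAction.verifySetCity applies the actual game rules.

```scala
object SetCitySketch {
  // Simplified stand-ins for the engine's point and square types.
  final case class P(row: Int, col: Int)
  final case class Square(numberOfScouts: Int, numberOfArmies: Int)
  final case class MapSquare(p: P, s: Square)

  // Hypothetical rule: returns None when a city may be set, Some(reason) otherwise.
  def verifySetCity(p: P): Option[String] =
    if (p.row == 0 || p.col == 0) Some("too close to the edge") else None

  def itemizeForSetCity(figures: Seq[MapSquare]): Seq[P] =
    figures
      .filter(_.s.numberOfScouts > 0)        // squares with at least one scout
      .map(_.p)                              // squares to points
      .filter(p => verifySetCity(p).isEmpty) // points that pass verification

  def main(args: Array[String]): Unit = {
    val figures = Seq(
      MapSquare(P(0, 1), Square(1, 0)), // scout present, but fails verification
      MapSquare(P(2, 3), Square(1, 1)), // scout present and eligible
      MapSquare(P(4, 4), Square(0, 2))  // armies only, no scout
    )
    println(itemizeForSetCity(figures)) // List(P(2,3))
  }
}
```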
A general outline of the project
  • resources, game objects (JSON format) used in the game: tiles, squares, objects (now TECHNOLOGIES only)
  • gameboard , class definitions
  • objects , enumerations and classes related to game artifacts
  • helper, game logic. I found it more convenient to put it in helper objects than in methods of the GameBoard class.
  • io, methods for reading and writing data in JSON format. It depends on the Play JSON package.
  • I, external interface
Brief interface description
  • getData(LISTOFCIV), list of civilizations available
  • getData(REGISTEOWNER), generates a new game and returns a unique token to be used in further communication
  • getData(GETBOARDGAME), returns the current game state
  • executeCommand, executes the next command
  • itemizeCommand, provides all possible parameters for a particular command. For instance, for StartOfMove it returns all points from which a figure movement may start.
So far, only a few commands are implemented:
  • SetCapital
  • SetArmy
  • SetScout
  • EndOfPhase
  • BuyScout
  • BuyArmy
  • MoveFigure
  • RevealTile
  • SetCity
User Interface
For the time being, so ugly that only a mother or father could love it. More details: look here.

Next steps
Implementation of game persistence. Because of Heroku limitations, I cannot use the disk file system for that. I'm planning to use Redis; Heroku offers a free quota for this service. Redis will be used to store the games and also as a cache. This way, the server part will be completely stateless. Every step will consist of restoring the game from Redis, executing a command and storing the updated game in Redis again.
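The planned restore, execute, store cycle can be sketched as follows. Everything here is an assumption made for illustration: a mutable map stands in for Redis, the "game" is just a comma-separated command log, and step and execute are hypothetical names.

```scala
import scala.collection.mutable

object StatelessStep {
  // Stand-in for Redis: token -> serialized game.
  private val datastore = mutable.Map.empty[String, String]

  // Hypothetical engine call; here the game is only a comma-separated command log.
  def execute(game: String, command: String): String =
    if (game.isEmpty) command else s"$game,$command"

  // One request: nothing is kept in server memory between calls.
  def step(token: String, command: String): String = {
    val restored = datastore.getOrElse(token, "") // 1. restore the game
    val updated = execute(restored, command)      // 2. execute the command
    datastore.update(token, updated)              // 3. store the updated game
    updated
  }

  def main(args: Array[String]): Unit = {
    step("token-1", "SetCapital")
    println(step("token-1", "SetScout")) // SetCapital,SetScout
  }
}
```

Since each call starts from the datastore, any server instance can handle any request for a given token.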