Blog of the Open Source JavaHotel project

Friday, 31 August 2018

Pandas DataFrame and Scala Spark DataFrame

In a pandas DataFrame (similar to, but different from, Spark's DataFrame), data is provided by series, that is, column by column.

import numpy
from pandas import DataFrame, Series
d = {'one' : [1., 2., 3., 4.],  'two' : [4., 3., 2., 1.]}
df = DataFrame(d)
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
In Spark's DataFrame, data is provided by features, that is, as rows of a feature matrix.
val sqlC = new org.apache.spark.sql.SQLContext(sc)
import sqlC.implicits._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val df = Seq((1.0,4.0),(2.0,3.0),(3.0,2.0),(4.0,1.0)).toDF("one","two")
Of course, there are plenty of ways to create a DataFrame from a file or any external source. But sometimes it is convenient to create Spark's DataFrame manually using the pandas convention.
So I created a simple Scala method for creating a DataFrame from series (columns), not from features (rows).
Zeppelin notebook
import org.apache.spark.sql._
import org.apache.spark.sql.types._

def createDF(spark: SparkSession, names: Seq[String], series: Seq[Any]*): DataFrame = {
    // one column name is required for every series
    require(names.length == series.length)
    // the number of rows is taken from the first series
    val numof: Int = series(0).length
    var rows: Seq[Row] = Nil
    for (i <- 0 until numof) {
      // collect the i-th element of every series into a single row
      var da: Seq[Any] = Nil
      for (j <- 0 until series.length)
        da = da :+ series(j)(i)
      val r: Row = Row.fromSeq(da)
      rows = rows :+ r
    }
    val rdd = spark.sparkContext.makeRDD(rows)
    // schema: the column type is inferred from the first element of each series
    val schema: Seq[StructField] =
      for (i <- 0 until names.length)
        yield StructField(names(i),
          series(i)(0) match {
            case _: Int => IntegerType
            case _: Double => DoubleType
            case _ => StringType
          })
    spark.createDataFrame(rdd, StructType(schema))
}
Usage example
val names2 = Seq("one", "two")
val seriesone = Seq(1.0,2.0,3.0,4.0)
val seriestwo = Seq(4.0,3.0,2.0,1.0)
val da = createDF(spark, names2, seriesone, seriestwo)
An example taken from a Udacity course.
val names1 = Seq("countries","gold","silver","bronze")
val countries = Seq("Russian Fed.", "Norway", "Canada", "United States",
    "Netherlands", "Germany", "Switzerland", "Belarus",
    "Austria", "France", "Poland", "China", "Korea",
    "Sweden", "Czech Republic", "Slovenia", "Japan",
    "Finland", "Great Britain", "Ukraine", "Slovakia",
    "Italy", "Latvia", "Australia", "Croatia", "Kazakhstan")

val gold = Seq(13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
val silver = Seq(11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0)
val bronze = Seq(9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1)

val da1 = createDF(spark, names1, countries, gold, silver, bronze)
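
The conversion from series to rows that createDF performs can also be sketched in plain Python, without Spark. This is only an illustration: the function names series_to_rows and infer_schema are hypothetical, the type names are plain strings rather than Spark types, and the equal-length check is an extra assumption that the Scala method does not make. The series are assumed to be non-empty.

```python
def series_to_rows(names, *series):
    """Turn column-wise series into a list of row tuples."""
    if len(names) != len(series):
        raise ValueError("one name is required per series")
    # Assumption added for this sketch: all series have the same length.
    if len({len(s) for s in series}) > 1:
        raise ValueError("all series must have the same length")
    # zip(*series) picks the i-th element of every series, which is one row.
    return [tuple(row) for row in zip(*series)]


def infer_schema(names, *series):
    """Pair each name with a type label taken from the first element of its series."""
    def type_of(value):
        if type(value) is int:
            return "integer"
        if type(value) is float:
            return "double"
        return "string"   # fallback, as in the Scala match expression

    return [(name, type_of(s[0])) for name, s in zip(names, series)]


rows = series_to_rows(["one", "two"], [1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
schema = infer_schema(["one", "two"], [1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
```

The zip call is the whole trick: it transposes the pandas-style columns into the row-oriented form that Spark expects.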

Tuesday, 21 August 2018

Civilization The Board Game, next version

I deployed a new version of my computer implementation of  Civilization The Board Game. The implementation consists of three parts:
Several new features are implemented: conquering the alien city and victories.  Also, the performance is improved by caching.
Conquering the city and taking the double loot
The alien city can be attacked and pulled down. After the battle, the winner can take the double loot.

According to the game rules, the double loot is at the winner's disposal. Some loot is ranked as one, for instance, stealing trade, and some is ranked as two, for instance, a new technology. The total rank of the loot taken cannot exceed two.
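
The limit on the loot can be sketched as a simple check. This is a minimal illustration, not the engine's code: the names LOOT_VALUE, MAX_LOOT and is_valid_loot are hypothetical, and only the two loot kinds mentioned above are included.

```python
# Rank of each loot kind; only the two examples from the text are listed.
LOOT_VALUE = {"trade": 1, "technology": 2}

# The winner may take loot worth at most two in total.
MAX_LOOT = 2


def is_valid_loot(chosen):
    """Return True if the ranks of the chosen loot items sum to at most MAX_LOOT."""
    return sum(LOOT_VALUE[item] for item in chosen) <= MAX_LOOT
```

For example, is_valid_loot(["technology"]) holds, while is_valid_loot(["trade", "technology"]) does not, because the ranks add up to three.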
Military Victory
If an alien capital city is conquered, the victor will get the medal of the Military Victory.

Culture Victory
After reaching the top of the culture track, the Culture Victory is announced.

Economic Victory
A player who puts aside at least 16 gold coins is acclaimed as the Economic Winner by the astonished world.

Technology Victory
When a player piles up the stack of technology high enough, the world recognizes him as Steve Jobs and Elon Musk in one person.

Performance improvement
I started sampling the Civilization Engine with the VisualVM profiler and immediately discovered the first bottleneck. The game board was recreated on every request by executing all commands starting from the initial layout, and this was the main resource consumer. The solution was pretty simple: cache the current board. The latency was reduced even on my free Heroku quota of 1 CPU and 0.5 GB of memory.
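
The difference between replaying all commands and keeping a cached board can be sketched as follows. This is a toy model under stated assumptions, not the actual engine: the class Game and its methods are hypothetical names, and apply_command is assumed to be a pure function that returns a new board instead of modifying the old one.

```python
class Game:
    """Toy engine: the board is the initial layout plus a log of commands."""

    def __init__(self, initial_board, apply_command):
        self._initial = initial_board
        self._apply = apply_command   # function (board, command) -> new board
        self._commands = []
        self._cached = initial_board  # board after all commands executed so far
        self.replays = 0              # counts full replays, for illustration only

    def board_by_replay(self):
        """Slow path: rebuild the board from the initial layout on every call."""
        self.replays += 1
        board = self._initial
        for command in self._commands:
            board = self._apply(board, command)
        return board

    def execute(self, command):
        """Record the command and advance the cached board by a single step."""
        self._commands.append(command)
        self._cached = self._apply(self._cached, command)

    def board(self):
        """Fast path: return the cached board without replaying anything."""
        return self._cached


# Usage: the "board" is just a number and a command adds to it.
game = Game(0, lambda board, command: board + command)
for command in (1, 2, 3):
    game.execute(command)
```

Both paths return the same board, but the replay cost grows with the length of the command log, while the cached read stays constant.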
Next steps
Unfortunately, it is high time to implement a "hanging" feature like "Writing":
"Cancel a city action being performed by another player (may not cancel resource ability)". I do not have a clear idea of how to accomplish it. One player has to be notified that another player is executing a command eligible for the action and should have a choice: allow it or cancel it.

Monday, 20 August 2018

HortonWorks, Atlas and Hive integration not working

I installed HDP 2.6.5 and tried to enable Hive for Atlas. But no Hive action was reflected in Atlas. I double-checked the configuration and found it correct. There were no error messages in either the Hive or the Atlas logs. Also, it looked as though the Atlas Hive Hook was activated for every action, but nothing landed in the Atlas dashboard.
2018-08-20 18:39:56,696 INFO  [HiveServer2-Background-Pool: Thread-899]: log.PerfLogger ( -
2018-08-20 18:39:56,697 INFO  [HiveServer2-Background-Pool: Thread-899]: log.PerfLogger ( -
The solution was very simple and was related to Kafka. Atlas uses Kafka as the intermediate medium to push data into the Atlas realm. I had installed only a single Kafka broker, but the configuration requires at least 3 Kafka brokers to create a quorum. After adding the two missing Kafka brokers, the flow was unlocked and all current and pending requests found their way to Atlas.

Wednesday, 1 August 2018

Polymer 2 to Polymer 3

I decided to upgrade from Polymer 2 to Polymer 3. According to the Polymer team:
"we've made a smooth upgrade path our top priority for Polymer 3.0. Polymer's API remains almost unchanged, and we're providing an upgrade tool (polymer-modulizer) that will automatically handle most of the work in converting your 2.x-based elements and apps to 3.0."
Encouraged by this advertisement, I ran "modulizer --out ." and ... It depends on what one means by "most of the work" and "smooth".

Module names instead of path names
Instead of import pathnames like:
<link rel="import" href="../bower_components/polymer/polymer-element.html">
module names are used:
import {PolymerElement, html} from '@polymer/polymer/polymer-element.js'
I understand the rationale behind that, but the Chrome browser I am using obviously does not. It still insists that @polymer is a directory name and demands a path name to resolve it. Adding the path names manually is challenging because not only the custom elements have to be touched but also all internal Polymer elements. So it looks as though some kind of "build" process is necessary after every upgrade. After some trials, errors and research I ended up using:
polymer build --module-resolution node
which does the job. Then I replace the current "node_modules" directory with "build/default/node_modules". When Polymer itself is upgraded, "yarn install" should be launched beforehand. Previously, no build process was required. It was enough to run "bower install".

ReferenceError: IntlMessageFormat is not defined
Uncaught (in promise) ReferenceError: IntlMessageFormat is not defined
    at HTMLElement.<anonymous> (app-localize-behavior.js:285)
    at runMethodEffect (property-effects.js:905)
    at Function._evaluateBinding (property-effects.js:3079)
    at Object.runBindingEffect [as fn] (property-effects.js:648)
    at runEffectsForProperty (property-effects.js:169)
    at runEffects (property-effects.js:131)
    at HTMLElement._propagatePropertyChanges (property-effects.js:1933)
    at HTMLElement._propertiesChanged (property-effects.js:1891)
    at HTMLElement._flushProperties (properties-changed.js:370)
    at HTMLElement._flushProperties (property-effects.js:1731)
Unfortunately, "polymer build" does not pick up the "node_modules/intl-messageformat" package, so it is absent after the build. The package has to be installed again.
npm install intl-messageformat 
ReferenceError: IntlMessageFormat is not defined
Although "intl-messageformat" is downloaded, it must also be imported somewhere to make the "IntlMessageFormat" class visible.
I ended up manually patching the "node_modules/@polymer/app-localize-behavior/app-localize-behavior.js" file.
import "../polymer/polymer-legacy.js";
import "../iron-ajax/iron-ajax.js";
import "../../intl-messageformat/dist/intl-messageformat.js";
Uncaught TypeError: Cannot set property 'IntlMessageFormat' of undefined
Uncaught TypeError: Cannot set property 'IntlMessageFormat' of undefined
    at main.js:7
    at main.js:7
It was a stab in the back because Chrome does not say where this disaster happens. After some time I found a hint here.
The solution is to manually patch "node_modules/intl-messageformat/dist/intl-messageformat.js".
At the end of the file replace:
    var src$main$$default = $$core$$default;
    this['IntlMessageFormat'] = src$main$$default;
with the following, because "this" is undefined when the file is loaded as an ES6 module:
    var src$main$$default = $$core$$default;
    window['IntlMessageFormat'] = src$main$$default;
Uncaught ReferenceError: KeyframeEffect is not defined
This exception was thrown from the paper-dropdown-menu element. Instead of spending time trying to resolve it, I decided to remove this dependency and replace it with something different.
Replace namespace with imports
In Polymer 2 I was using the Polymer namespace for custom classes.
Polymer.CivData = function(superClass) {

        return class extends Polymer.CivLocalize(superClass) {

                constructor() {

class CivTLevel extends Polymer.CivData(Polymer.Element) {

    static get is() {
        return 'civ-tlevel';
Polymer 3 enforces transforming all classes into ES6 modules. So the sequence above should be replaced by:
import { CivLocalize} from "../js/civ-localize.js";

export const CivData = function (superClass) {
  return class extends CivLocalize(superClass) {
    constructor() {

import { html } from "../node_modules/@polymer/polymer/lib/utils/html-tag.js";
import { PolymerElement } from "../node_modules/@polymer/polymer/polymer-element.js";
import { CivData} from "../js/civ-data.js";

class CivTLevel extends CivData(PolymerElement) {
  static get template() {

The "modulizer" utility does not handle this; it has to be done manually.
The "smooth upgrade" ended up in several sleepless nights. But in the end, I made it. On the whole, I really like ES6 modules. They provide good encapsulation and isolation, which is very important in every programming language, including JavaScript.