javahotel: April 2019

Monday, 29 April 2019

My HiBench

Problem
HiBench is regarded as a leading benchmark in the BigData world according to this webpage. But when I tried to run it in my test HDP 2.6.5 and HDP 3.1 environments, I found it very frustrating, particularly when the cluster is secured by Kerberos.
Because the project seems to have been abandoned two years ago, I started applying manual patches to the code, as described here. But at some point I came to a dead end. The Kafka 0.8 client does not support Kerberos. Kerberos support was added in the Kafka 0.10 client, but Kafka 0.10 is not backward compatible, and moving to it required redeveloping a substantial part of the Kafka-related code in Java and Scala.
Solution
So finally I decided to create my own fork of the HiBench project and split completely from the trunk. The result is a new GitHub project here.

Main changes include:

Kerberos support
Kafka 2.0 implemented
HDP 3.1/Apache Hadoop 3.1 support
Hive 3.1 support; the standard HiBench was developed for Hive 0.14 and does not work with Hive 3.1
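
To give an idea of what Kerberos support means on the Kafka client side, here is a minimal sketch of the client properties a Kerberized Kafka 2.0 client typically needs. It is not code from the fork: the class name KerberizedKafkaConfig, the broker address, the port 6667 and the service name "kafka" are assumptions for illustration, and the Kerberos credentials themselves (ticket cache or keytab) still have to be supplied separately through a JAAS configuration.

```java
import java.util.Properties;

// Illustrative helper only (hypothetical name, not part of the HiBench fork).
class KerberizedKafkaConfig {

    // Builds client properties for a Kafka cluster secured by Kerberos.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // SASL authentication without TLS; a cluster with wire
        // encryption would use SASL_SSL instead.
        props.setProperty("security.protocol", "SASL_PLAINTEXT");
        // GSSAPI is the SASL mechanism used for Kerberos.
        props.setProperty("sasl.mechanism", "GSSAPI");
        // Assumption: the brokers run under the "kafka" service principal.
        props.setProperty("sasl.kerberos.service.name", "kafka");
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker address, for demonstration only.
        Properties props = build("broker1.example.com:6667");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties object can be passed to a KafkaProducer or KafkaConsumer constructor once the kafka-clients library is on the classpath.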

Features lost compared to standard HiBench

Only HDP is supported and tested; standalone Spark, Apache Hadoop, Kafka, etc. are not supported.
Only HDP 2.6.5 and HDP 3.1 are supported. Support for older versions of HDP, Spark, and Scala has been dropped.
Support for Apache Flink and Gearpump has been removed.

Tests

The project was tested on both HDP 3.1 and HDP 2.6.5: at the tiny scale in a local KVM cluster, and at the tiny, small, and large scales in a larger multi-host cluster.

Future plans

Review the Kafka streaming benchmark; while analyzing the code I found some mysteries there
Do the same for the DFSIO benchmark test
Test in a cluster secured with wire encryption
