Introduction
I spent some time trying to come to terms with HiBench, but finally gave up. The project seemed to be abandoned, and there were many problems running it in a secure (Kerberized) environment and adapting it to new versions of Hive and Spark. Also, the project has a long history, and many of its layers are not consistent with each other.
So I ended up creating my own version of the HiBench benchmark. The only code migrated from HiBench is the Spark/Scala and Java source code, upgraded to new versions of its dependencies and given more consistent parameter handling.
Features
- Dedicated to HDP 3.1.5. Standalone services are not supported.
- Enabled for Kerberos
- Hive 3.0
- Spark 2.x
- Simple to run and expand, minimal configuration
Features not supported
- Streaming, pending
- Nutch indexing, no longer under development
- Flink, Gearpump, not part of HDP stack
Configuration
All details are here.