Blog for the Open Source JavaHotel project

Sunday, 29 November 2020

My own Hadoop/HDP benchmark

Introduction

I spent some time trying to come to terms with HiBench, but I finally gave up. The project seems to be abandoned, and there were a lot of problems running it in a secure (Kerberized) environment and adjusting it to new versions of Hive and Spark. It is also a project with a long history, and many of its layers are not consistent with each other.

So I ended up creating my own version of the HiBench benchmark. The only code migrated from HiBench is the Spark/Scala and Java source code, upgraded to new versions of the dependencies and given more consistent parameter handling.

Features

  • Dedicated to HDP 3.1.5. Standalone services are not supported.  
  • Enabled for Kerberos
  • Hive 3.0
  • Spark 2.x
  • Simple to run and expand, minimal configuration

Features not supported

  • Streaming, pending
  • Nutch indexing, not under development any longer
  • Flink, Gearpump, not part of the HDP stack

Configuration

    All details are here.