Blog of the Open Source JavaHotel project

Tuesday, 31 December 2019


I have enhanced my implementation of the TPC-DS test: the BigSQL test can now be run with the jsqsh utility, which removes the dependency on the DB2 client software. Jsqsh is part of the BigSQL package, so no additional software is required. How to use jsqsh for MyTPC-DS is described here.

I also added further results of the TPC-DS test conducted on HDP 3.1. The results are presented as bar plots to make them easier to read. The graphics are created with matplotlib. The Python source code is uploaded here. The input tables are taken directly from the GitHub Wiki page. To pick up the correct table, a simple HTML tag is added to the page before every table.
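The marker-tag idea can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the uploaded script: the marker is assumed to be an anchor such as `<a id="plot-me"></a>`, the names `MarkedTableParser`, `table_after_marker` and `plot_rows` are hypothetical, and the sample table holds made-up placeholder values rather than real test results.

```python
from html.parser import HTMLParser


class MarkedTableParser(HTMLParser):
    """Collect the rows of the first <table> that follows a marker tag."""

    def __init__(self, marker_tag, marker_id):
        super().__init__()
        self.marker_tag = marker_tag
        self.marker_id = marker_id
        # State flow: seek_marker -> seek_table -> in_table -> done
        self.state = "seek_marker"
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if self.state == "seek_marker":
            if tag == self.marker_tag and dict(attrs).get("id") == self.marker_id:
                self.state = "seek_table"
        elif self.state == "seek_table" and tag == "table":
            self.state = "in_table"
        elif self.state == "in_table":
            if tag == "tr":
                self._row = []
            elif tag in ("td", "th") and self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self.state != "in_table":
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self.state = "done"


def table_after_marker(html, marker_id, marker_tag="a"):
    """Return the table rows (lists of strings) that follow the marker, or []."""
    parser = MarkedTableParser(marker_tag, marker_id)
    parser.feed(html)
    parser.close()
    return parser.rows


def plot_rows(rows, out_file):
    """Grouped bar plot: first column labels the groups, other columns are series."""
    # Imported here so that the parsing part works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    header, body = rows[0], rows[1:]
    labels = [r[0] for r in body]
    series = header[1:]
    width = 0.8 / len(series)
    fig, ax = plt.subplots()
    for i, name in enumerate(series):
        xs = [x + i * width for x in range(len(labels))]
        ax.bar(xs, [float(r[i + 1]) for r in body], width, label=name)
    # Centre each tick under its group of bars.
    ax.set_xticks([x + width * (len(series) - 1) / 2 for x in range(len(labels))])
    ax.set_xticklabels(labels)
    ax.legend()
    fig.savefig(out_file)
    plt.close(fig)


# Hypothetical page fragment: only the table after the marker should be picked up.
SAMPLE = """
<table><tr><th>Ignored</th></tr><tr><td>0</td></tr></table>
<a id="plot-me"></a>
<table>
<tr><th>Run</th><th>EngineA</th><th>EngineB</th></tr>
<tr><td>run1</td><td>10</td><td>20</td></tr>
<tr><td>run2</td><td>15</td><td>25</td></tr>
</table>
"""

rows = table_after_marker(SAMPLE, "plot-me")
```

With matplotlib installed, calling `plot_rows(rows, "results.png")` would write the grouped bar chart to a file. Giving each table its own marker keeps the plotting script independent of the order in which the tables appear on the Wiki page.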

The tests were executed on a 100 GB data set and cannot be the basis for any far-reaching conclusions. But one apparent difference between HDP 2.6 and 3.1 is the significant performance improvement of Hive, from Hive 2.1 to Hive 3.1. Hive now performs almost as well as BigSQL and ahead of SparkSQL. Hive's query coverage has also grown considerably, from 50% (Hive 2.1) to almost 100% (Hive 3.1).

Useful links
  • MyTPC-DS execution framework, link
  • TPC-DS results using Bar Plot graphics, link
  • TPC-DS results analytics tables, link 
  • Python source code to compile the test result and prepare GitHub Wiki table, link
  • Python source code to prepare graphics using matplotlib package, link
  • Run BigSQL TPC-DS test using jsqsh tool, link