Blog do projektu Open Source JavaHotel

wtorek, 25 grudnia 2018

TPC-DS benchmark

Inspiration
TCP-DS benchmark is an industry standard tool to evaluate the performance of the relational database engine. The tool can be used not only to leverage the general efficiency. The database administrator or DevOp engineer can utilize the tools to estimate the robustness of a particular database installation by comparing it against a reference instance. Also, it can be useful to verify the database health after upgrade or maintenance. Another use case is to gauge the performance gain after tunning or scaling.
The main disadvantage of the standard package is that it is not easy to use, requires a number of manual adjustment and fixes. It makes the tool less reliable and the process less repeatable. To compare the result against the reference results we have to be sure that the benchmark for both environments, the reference, and the current, are prepared and executed using the same queries and according to the same pattern.
Solution
I created a simple solution to make the task very simple. The project is available in GitHub.
The following databases are supported so far: Oracle, IBM DB2, MySQL/MariaDB, PostgreSQL, Hive, SparkSQL, Netezza, and IBM BigSQL.
Also, a wiki is available having hints on how to set up a test for a particular database. Having the tool, the TCP-DS benchmark is ready to use. Just download, configure for a particular environment and run.
So far, Qualify and Power Tests are implemented.
Queries
The TPC-DS queries are not ready to use out of the box for all SQL engines. The specification allows some minor fixes to make them executable for a particular database. To avoid keeping a different version of queries, I decided to modify them on the fly. I also apply only changes possible to implement through the simple string or regular expression text replacement.
Test Validation
I was unable to match the answer data set provided against any output. It requires further investigation. So I decided to use Oracle output as a reference result set. Unfortunately, even Oracle result does match all of them and is not fully reliable. The life is never an easy road free of stone.
Next steps
Throughput and Data Maintenance tests.

Brak komentarzy:

Publikowanie komentarza