javahotel: Hadoop and SQL engines

Monday, October 9, 2017

Hadoop and SQL engines

Every Hadoop distribution comes with several SQL engines, so I decided to create a simple test to compare them: run the same queries against the same data set. So far I have worked with two Hadoop distributions, BigInsights 4.x and HDP 2.6.2 with Big SQL 5.0.1. The latter is now the successor to BigInsights.
I compared the following SQL engines:

MySQL, embedded
Hive against data in different formats: text files, Parquet and ORC
Big SQL on Hive tables, Parquet and ORC
Spark SQL
Phoenix, SQL engine for HBase.

This is not a benchmark of any kind; the purpose is not to prove the superiority of one SQL engine over another. I also haven't done any tuning or reconfiguration to speed things up. The goal is just to conduct a simple check after installation and have several numbers at hand.
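To illustrate the idea of running the same query several times and keeping a few numbers at hand, here is a minimal sketch in Python. It is not the test harness used for this post. The in-memory SQLite database, the `sales` table, the query and the helper names `time_query` and `summarize` are all assumptions made for illustration; for a real engine you would swap `run_on_sqlite` for a function that submits the query through that engine's own client.

```python
import sqlite3
import statistics
import time


def time_query(run_query, repeats=5):
    """Call run_query() `repeats` times; return wall-clock timings in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to min / median / max."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


# Stand-in engine (assumption): in-memory SQLite with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("r%d" % (i % 10), i) for i in range(10000)],
)

# Hypothetical query; the same text would be sent to every engine under test.
QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region"


def run_on_sqlite():
    # Fetch all rows so the query is fully executed, not just prepared.
    return conn.execute(QUERY).fetchall()


timings = time_query(run_on_sqlite, repeats=5)
summary = summarize(timings)
print(summary)
```

Reporting the minimum, median and maximum rather than a single run makes it easier to see whether an engine's execution time is stable or changes from one execution to another.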

The test description and several results are here.

Although I do not claim any ultimate authority here, I can provide several conclusions.

Big SQL is the winner, particularly compared to Hive. Importantly, Big SQL runs on the same physical data; the only difference is the computational model. It even beats MySQL, although MySQL will of course get the upper hand for OLTP requests.
Hive behaves much better paired with Tez. On the other hand, its execution time is very volatile and can change drastically from one execution to another.
Spark SQL is outside the competition; in-memory execution is hard to outmatch.
Phoenix SQL comes last in the race, but its execution time is very stable.
