Blog do projektu Open Source JavaHotel

wtorek, 30 października 2018

BigSQL, joins and partition elimination.

Partition elimination in joins
One of the advantages of BigSQL over Hive is enabling partition elimination also during join execution. It is described in this article.
It can be explained using a simple example.
db2 "insert into p_x values('a',0)"
db2 "insert into p_x values('b',1)"
db2 "insert into p_x values('c',2)"
db2 "insert into p_x values('d',3)"
db2 "insert into p_x values('e',4)"
db2 "insert into p_x values('f',5)"
db2 "insert into p_x values('g',5)"
db2 "insert into p_x values('h',6)"
db2 "insert into p_x values('i',6)"
db2 "insert into p_x values('j',7)"
db2 "insert into p_x values('k',8)"
db2 "insert into p_x values('l',9)"
db2 "CREATE HADOOP TABLE p_y(n VARCHAR(64),x int)"
db2 "insert into p_y values('e',4)"
db2 "insert into p_y values('f',5)"
db2 "insert into p_y values('g',5)"
Then run a straightforward join query.
db2 "select * from p_x,p_y where p_x.x = p_y.x"
The range of values in the p_y table is 4 to 5 so in order to resolve this join, it is enough to scan only corresponding partitions in p_x table.
It can be detected by looking into BigSQL log. But firstly DEBUG level for BigSQL Scheduler should be turned on.

BigSQL->Configs->Advanced bigsql-log4j The following two parameters should be modified:
Then BigSQL is to be restarted to get new settings taking effect.
The query partition elimination is reported in /var/ibm/bigsql/logs/bigsql-sched.log
DEBUG [pool-1-thread-5] : found column with partition-key. db2ColumnIndex: 1
DEBUG [pool-1-thread-5] : [createScanState] Partition elimination expr:  (  ( x <= 5 )  AND  ( x >= 4 )  ) 
DEBUG [pool-1-thread-5] : found default/dummy partition. skipping. HdfsPartition{fileDescriptors=[]}
[eliminatePartition]partition elimination: checking partition: x=0
[eliminatePartition]partitionEliminated? x=0 true
[eliminatePartition]partition elimination: checking partition: x=1
[eliminatePartition]partitionEliminated? x=1 true
[eliminatePartition]partition elimination: checking partition: x=2
[eliminatePartition]partitionEliminated? x=2 true
[eliminatePartition]partition elimination: checking partition: x=3
[eliminatePartition]partitionEliminated? x=3 true
[eliminatePartition]partition elimination: checking partition: x=4
[eliminatePartition]partitionEliminated? x=4 false
The locations are: [TScanRangeLocation(host_idx:0, volume_id:-1), TScanRangeLocation(host_idx:1, volume_id:-1)]
[eliminatePartition]partition elimination: checking partition: x=5
[eliminatePartition]partitionEliminated? x=5 false
[createScanState] The locations are: [TScanRangeLocation(host_idx:2, volume_id:-1), TScanRangeLocation(host_idx:1, volume_id:-1)]
[createScanState] The locations are: [TScanRangeLocation(host_idx:2, volume_id:-1), TScanRangeLocation(host_idx:0, volume_id:-1)]
[eliminatePartition]partition elimination: checking partition: x=6
[eliminatePartition]partitionEliminated? x=6 true
[eliminatePartition]partition elimination: checking partition: x=7
[eliminatePartition]partitionEliminated? x=7 true
[eliminatePartition]partition elimination: checking partition: x=8
[eliminatePartition]partitionEliminated? x=8 true
[eliminatePartition]partition elimination: checking partition: x=9
[eliminatePartition]partitionEliminated? x=9 true
[createScanState]Finished partition elimination. partition-elimination-stats (eliminated/total): 8 / 10 Took 124 milliseconds
2018-10-30 00:51:47,775 INFO [pool-1-thread-5] : [removeReadLock(String)]Removed Read Lock on table sb.p_x: 0
2018-10-30 00:51:47,775 DEBUG [pool-1-thread-5] : [assignSplits] Workers node-numbers: [1, 2, 3]
2018-10-30 00:51:47,775 DEBUG [pool-1-thread-5] : [assignSplits] Workers ip to node-numbers: {[3],[2],[1]}
As one can see, all partitions except x=4 and x=5 are ignored during execution of the join query.
Bigger example
Of course, partition elimination is nice but what we are really interested in is the performance boost. How to measure it?
I created a simple project. The general idea is to prepare two huge tables, one non-partitioned and the second partitioned and run identical join query against the table.  The source code and detailed description can be found here.
The final result?
For non-partitioned table:
[sbartkowski@oc0522068411 bigjoin]$ time ./
Run runnonquery.sql script

real 0m16.521s
user 0m0.036s
sys 0m0.100s
For partitioned table and partition elimination in place:
[sbartkowski@oc0522068411 bigjoin]$ time ./
Run runnonquery.sql script

real 0m8.860s
user 0m0.037s
sys 0m0.095s
So the performance improved twice which should come as no surprise.
The same query submitted in Hive against the same tables executes as below. Obviously Hive has some way to go.
select D.ID,max( from monit.testdim AS D , monit.testpart AS T WHERE T.part=D.part GROUP BY D.ID;

|   |             _c1             |
| PART 5  | 2014-09-18 00:19:11.893212  |
| PART 2  | 2014-09-18 00:03:35.599702  |
| PART 4  | 2014-09-18 00:26:17.660366  |
| PART 3  | 2014-09-18 00:24:47.644779  |
4 rows selected (41,863 seconds)
The BigSQL is really big.