Blog do projektu Open Source JavaHotel

środa, 28 listopada 2012

DB2 optimizer

Introduction
Let's take a look at the simple table :
create table testts (ide int, name varchar2(100))
create index ideindx on testts(ide)
and simple statement:
create table testts (ide int, name varchar2(100))
select * from testts where ide=10
In order to execute this statement DB2 (like any other database engine) creates "access plan". Access plan is simply SQL statement decomposed into more atomic operations like "scan table", "apply predicate", "look up in the index", "acquire lock" etc. The simple SQL statement above can be executed in two ways:
  1. Scan the table and find rows which fulfill the predicate.
  2. Look up the index and then pick up row(s) from the table. 
Every way brings the same result and DB2 should choose the best one. It is obvious that the second method is better then the first. But in case of more complicated statements there are more possible ways to get result and it is not always obvious which method is the best. The engine which creates and analyzes different access plans for SQL statements is called "DB2 optimizer". The decision is based on "cost". To every basic operation in access plan a cost is attached (related to resource consumption like I/O or CPU). The best access plan is the access plan having the lowest overall cost of execution. In most cases it means that this access plan executes faster then the others but literally the best access plan has the most efficient way of resource consumption.
Problem
But take a closer look at this simple example.
To populate this table with some test data I created a simple procedure
CREATE OR REPLACE PROCEDURE INSERTS (NUMB IN INT) 
AS
  MAXI INT;
  I INT;
BEGIN
  SELECT MAX(IDE) INTO MAXI FROM TESTTS;
  IF MAXI IS NULL THEN
    MAXI := 0;
  END IF;  
  FOR I IN 1..NUMB LOOP
    INSERT INTO TESTTS VALUES(MAXI + I, 'AUTO NAME ' || (MAXI + I));
  END LOOP;  

END;
Add one record to the table and check access plan for this statement.
call inserts(1)
runstats on table testts
explain plan for select * from testts where ide=10
Using db2exfmt utility we can export access plan in human readable format. It is available here.
Access Plan:
-----------
 Total Cost:   6.87148
 Query Degree:  1

      Rows 
     RETURN
     (   1)
      Cost 
       I/O 
       |
        1 
     TBSCAN
     (   2)
     6.87148 
        1 
       |
        1 
 TABLE: DB2INST3
     TESTTS
       Q1
But there there is a surprise. Although index was created, DB2 optimizer decided that table scanning is a better option. After adding more rows:
call inserts(10) 
runstats on table testts
the access plan is as one could expect.

Total Cost:   6.87692
 Query Degree:  1

             Rows 
            RETURN
            (   1)
             Cost 
              I/O 
              |
               1 
            FETCH 
            (   2)
            6.87692 
               1 
         /----+----\
        1            11 
     IXSCAN    TABLE: DB2INST3
     (   3)        TESTTS
    0.0538374        Q1
        0 
       |
       11 
 INDEX: DB2INST3
     IDEINDX
       Q1
What's going on 
Obviously DB2 decides that scanning table is better option for one row but worse for eleven rows. But how to check it ? By virtue of optimization guidelines (practical example is described here) we can force DB2 engine to use index for one row and scan table for eleven rows.
Access plan for one row with index lookup is available here. General cost in case of index lookup is 6.87458 and is a little greater then table scanning (6.87148). It is so because index lookup requires two steps : index lookup and later table access to pick up rows. Table scanning requires only one step. Although estimated I/O cost is the same (equal to 1), CPU consumption is higher for two steps (57021 aginsts 54482).
Access plan for eleven rows ignoring index is available here. This time total cost is equal to 6.89792 and is higher then total cost with index lookup : 6.87692. Although the dilemma is the same : two steps (index and table) against one step (table only) the optimizer decides that predicated CPU consumption for scanning through 11 rows in search for one row is higher. Cost for table scanning here is : 76152. CPU cost for index lookup and picking up one row is : 58941.
But add much more rows to the table and again check access plan for this SQL statement with and without index.
call inserts(100000) 
runstats on table testts
Access plan with index lookup (default) is available here. The access plan with force table scanning is available here. The difference is overwhelming: 665.297 against 30.3052. But the main factor are not subtle differences in CPU consumption but obviously I/O cost :430 against 1. It goes without saying that I/O access with table scanning is more extensive than index lookup. This difference in also reflected in SQL real execution (DB10.1, 32 bit Ubuntu machine, 2GB memory).
 (table scanning)
$ time db2 "select * from testts where ide=10" 


IDE         NAME                                                                                                
----------- ---------------------------------------------------------------------------------------------------- 
         10 AUTO NAME 10                                                                                        

  Wybrano rekordów: 1. 


real 0m0.155s 
user 0m0.016s 
sys 0m0.032s 
(default, index lookop)
$ time db2 "select * from testts where ide=10" 

IDE         NAME                                                                                                
----------- ---------------------------------------------------------------------------------------------------- 
         10 AUTO NAME 10                                                                                        

  Wybrano rekordów: 1. 


real 0m0.062s 
user 0m0.028s 
sys 0m0.020s
Execution is more then 2 times faster with index lookup and probable more rows in the table the difference is more distinct.
Conclusion 
In this simple example one can see how DB2 optimizer works and how the decision is taken which access plan to choose. But it is important to underline that the decision is based on resource consumption and it not always goes with execution speed. This distinction is very important because sometimes the best execution time is achieved by more extensive CPU consumption. It does not matter in a test environment but is not always a good decision in production, multiuser and mulitiprocess environment. This fastest and CPU thirsty access plan can starve to death the other processes.

Brak komentarzy:

Prześlij komentarz