Blog do projektu Open Source JavaHotel

niedziela, 12 czerwca 2011

NTILE and DB2

Problem
NTILE is an analytic function. It is implemented in Oracle and MS/SQL database servers. It simply distributes rows into some number of buckets, something like dealing cards between players. Unfortunately it is not implemented in DB2 and there is no simple and obvious substitution for it. But it is possible to evaluate a similar solution in DB2 although it requires a little more consideration.

Solution
In DB2 it is not possible to write UDF which works as analytic/ranking function so we cannot provide exactly the same syntax.
But let's try to write a regular function which behaves like NTILE.

FUNCTION NTILE_FUN(A IN INTEGER, BUCKETNO IN INTEGER, NOFROWS IN INTEGER) 
  RETURN INTEGER
  AS
  MINROWS INTEGER;
  MINREST INTEGER;
  X INTEGER;
  FIRSTB INTEGER;
  LASTB INTEGER;
  ACTB INTEGER;
  BEGIN
     MINROWS := FLOOR(NOFROWS / BUCKETNO);
     MINREST := NOFROWS - (BUCKETNO * MINROWS);

     ACTB := 1;
     FIRSTB := 1;
     WHILE FIRSTB <= NOFROWS LOOP
       LASTB := FIRSTB + MINROWS - 1;
       IF MINREST > 0 THEN
         LASTB := LASTB + 1;
         MINREST := MINREST - 1;
       END IF;
       IF A <= LASTB THEN
         EXIT;
       END IF;
       FIRSTB := LASTB + 1;
       ACTB := ACTB + 1;
     END LOOP;      
     RETURN ACTB;
  END;

This function takes three parameters:
  • A : number of row to be inserted into bucket
  • BUCKETNO ; number of buckets
  • NOFROWS : number of all rows to be distributed
Returns the number of bucket (between 1 and BUCKETNO) where row A should be placed. So it works exactly the same way like NTILE. Probably it is possible to get rid of an iteration and provide some arithmetic to calculate the bucket number  but it works for me for the time being.

Examples
Now let's  take the following NTILE example.

select ntile(4)over(order by empno) grp,
         empno,
         ename
     from empno';

DB2 equivalence using function define above:

select NTILE_FUN(row_number( )over(order by empno),4,
             (SELECT COUNT(*) FROM empno)) grp,
         empno,
         ename
    from empno order by empno 
So in place of calling NTILE function we use NTILE_FUN function with appropriate  parameters. The most important point is using ranking function ROW_NUMBER to provide valid row number (the first parameter for NTILE_FUN).

Another example:

SELECT last_name, salary, NTILE(4) OVER (ORDER BY salary DESC) 
             AS quartile FROM empsalary
DB2 equivalence:
SELECT last_name, salary, 
             NTILE_FUN(row_number( )over(ORDER BY salary DESC),4, 
             (SELECT COUNT(*) FROM empsalary)) 
             AS quartile FROM empsalary
More complicated example with PARTITION BY:
SELECT
        Category,
        Weight,
        Entrant,
        NTILE(2) OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS Ntile
   FROM ContestResults
DB2 equivalence:
SELECT
        Category,
        Weight,
        Entrant,
        NTILE_FUN(row_number( )over(
            PARTITION BY Category
            ORDER BY Weight DESC),2, 
            (SELECT COUNT(*) FROM ContestResults as C WHERE C.Category = CC.Category) 
        ) AS Ntile
   FROM ContestResults AS CC
This time NOFROWS parameter is a little more complicated because we have to provide the number of record in one partition - not the number of records in the whole table.

Working code
The working code is available: DDL and SQL package.
Important: this code works in DB2 Oracle compatible mode. Unfortunately - this mode is not supported in DB2 Express (free edition) although is planned for the future. It is necessary to download and install DB2 trial (90-day).
After creating a database with PL/SQL support test tables should be created:
db2 -tvf create_ddl.sql
and deploy package:
db2 -td@ -vf create_package.sql
Then run command in order to see the output:
db2 set serveroutput on
and run example:
db2 call a_ntiletest.run_test1
  db2 call a_ntiletest.run_test2
  db2 call a_ntiletest.run_test3
  db2 call a_ntiletest.run_test4

Because this example is using PL/SQL syntax it is possible also to run it (after changing the value of ORACLE constant in package specification) in Oracle directly and compare the results.

1 komentarz:

  1. I saw no one commented but just wanted to say, this helped me a lot. Thank you!

    OdpowiedzUsuń