Blog do projektu Open Source JavaHotel

środa, 29 sierpnia 2012

DB2, PostgreSQL, FDW and Linux 64 bit

Introduction
FDW (foreign data wrapper) is a feature in PostgreSQL which allows access to external resources as if they were relational tables in PostreSQL database. There is also ODBC wrapper available so I was interested if it is possible to configure integration with DB2. Finally I was successful but it was a journey littered with troubles.
DB2 data server client
Download and install DB2 data server client. Then configure connection to DB2 database by means of CLP application. In my environment remote DB2 database SAMPLE is created on server think and access port is 50005, user/password db2inst3/db2inst3
db2 "catalog tcpip node DB2SAMPL remove think server 50005"
db2 "catalog database SAMPLE as DB2SAMPL at node DB2SAMPL"
Test the connection just defined:
db2 connect to DB2SAMPL user db2inst3 using db2inst3
db2 list tables
ODBC DB2 connection on Linux
We have to setup ODBC connection with DB2 database (local or remote) on the machine were PostgreSQL is installed.  Be sure that ODBC Linux version is installed, open source unixODBC is available for all Linux distribution. Next step is to configure ODBC access to DB2 database (DB2 Infocenter detailed information) . It can be done by GUI (ODBCConfig) or manually. You have to identify where libdb2.so ODBC driver is located and add the driver definition to /etc/odbcinst.ini file. In my environment this definition looks like:
[DB2]
Description  = UDB 10.1
Driver  = 
Driver64  = /home/opt/ibm/db2/V10.1/lib64/libdb2.so
Setup  = 
Setup64  = 
UsageCount  = 1
CPTimeout  = 0
CPTimeToLive  = 0
DisableGetFunctions  = 0
DontDLCLose  = 1
ExFetchMapping  = 1
Threading  = 3
FakeUnicode  = 0
IconvEncoding  = 
Trace  = 
TraceFile  = 
TraceLibrary  = 
Then define ODBC connection to DB2 database. Add an entry to /etc/odbc.ini file. My entry looks like:
[DB2SAMPL]
Driver = DB2
Database = SAMPLE
Servername = think
UID = db2inst3
PWD = db2inst3
Port = 50005
Next step is to verify that ODBC connection is working by means of isql command :
isql db2sampl db2inst3 db2inst3
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select * from emp
The last command should yield the content of EMP table created in DB2 SAMPLE database.
Additional remark. In order to have it running we have to catalog DB2 connection using DB2 CLP and ODBC connection also. Also the database alias used in CLP (DB2SAMPL here) should match the DSN name in ODBC. Otherwise the ODBC connection is not working.
Prerequisites in PostgreSQL 
Together with PostgreSQL server additional packages should be installed. In my environment (Fedora 14) I installed these packages via commands:
yum install postgresql-contrib
yum install libpq-dev
yum install postgresql-server-dev.9.1
Make sure that pg_config utility is linked to entry in /usr/bin (or any other public PATH directory). It was not done automatically in my environment.
/usr/bin/pg_config -> /usr/pgsql-9.1/bin/pg_config
Install fdw_odbc extension
 Download solution from repository. Then run command (as root user or via sudo)
make USE_PGXS=1 install
Output:
/bin/mkdir -p '/usr/pgsql-9.1/lib'
/bin/mkdir -p '/usr/pgsql-9.1/share/extension'
/bin/sh /usr/pgsql-9.1/lib/pgxs/src/makefiles/../../config/install-sh -c -m 755  odbc_fdw.so '/usr/pgsql-9.1/lib/odbc_fdw.so'
/bin/sh /usr/pgsql-9.1/lib/pgxs/src/makefiles/../../config/install-sh -c -m 644 ./odbc_fdw.control '/usr/pgsql-9.1/share/extension/'
/bin/sh /usr/pgsql-9.1/lib/pgxs/src/makefiles/../../config/install-sh -c -m 644 ./odbc_fdw--1.0.sql  '/usr/pgsql-9.1/share/extension/'
Configure extension in PostgreSQL 
 Connect to PostgreSQL database and issue the following commands:
CREATE EXTENSION odbc_fdw;
CREATE SERVER odbc_db2 FOREIGN DATA WRAPPER odbc_fdw OPTIONS (dsn 'DB2SAMPL');
CREATE USER MAPPING FOR sb SERVER odbc_db2 OPTIONS (username 'db2inst3', password 'db2inst3');
First test 
To test connection create a simple table in DB2, for instance: DB2 CLP:
db2 "create table testt(x int)"
db2 "insert into testt values(1)"
db2 "insert into testt values(2)"
db2 "insert into testt values(3)"
Then define access to this table in PostgreSQL
test=# CREATE FOREIGN TABLE TESTT (X INT) SERVER odbc_db2 OPTIONS ( sql_query'select x from TESTT', sql_count 'select count(x) from TESTT', x 'X' );
CREATE FOREIGN TABLE
test=# select * from testt;
 x 
---
 1
 2
 3
(3 wiersze)
Next test 
Now define access to EMPLOYEE table in DB2 SAMPLE database (more information)
CREATE FOREIGN TABLE EMPLOYEE (EMPNO CHAR(6), FIRSTNME VARCHAR(12), WORKDEPT CHAR(3), SALARY DECIMAL(9,2)) SERVER odbc_db2 OPTIONS (sql_query 'select cast (empno as CHAR(7)) as empno,firstnme,cast (workdept as CHAR(4)) as workdept,replace(salary,'','',''.'') as salary from employee', sql_count 'select count(*) from employee', EMPNO 'EMPNO', FIRSTNME 'FIRSTNME', WORKDEPT 'WORKDEPT', SALARY 'SALARY')
Then run several statements:
select * from employee;
select * from employee where salary > 70000;
select count(empno),workdept from employee group by workdept order by count(empno);
select max(salary),min(salary),workdept from employee group by workdept;
select count(empno),workdept from employee group by workdept having count(empno) > 5;
One has to remember that foreign tables are "read-only" in PostgreSQL. Cannot be updated and do not participate in transactional processing (more details)
More complicated example
What about joins between foreign and native tables ?
Assume that together with official personal data in SAMPLE database the sleeky employer wants to keep some political incorrect information in a database.
So create additional table in PostgreSQL:

create table secret (empno char(6), politicalviews varchar(100));
insert into secret values('000010','Democratic');
insert into secret values('000020','Democratic');
insert into secret values('000030','Conservative');
insert into secret values('000050','Conservative');
insert into secret values('000060','Liberal');
Then run several statements:
select * from employee e,secret s  where e.empno = s.empno;
select e.*, s.politicalviews from employee e left outer join secret s on e.empno = s.empno;
select e.*, case when s.politicalviews is null then 'Unstable' else s.politicalviews end  from employee e left outer join secret s on e.empno = s.empno;
Let's check the number of people with specified and unspecified political views in the company
select workdept,sum(case when p=1 then 1 else 0 end) as unstable,sum(case when p=0 then 1 else 0 end) as stable  from (select workdept, case when s.politicalviews is null then 1 else 0 end as p from employee e left outer join secret s on e.empno = s.empno) as w group by workdept;
Output:
 workdept | unstable | stable 
----------+----------+----------
 A00      |        4 |        1
 B01      |        0 |        1
 C01      |        3 |        1
 D11      |       10 |        1
 E21      |        6 |        0
 D21      |        7 |        0
 E01      |        0 |        1
 E11      |        7 |        0
Let's try with VIEW:
create view companyandpolitics select workdept,sum(case when p=1 then 1 else 0 end) as unstable,sum(case when p=0 then 1 else 0 end) as stable  from (select workdept, case when s.politicalviews is null then 1 else 0 end as p from employee e left outer join secret s on e.empno = s.empno) as w group by workdept;
select * from politics;
Some additional remarks

CREATE FOREIGN TABLE EMPLOYEE (EMPNO CHAR(6), FIRSTNME VARCHAR(12), WORKDEPT CHAR(3), SALARY DECIMAL(9,2)) SERVER odbc_db2 OPTIONS (sql_query 'select cast (empno as CHAR(7)) as empno,firstnme,cast (workdept as CHAR(4)) as workdept,replace(salary,'','',''.'') as salary from employee', sql_count 'select count(*) from employee', EMPNO 'EMPNO', FIRSTNME 'FIRSTNME', WORKDEPT 'WORKDEPT', SALARY 'SALARY')

  • For some reason the last character is lost if the length of CHAR column is the same in DB2 and foreign table definition. I cannot tell where is the problem: in DB2, DB2 ODBC/CLI, fdw_odbc extension or FDW implementation in PostreSQL. The solution is simply to extend CHAR size in SELECT by DB2 CAST scalar function.
  • DECIMAL. DB2 ODBC/CLI maps DECIMAL column type to CHAR as a default.. In my locale as a decimal point , (colon) is used which is not acceptable by PostgreSQL. So the solution is to use DB2 REPLACE scalar function to replace , (colon) with . (dot). 
  • I did not test date, timestamp, UTF-8 string etc. But - if any problem arises - I hope that it is possible to overcome any issue using method described above. It is also possible to modify the defaults of DB2 ODBC/CLI connection - look for more details.
  • What about BLOB, LOB, XML column ?
  • EMPNO 'EMPNO' - use capital letters (not EMPNO 'empno'). DB2 ODBC/CLI returns all column names in upper case regardless the case used in SELECT statement.

Conclusion
Unfortunately, it was a painful job to setup this FDW for DB2. This solution is almost not documented, error handling is very poor (not existing at all). In case of any problem it can break the whole PostgreSQL or simply brings nothing without any error message. So I had to spend some time to dig through the source code and run this extension with debug mode enabled and additional logging. To my mind the extension requires more thorough test and improvements to be fully trusted.
But the result is amazing - we have a powerful tool for integrating PostgreSQL with DB2.

Brak komentarzy:

Prześlij komentarz