Blog do projektu Open Source JavaHotel

środa, 29 sierpnia 2012

DB2, PostgreSQL, FDW and Linux 64 bit

Introduction
FDW (foreign data wrapper) is a feature in PostgreSQL which allows access to external resources as if they were relational tables in PostreSQL database. There is also ODBC wrapper available so I was interested if it is possible to configure integration with DB2. Finally I was successful but it was a journey littered with troubles.
DB2 data server client
Download and install DB2 data server client. Then configure connection to DB2 database by means of CLP application. In my environment remote DB2 database SAMPLE is created on server think and access port is 50005, user/password db2inst3/db2inst3
db2 "catalog tcpip node DB2SAMPL remove think server 50005"
db2 "catalog database SAMPLE as DB2SAMPL at node DB2SAMPL"
Test the connection just defined:
db2 connect to DB2SAMPL user db2inst3 using db2inst3
db2 list tables
ODBC DB2 connection on Linux
We have to setup ODBC connection with DB2 database (local or remote) on the machine were PostgreSQL is installed.  Be sure that ODBC Linux version is installed, open source unixODBC is available for all Linux distribution. Next step is to configure ODBC access to DB2 database (DB2 Infocenter detailed information) . It can be done by GUI (ODBCConfig) or manually. You have to identify where libdb2.so ODBC driver is located and add the driver definition to /etc/odbcinst.ini file. In my environment this definition looks like:
[DB2]
Description  = UDB 10.1
Driver  = 
Driver64  = /home/opt/ibm/db2/V10.1/lib64/libdb2.so
Setup  = 
Setup64  = 
UsageCount  = 1
CPTimeout  = 0
CPTimeToLive  = 0
DisableGetFunctions  = 0
DontDLCLose  = 1
ExFetchMapping  = 1
Threading  = 3
FakeUnicode  = 0
IconvEncoding  = 
Trace  = 
TraceFile  = 
TraceLibrary  = 
Then define ODBC connection to DB2 database. Add an entry to /etc/odbc.ini file. My entry looks like:
[DB2SAMPL]
Driver = DB2
Database = SAMPLE
Servername = think
UID = db2inst3
PWD = db2inst3
Port = 50005
Next step is to verify that ODBC connection is working by means of isql command :
isql db2sampl db2inst3 db2inst3
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select * from emp
The last command should yield the content of EMP table created in DB2 SAMPLE database.
Additional remark. In order to have it running we have to catalog DB2 connection using DB2 CLP and ODBC connection also. Also the database alias used in CLP (DB2SAMPL here) should match the DSN name in ODBC. Otherwise the ODBC connection is not working.
Prerequisites in PostgreSQL 
Together with PostgreSQL server additional packages should be installed. In my environment (Fedora 14) I installed these packages via commands:
yum install postgresql-contrib
yum install libpq-dev
yum install postgresql-server-dev.9.1
Make sure that pg_config utility is linked to entry in /usr/bin (or any other public PATH directory). It was not done automatically in my environment.
/usr/bin/pg_config -> /usr/pgsql-9.1/bin/pg_config
Install fdw_odbc extension
 Download solution from repository. Then run command (as root user or via sudo)
make USE_PGXS=1 install
Output:
/bin/mkdir -p '/usr/pgsql-9.1/lib'
/bin/mkdir -p '/usr/pgsql-9.1/share/extension'
/bin/sh /usr/pgsql-9.1/lib/pgxs/src/makefiles/../../config/install-sh -c -m 755  odbc_fdw.so '/usr/pgsql-9.1/lib/odbc_fdw.so'
/bin/sh /usr/pgsql-9.1/lib/pgxs/src/makefiles/../../config/install-sh -c -m 644 ./odbc_fdw.control '/usr/pgsql-9.1/share/extension/'
/bin/sh /usr/pgsql-9.1/lib/pgxs/src/makefiles/../../config/install-sh -c -m 644 ./odbc_fdw--1.0.sql  '/usr/pgsql-9.1/share/extension/'
Configure extension in PostgreSQL 
 Connect to PostgreSQL database and issue the following commands:
CREATE EXTENSION odbc_fdw;
CREATE SERVER odbc_db2 FOREIGN DATA WRAPPER odbc_fdw OPTIONS (dsn 'DB2SAMPL');
CREATE USER MAPPING FOR sb SERVER odbc_db2 OPTIONS (username 'db2inst3', password 'db2inst3');
First test 
To test connection create a simple table in DB2, for instance: DB2 CLP:
db2 "create table testt(x int)"
db2 "insert into testt values(1)"
db2 "insert into testt values(2)"
db2 "insert into testt values(3)"
Then define access to this table in PostgreSQL
test=# CREATE FOREIGN TABLE TESTT (X INT) SERVER odbc_db2 OPTIONS ( sql_query'select x from TESTT', sql_count 'select count(x) from TESTT', x 'X' );
CREATE FOREIGN TABLE
test=# select * from testt;
 x 
---
 1
 2
 3
(3 wiersze)
Next test 
Now define access to EMPLOYEE table in DB2 SAMPLE database (more information)
CREATE FOREIGN TABLE EMPLOYEE (EMPNO CHAR(6), FIRSTNME VARCHAR(12), WORKDEPT CHAR(3), SALARY DECIMAL(9,2)) SERVER odbc_db2 OPTIONS (sql_query 'select cast (empno as CHAR(7)) as empno,firstnme,cast (workdept as CHAR(4)) as workdept,replace(salary,'','',''.'') as salary from employee', sql_count 'select count(*) from employee', EMPNO 'EMPNO', FIRSTNME 'FIRSTNME', WORKDEPT 'WORKDEPT', SALARY 'SALARY')
Then run several statements:
select * from employee;
select * from employee where salary > 70000;
select count(empno),workdept from employee group by workdept order by count(empno);
select max(salary),min(salary),workdept from employee group by workdept;
select count(empno),workdept from employee group by workdept having count(empno) > 5;
One has to remember that foreign tables are "read-only" in PostgreSQL. Cannot be updated and do not participate in transactional processing (more details)
More complicated example
What about joins between foreign and native tables ?
Assume that together with official personal data in SAMPLE database the sleeky employer wants to keep some political incorrect information in a database.
So create additional table in PostgreSQL:

create table secret (empno char(6), politicalviews varchar(100));
insert into secret values('000010','Democratic');
insert into secret values('000020','Democratic');
insert into secret values('000030','Conservative');
insert into secret values('000050','Conservative');
insert into secret values('000060','Liberal');
Then run several statements:
select * from employee e,secret s  where e.empno = s.empno;
select e.*, s.politicalviews from employee e left outer join secret s on e.empno = s.empno;
select e.*, case when s.politicalviews is null then 'Unstable' else s.politicalviews end  from employee e left outer join secret s on e.empno = s.empno;
Let's check the number of people with specified and unspecified political views in the company
select workdept,sum(case when p=1 then 1 else 0 end) as unstable,sum(case when p=0 then 1 else 0 end) as stable  from (select workdept, case when s.politicalviews is null then 1 else 0 end as p from employee e left outer join secret s on e.empno = s.empno) as w group by workdept;
Output:
 workdept | unstable | stable 
----------+----------+----------
 A00      |        4 |        1
 B01      |        0 |        1
 C01      |        3 |        1
 D11      |       10 |        1
 E21      |        6 |        0
 D21      |        7 |        0
 E01      |        0 |        1
 E11      |        7 |        0
Let's try with VIEW:
create view companyandpolitics select workdept,sum(case when p=1 then 1 else 0 end) as unstable,sum(case when p=0 then 1 else 0 end) as stable  from (select workdept, case when s.politicalviews is null then 1 else 0 end as p from employee e left outer join secret s on e.empno = s.empno) as w group by workdept;
select * from politics;
Some additional remarks

CREATE FOREIGN TABLE EMPLOYEE (EMPNO CHAR(6), FIRSTNME VARCHAR(12), WORKDEPT CHAR(3), SALARY DECIMAL(9,2)) SERVER odbc_db2 OPTIONS (sql_query 'select cast (empno as CHAR(7)) as empno,firstnme,cast (workdept as CHAR(4)) as workdept,replace(salary,'','',''.'') as salary from employee', sql_count 'select count(*) from employee', EMPNO 'EMPNO', FIRSTNME 'FIRSTNME', WORKDEPT 'WORKDEPT', SALARY 'SALARY')

  • For some reason the last character is lost if the length of CHAR column is the same in DB2 and foreign table definition. I cannot tell where is the problem: in DB2, DB2 ODBC/CLI, fdw_odbc extension or FDW implementation in PostreSQL. The solution is simply to extend CHAR size in SELECT by DB2 CAST scalar function.
  • DECIMAL. DB2 ODBC/CLI maps DECIMAL column type to CHAR as a default.. In my locale as a decimal point , (colon) is used which is not acceptable by PostgreSQL. So the solution is to use DB2 REPLACE scalar function to replace , (colon) with . (dot). 
  • I did not test date, timestamp, UTF-8 string etc. But - if any problem arises - I hope that it is possible to overcome any issue using method described above. It is also possible to modify the defaults of DB2 ODBC/CLI connection - look for more details.
  • What about BLOB, LOB, XML column ?
  • EMPNO 'EMPNO' - use capital letters (not EMPNO 'empno'). DB2 ODBC/CLI returns all column names in upper case regardless the case used in SELECT statement.

Conclusion
Unfortunately, it was a painful job to setup this FDW for DB2. This solution is almost not documented, error handling is very poor (not existing at all). In case of any problem it can break the whole PostgreSQL or simply brings nothing without any error message. So I had to spend some time to dig through the source code and run this extension with debug mode enabled and additional logging. To my mind the extension requires more thorough test and improvements to be fully trusted.
But the result is amazing - we have a powerful tool for integrating PostgreSQL with DB2.

środa, 22 sierpnia 2012

DB2, HADR and Java

Introduction
HADR (High Availability and Disaster Recovery) is working well with C/C++ client but what about Java client. No problem, DB2 is also supporting Java without any problem for both DataSource and DriverManager client. It is described with details here.
Example for DriverManager client
I created a simple console Java application which connects to DB2 HADR controlled database and executes several JDBC action. Full source code is available here. How to setup a simple HADR configuration is described here.
Sample application description
There is a Java interfaces encapsulating some JDBC action.

public interface IJDBCAction {

        /**
         * Connect to database
         */
        public void connect();

        /**
         * Disconnect
         */
        public void disconnect();

        /**
         * Add new person to PERSON table.
         * 
         * @param re
         *            Person data to add (only first and family name should be
         *            filled)
         */
        public void addPerson(PersonRecord re);

        /**
         * Return the list of the persons
         * 
         * @return List of persons.
         */
        List<PersonRecord> getList();

        /**
         * Create PERSON table
         */
        public void createTable();

        /**
         * Drop PERSON table
         */
        public void dropTable();

        /**
         * Get current connection status
         * 
         * @return String describing the current status
         */
        public String getStatus();

        /**
         * Get the status of the last executed statement
         * 
         * @return String describing the status of the last statement
         */
        public String getLastResult();

        /**
         * Set autocommit mode
         * @param on true: set autocommit on, off otherwise
         */
        public void setAutocommit(boolean on);
}
There is also a concrete implementation of this interface, factory providing this implementation and simple console application with UI for this JDBC actions. Pay attention that it is the responsibility of the   IJDBCAction user to provide valid Connection reference. This way IJDBCAction implementation is free of any configuration details and can be reused in different context.
The application itself:

Connection status: Not connected
1) Connect
2) Disconnect
3) Read list
4) Add person
5) Create PERSON table
6) Drop PERSON table
7) Set autocommit on
8) Set autocommit off
99) Exit
There is nothing special in it, everything is a typical JDBC application. Because Java client does not use CLP connection pools, to make it HADR enabled connection URL string should be extended. It is described here.
The code sample used in the application.

  private static class GetConn implements IGetConnection {

                private final String user = "db2had1";
                private final String password = "db2had1";
                private final String url = "jdbc:db2://think:50001/sample";

                /** Parameters for HADR stand by, server name and port. */
                private final String serverAlternate = "think";
                private final int portAlternate = 50002;

                @Override
                public Connection getCon() throws SQLException, ClassNotFoundException {
                        Class.forName("com.ibm.db2.jcc.DB2Driver");
                        // Construct URL referencing HADR environment
                        String u = url
                                        + ":"
                                        + com.ibm.db2.jcc.DB2BaseDataSource.propertyKey_clientRerouteAlternateServerName
                                        + "=" + serverAlternate + ";";
                        u += com.ibm.db2.jcc.DB2BaseDataSource.propertyKey_clientRerouteAlternatePortNumber
                                        + "=" + portAlternate + ";";
                        // Connect !
                        Connection con = DriverManager.getConnection(u, user, password);
                        return con;
                }
        }
To make connection HADR enabled one has to add two additional parameters with server and port name for HADR stand by server and that is final. Nothing more is necessary.
Making the application HADR failover transparent
Extending URL string with HADR related parameters causes only that after HADR failover application is automatically redirected to another server. But current transaction is rollbacked with SQL error message thrown:
[jcc][t4][2027][11212][4.7.85] Użycie połączenia nie powiodło się, lecz połączenie to zostało ustanowione ponownie. Nazwa hosta lub adres IP to "think", a nazwa usługi lub numer portu to 50 001.
Ponowna próba ustawienia rejestrów specjalnych może, lecz nie musi zostać podjęta (kod przyczyny = 1). ERRORCODE=-4498, SQLSTATE=08506
The solution is to replay the last transaction. But the question could be raised - how to accomplish this task automatically without any user or db admin intervention, just make this problem completely transparent. There could be a lot of different ideas - the idea implements in this sample application is to use command design pattern. To run any JDBC related task one has to extend an abstract class:
/**
 * Command design pattern, abstract command
 * 
 * @author sbartkowski
 * 
 */
abstract class SqlCommand {

        protected final SQLCommandRunTime context;

        /** Error code 'connection reestablished' */
        private final int RETRYCODE = -4498;

        /**
         * Constructor
         * 
         * @param context
         *            SQLCommandRunTime context
         */
        SqlCommand(SQLCommandRunTime context) {
                this.context = context;
        }

        /**
         * Transactional command to be executed.
         * 
         * @throws SQLException
         *             SQLException thrown
         */
        abstract protected void command() throws SQLException;

        /**
         * Executes command, encapsulation around lower level transactional code.
         * Implements logic related to HADR failover
         */
        void runCommand() {
                if (context.con == null) {
                        context.lastResult = "Cannot execute, not connected";
                        return;
                }
                try {
                        command();
                        context.lastResult = "Command executed successfully";
                } catch (SQLException e) {
                        int errCode = e.getErrorCode();
                        if (errCode == RETRYCODE) {
                                // run again command in case of reestablish error
                                try {
                                        command();
                                        context.lastResult = "Command executed successfully";
                                } catch (SQLException e1) {
                                        // do not try again, throw exception unconditionally
                                        context.lastResult = e.getMessage();
                                        e.printStackTrace();
                                }
                        } else {
                                // throws exception if error code is not reestablish error
                                context.lastResult = e.getMessage();
                                e.printStackTrace();
                        }

                }
        }
}
This way all logic related to 'connection reestablished' SQL error is implemented in one place. Pay attention that command is replayed only once, second execution of the transaction should be completed without any problem. If it does not happen then it is much better to raise an exception at once without any delay.
Conclusion

  • It is quite simple to enable Java application for HADR, only simple URL string extension is required.
  • This sample application keeps all connection details hardcoded in the source code. In the most business application database connection details are reconfigurable outside the source code but by all means no problem also to extend it to HADR related parameters.
  • This example is related only to connection extracted from DriverManager. DataSource connection (typical for web application) should be enabled for HADR another way.
  • Making application HADR failover transparent requires proper handling of "A connection failed but has been reestablished." error exception. It is quite simple if application is started from the very beginning (for example by using the solution described above) but could be complicated in case of legacy code.

niedziela, 19 sierpnia 2012

JspWiki and Bach Cantatas

I started to collect Bach Cantatas recordings on YouTube service hoping to collect them all. The result of my research is available on JSPWiki implementation.
There are a lot recordings available on YouTube but I'm not happy with a lot of them. Some of them are partial  (only one movement), some of them are of very poor quality (for instance vinyl records ripped and uploaded to YouTube). There is almost complete recordings of Bach Cantatas made by Nikolaus Harnoncourt and Gustav Leonhardt but the quality is not perfect. I mean, of course, the quality of recording uploaded to You Tube - not the quality of performance which is excellent.
So I started to collect recording I'm happy with using the following rules:

  • Completed (not single movement).
  • Good recording quality. It goes without saying that 'quality' here reflects my personal taste.

środa, 8 sierpnia 2012

CellTree and CellTable

Introduction
I added a support for CellTree in my JavaHotel framework. Demo is available here. Source code containing  implementation is available here.
Of course - there is nothing special in it - samples are available together with the javadoc. My purpose was to make table view "tree view" enabled.  For some data tree (hierarchical) view is often more accurate. For instance - chart of accounts - has hierarchical structure. Some accounts are used as a collective for more specialized accounts related to them. Or we can have ledger accounts and subledger (subsidiary ledger) containing more specialized information.
On the other hand flat structure is also very convenient.
More information
So this is why this extension was created. We can make some tables (for instance char of accounts) to be "tree enabled" but all the time having the property to see them as a flat table.
The main problem was to include information about tree structure inside the table to avoid keeping additional metadata.
It was achieved by adding additional numeric column to the table. This column contains two information: whether the record is tree node or a leaf and depth of the node (leaf). The tree structure reads from the beginning to the end. If the column contains number greater or equal then 100 it is regarded as a node and leaf otherwise. The number itself (minus 100 for nodes) is regarded as a node (leaf) depth. So the notation is as follows: node is the first and then all next records with the level + 1 are its children until first record (node or leaf) with the level less then the starting node is detected.
Example (* marks the node) :


*A     (number = 100, node and level = 0)
 A1    (number = 1, leaf and level = 1
 A2    (number = 1, next leaf and level = 1)
 *A3   (number = 101, node and level = 1)
  A31  (number = 2, leaf and level = 2, child of the node A3)
  A32  (number = 3, next leaf and level = 2)
 A4    (number = 1, leaf and level = 1, next child for node A)
B      (number = 0, lead and level = 0, next child for the root) 
Additional info 

  • There is no need for this additional column to be displayed.
  • In the table (tree) view header additional property menu is added. It allows to switch between table view and tree view.
  • The disadvantage of the tree view is that a record filter cannot be applied. After removing some record (for instance nodes) it is unlikely to keep the proper tree structure. So when table view is filtered the switching to tree view is blocked.