Blog do projektu Open Source JavaHotel

czwartek, 29 grudnia 2016

IBM InfoSphere Streams, BigSql and HBase

Some time ago I created Java methods to load data into HBase using format understandable by BigSql. Now it is high time to move ahead and to create IBM InfoSphere operator making usage of this solution.
The Streams solution is available here. The short description is added here.
JConvert operator
JConvert operator does not load data into HBase, it should precede HBASEPut operator.
JConvert accepts one or more input streams and every input stream should have corresponding output stream. It simply encodes every attribute in the input stream to blob (binary) attribute in the output stream. The binary value is later loaded to HBase table by HBasePut operator.
Very important factor is to coordinate JConvert input stream with target HBase/BigSql table. Neither JConvert nor HBasePut can do that, if attributes and column types do not match then BigSql will not read the table properly. Conversion rules are explained here.
This operator is used for testing, it also contains a lot of usage examples.
Simple BigSql/HBase loading scenario.

On the left there is the producer, then JConvert translates all input attributes into binary format and HBasePut operator load binaries to HBase table.
More details about TestHBaseN.

poniedziałek, 26 grudnia 2016

Oracle -> DB2 migration, useful Java program

Some time ago I created a set of useful awk scripts to ease Oracle to DB2 migration. But facing the same task, I decided to rewrite it as Java program. The main disadvantage of the previous solution is that awk analyzes input file line after line. So, for an example, if CREATE OR REPLACE clause is split into two or more lines, the problem starts to be very complicated.
Java program
Java program is available as ready to use Java package or as an Eclipse project to be cloned and updated.
Java solution and a brief description are available as an open source project here. The "two lines problem" is resolved by means of simple Tokenizer. It decomposes input file to separate entities across lines boundaries: words, special characters like (, ),; etc. Then Extractor can pick up words one after one and recognize the beginning and end of particular SQL objects. Tokenizer is also reused during source file fixing.
Oracle Compatibility Mode in DB2 allows executing PL/SQL almost out of the box. Nevertheless, some common adjustment should be applied. A typical example is VARCHAR2(32767) or VARCHAR2(MAX). In DB2 the MAX is not supported and boundary for VARCHAR2 is 32672. If there are several occurrences of it no problem to fix it manually, but it could be the challenge in the case of hundreds or thousands of them.
So after splitting the input Oracle source file into separate SQL object, a set of adjustment is executed over them. Every adjustment is a single Java class implementing IFix interface. Current adjustments are stored in org.migration.fix.impl package. Every class should be registered in MainExtract.
This way, assuming that project is cloned as Eclipse project, adjustments can be easily extended.
DB2 comparison
As a part of the solution not covered by previous awk implementation is DB2 comparison tool. It compares the list of Oracle objects against the list of objects migrated and deployed into DB2 database. In the case of hundreds or thousands of objects, it is easy to miss something.
I was using this solution during real migration and found it very useful. It was worth spending several days to implement it. The most important part is the feature allowing easy extension of the existing set of source file adjustment. I was able to quickly implement next adjustment (together with Junit test) immediately after finding next problem in source Oracle code.