Blog for the Open Source JavaHotel project

Thursday, 29 December 2016

IBM InfoSphere Streams, BigSql and HBase

Some time ago I created Java methods to load data into HBase in a format understandable by BigSql. Now it is high time to move ahead and create an IBM InfoSphere Streams operator that makes use of this solution.
The Streams solution is available here. A short description is added here.
JConvert operator
The JConvert operator does not load data into HBase itself; it is meant to precede the HBASEPut operator.
JConvert accepts one or more input streams, and every input stream should have a corresponding output stream. It simply encodes every attribute of the input stream into a blob (binary) attribute of the output stream. The binary value is later loaded into an HBase table by the HBasePut operator.
It is very important to coordinate the JConvert input stream with the target HBase/BigSql table. Neither JConvert nor HBasePut can verify this: if attribute and column types do not match, BigSql will not read the table properly. The conversion rules are explained here.
This operator is used for testing; it also contains a lot of usage examples.
Simple BigSql/HBase loading scenario.

On the left is the producer; then JConvert translates all input attributes into binary format, and the HBasePut operator loads the binaries into the HBase table.
More details about TestHBaseN.

Monday, 26 December 2016

Oracle -> DB2 migration, useful Java program

Some time ago I created a set of awk scripts to ease Oracle to DB2 migration. Facing the same task again, I decided to rewrite it as a Java program. The main disadvantage of the previous solution is that awk analyzes the input file line by line. So, for example, if a CREATE OR REPLACE clause is split across two or more lines, the problem becomes very complicated.
Java program
The Java program is available as a ready-to-use Java package or as an Eclipse project to be cloned and updated.
The Java solution and a brief description are available as an open source project here. The "two lines problem" is resolved by means of a simple Tokenizer. It decomposes the input file into separate entities across line boundaries: words and special characters like (, ), ; etc. Then the Extractor can pick up tokens one by one and recognize the beginning and end of particular SQL objects. The Tokenizer is also reused during source file fixing.
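The tokenizer idea can be sketched in a few lines of Java (the class and method names here are illustrative, not the project's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split SQL source into words and special characters,
// ignoring line boundaries, so "CREATE OR\nREPLACE" yields the same tokens
// regardless of how the clause is wrapped across lines.
class SqlTokenizer {
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : source.toCharArray()) {
            if (Character.isLetterOrDigit(c) || c == '_') {
                word.append(c);
            } else {
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (!Character.isWhitespace(c)) {
                    tokens.add(String.valueOf(c)); // '(', ')', ';', ',' etc.
                }
            }
        }
        if (word.length() > 0) tokens.add(word.toString());
        return tokens;
    }
}
```

With such a token stream, a clause split across lines is no harder to recognize than one on a single line.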
Oracle Compatibility Mode in DB2 allows executing PL/SQL almost out of the box. Nevertheless, some common adjustments should be applied. A typical example is VARCHAR2(32767) or VARCHAR2(MAX). In DB2, MAX is not supported and the upper bound for VARCHAR2 is 32672. If there are only a few occurrences, it is no problem to fix them manually, but it could be a challenge in the case of hundreds or thousands of them.
So, after splitting the input Oracle source file into separate SQL objects, a set of adjustments is executed over them. Every adjustment is a single Java class implementing the IFix interface. Current adjustments are stored in the org.migration.fix.impl package. Every class should be registered in MainExtract.
This way, assuming the project is cloned as an Eclipse project, adjustments can easily be extended.
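As an illustration of how such an adjustment might look, here is a hypothetical IFix-style class capping VARCHAR2 declarations at DB2's limit. The interface name and signature are assumptions for this sketch; the project's actual contract may differ.

```java
// Hypothetical adjustment in the spirit of the project's IFix classes:
// cap VARCHAR2 declarations at DB2's 32672 limit. Interface and method
// signatures are assumptions for illustration only.
interface IFix {
    String fix(String sqlObjectSource);
}

class FixVarchar2Limit implements IFix {
    @Override
    public String fix(String source) {
        // VARCHAR2(MAX) is not supported by DB2; neither is 32767.
        return source
            .replaceAll("(?i)VARCHAR2\\s*\\(\\s*MAX\\s*\\)", "VARCHAR2(32672)")
            .replaceAll("(?i)VARCHAR2\\s*\\(\\s*32767\\s*(CHAR\\s*)?\\)", "VARCHAR2(32672)");
    }
}
```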
DB2 comparison
A part of the solution not covered by the previous awk implementation is the DB2 comparison tool. It compares the list of Oracle objects against the list of objects migrated and deployed into the DB2 database. In the case of hundreds or thousands of objects, it is easy to miss something.
I used this solution during a real migration and found it very useful; it was worth spending several days to implement. The most important part is the feature allowing easy extension of the existing set of source file adjustments. I was able to quickly implement the next adjustment (together with a JUnit test) immediately after finding the next problem in the source Oracle code.

Saturday, 26 November 2016

Moving to Heroku

Finally, I decided to abandon Google App Engine and move on. The main reasons for giving up this platform are:
  • Lack of support for Java 8; only Java 7 is available.
  • The "whitelist" (or blacklist). It sets a boundary, and the software cannot grow. For instance, I'd like to add Scala support, but Scala cannot be executed inside Google App Engine because it uses forbidden packages, and there is no way to bypass this restriction.
I came to the conclusion that being bound to this platform reduces my capability to grow. The glory of supporting one more environment does not outweigh the losses.
In searching for alternatives, my finger stopped at Heroku.
I spent some time refactoring the application and removing redundant stuff, and now the application is ready. I'm using a free Heroku account, so it can take some time until the application shows up.
Eclipse projects
I added instructions on how to set up the Eclipse projects and run the sample application locally.
Tomcat deployment
I created a build.xml file, so by running the ant command a .war file ready for Tomcat is built. More information: the data source should be defined beforehand. The application was tested with Derby and PostgreSQL databases.
Heroku deployment
The same .war can be deployed as a Heroku Tomcat application and run against a Heroku PostgreSQL database. More information:
Heroku, Tomcat
Although Heroku supports a Tomcat container, I discovered that I was unable to read environment variables from the META-INF/context.xml file; it returns null all the time. So I was forced to split the Heroku code from the standard Tomcat code to retrieve some configuration settings.
The Heroku version is recognized by discovering the webapp-runner.jar file on the classpath.
  if (isJarOnPath(classPath, "webapp-runner.jar"))
   return ContainerType.HEROKU;
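The helper behind this check might be implemented along these lines (a sketch, not the project's actual code):

```java
// Illustrative sketch of the jar-detection idea: scan the classpath entries
// for a given jar name. The real isJarOnPath helper in the project may differ.
class ContainerDetect {
    static boolean isJarOnPath(String classPath, String jarName) {
        for (String entry : classPath.split(java.io.File.pathSeparator)) {
            if (entry.endsWith(jarName))
                return true;
        }
        return false;
    }
}
```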
Then, by virtue of Guice, it is easy to inject the proper configuration properties, particularly those related to the data source definition.
  IJythonUIServerProperties getServerProperties(IGetResourceJNDI getJNDI, IReadResourceFactory iFactory,
      IReadMultiResourceFactory mFactory, IGetEnvVariable iGet) {
    if (ContainerInfo.getContainerType() == ContainerType.HEROKU)
      return new HerokuServerProperties(iFactory, mFactory);
    return new ServerPropertiesEnv(getJNDI, iFactory, mFactory, iGet);
  }

I also moved to GWT 2.8 and the latest gwt-polymer. Finally, I can say: "Welcome to the Java 8 world".

Monday, 31 October 2016

Ubuntu 14.04, cups

I spent several hours trying to remove an unnecessary printer in Ubuntu 14.04. Unfortunately, I was unable to pass the authentication required for CUPS administration tasks. It does not matter whether the web UI or Ubuntu System Preferences is used; both use the same authentication mechanism. I spent a lot of time adding and removing lpadmin users and restarting the cups service.
The log file /var/log/cups/error_log constantly reports the same message:

E [31/Oct/2016:13:35:52 +0100] [Client 8] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:13:42:09 +0100] [Client 8] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:13:43:18 +0100] [Client 17] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:13:54:13 +0100] [Client 17] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:14:01:28 +0100] [Client 17] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:14:04:33 +0100] [Client 17] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:14:07:31 +0100] [Client 19] pam_authenticate() returned 3 (Error in service module)

More talkative were /var/log/syslog and /var/log/auth.log:

Oct 31 10:50:18 sb-ThinkPad-W540 kernel: [ 3104.189523] type=1400 audit(1477907418.556:75): apparmor="DENIED" operation="signal" profile="/usr/sbin/cupsd" pid=7913 comm="cupsd" requested_mask="send" denied_mask="send" signal=term peer="unconfined
The solution is described here. The AppArmor protection service is badly configured and blocks any attempt by cupsd to authenticate an lpadmin user.
The workaround is to disable AppArmor temporarily for cupsd:

sudo aa-complain cupsd

execute all administrative tasks and, when done, harden security again:

sudo aa-enforce cupsd

I'm not very happy with that, but finally I removed the disliked printer from my desktop.

New features in Jython MVP framework, more Polymer

I implemented the next batch of Polymer web components from the GWT Polymer showcase. Newly implemented components: Header Panel, Icon Button, Item, Input, Material, Menu, Progress, Radio Button, Radio Group, Ripple, Spinner, Tabs.
The demo is launched from here, the source code is here.
Example screenshots

I also implemented the ui:style element with CSS class name obfuscation. This way we can have a "private" CSS class name without the risk that a "local" class name will collide with other CSS class definitions. Also, the framework does not allow the use of misspelled class names.
An example of a UiBinder file with a ui:style element.
Pending problems
  • Popup dialogs do not work in the Firefox browser
  • Not all web components render well, for instance: Tabs
  • The implementation of the Progress component is still waiting for an action for the "Start" button
Next steps
Continue with the Paper components. Having the ui:style element implemented makes copying and pasting UiBinder files from the GWT Polymer showcase easier.

Thursday, 29 September 2016

IBM InfoSphere Streams, Big SQL and HBase

Big SQL can run over HBase. IBM InfoSphere Streams does not have any means to load data directly into Big SQL. Although it is possible to use the general-purpose Database Toolkit (DB2), running an INSERT statement for bulk data loading is very inefficient.
Another idea is to define a Big SQL table as an HBase table (CREATE HBASE TABLE) and load data directly into the underlying HBase table. But HBase, unlike Hive, is a schemaless database; everything is stored as a sequence of bytes. So the content of the HBase table should be stored in a format supported by Big SQL.
IBM InfoSphere Streams provides an HBase toolkit, but the HBasePut operator cannot be used directly. For instance, a string value stored by the HBasePut operator is not a valid CHAR column in terms of Big SQL; it should be pre-formatted into binary format beforehand.
The solution is to develop an additional Streams operator that encodes all SPL types into binary (blob) format, with the binary stream then consumed by the HBasePut operator.
Instead of (example):
stream<rstring key, tuple<rstring title, rstring author_fname,
   rstring author_lname, rstring year, rstring rating> bookData> toHBASE =
    toHBASE : key = title + ":" + year, bookData = bookStream ;

  // Now put it into HBASE.  We don't specify a columnQualifier and the attribute
  // given by valueAttrName is a tuple, so it treats the attribute names in that 
  // tuple as columnQualifiers, and the attribute values 
  () as putsink = HBASEPut(toHBASE)
    rowAttrName : "key" ;
    tableName : "streamsSample_books" ;
    staticColumnFamily : "all" ;
    valueAttrName : "bookData" ;

stream<rstring key, tuple<rstring title, rstring author_fname,
   rstring author_lname, rstring year, rstring rating> bookData> toEncode =
    toHBASE : key = title + ":" + year, bookData = bookStream ;

 stream<blob key, tuple<blob title, blob author_fname, blob author_lname, blob year, blob rating> bookData> toHBase

  // Now put it into HBASE.  We don't specify a columnQualifier and the attribute
  // given by valueAttrName is a tuple, so it treats the attribute names in that 
  // tuple as columnQualifiers, and the attribute values 
  () as putsink = HBASEPut(toHBASE)
    rowAttrName : "key" ;
    tableName : "streamsSample_books" ;
    staticColumnFamily : "all" ;
    valueAttrName : "bookData" ;
The HBaseEncode operator is to be implemented as a Java operator. Before developing it, I decided to create a simple Java project encoding a subset of Java types into the binary format accepted by Big SQL.
The Big SQL HBase data format is described here, so it looked like a simple coding task. But unfortunately, the description is not accurate, so I was bogged down by unexpected problems. For instance, a NULL value is marked by the 0x01 value, not 0x00. Also, the source code for the Hive SerDe is not very useful, because Big SQL encoding diverges from it in many points.
So I ended up loading data through the Big SQL INSERT command and analyzing the binary content of the underlying HBase table, trying to guess the proper binary format.
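The 0x01-for-NULL discovery can be illustrated with a minimal encoder sketch. Apart from that marker byte, the layout shown here is illustrative only, not the exact Big SQL format, which had to be reverse-engineered column type by column type.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the marker-byte idea: a leading byte distinguishes NULL from a
// present value. The 0x01-for-NULL convention is the one discovered
// experimentally; the rest of the layout is illustrative, not the exact
// Big SQL binary format.
class BigSqlEncodeSketch {
    static final byte NULL_MARKER = 0x01;
    static final byte VALUE_MARKER = 0x00;

    static byte[] encodeString(String value) {
        if (value == null)
            return new byte[] { NULL_MARKER };
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[data.length + 1];
        out[0] = VALUE_MARKER;
        System.arraycopy(data, 0, out, 1, data.length);
        return out;
    }
}
```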
Java project
The Java project is available here. It consists of several subprojects.
HBaseBigSql (Javadoc) will be used by the Streams operator directly. It does not have any dependencies. The supported Big SQL types are specified by the enum BIGSQLTYPE. The most important class is the one containing the result of the painstaking process of revealing the HBase binary format for all Big SQL types.
The HBasePutGet (Javadoc) subproject contains several supporting classes to put data into an HBase table. It has an HBase client dependency.
TestH contains the JUnit tests. The test case is very simple: CREATE HBASE TABLE, load data into the underlying HBase table, get the data back through a Big SQL SELECT statement, and compare whether the data stored in HBase equals the data retrieved by Big SQL.
Possible extensions
  • The solution only writes data to HBase in Big SQL format. The opposite task would be to code methods to read data from an HBase table.
  • Compound indexes and columns are not supported.
Next step
Develop IBM InfoSphere Streams HBaseEncode operator.

Monday, 29 August 2016

Jednolity Plik Kontrolny, Standard Audit File

Jednolity Plik Kontrolny (JPK), the Standard Audit File, is a new requirement of the Polish Ministry of Finance, intended to improve the efficiency of tax control and to narrow the space for tax evasion.
The protocol for sending financial data to the tax office via a REST gateway is described here (in Polish).
The taxpayer should send tax reports using the specified format and procedure. Unfortunately, although the protocol comprises well-known encryption and signing methods, the Ministry of Finance does not share any code sample showing how to accomplish the task.
To alleviate the burden, because creating everything from scratch could be a painstaking process, I decided to develop a simple open source project in Java covering the whole area of encoding and sending data via the REST protocol. The source code is available here.
The project can be utilized in two ways: as an API (Javadoc will be available soon) for Java solutions, or as a command line tool for non-Java solutions.
The project requires preparing a property file containing a set of parameters. A sample property file is available here. The following parameters should be defined.
  • conf Directory containing a set of artifacts necessary to run the solution. A sample directory is available here.
    • JPKMFTest-klucz publiczny do szyfrowania.pem : public key used to encrypt the symmetric key
    • initupload-enveloped-pattern.xml : pattern for creating the InitUpload XML file used to initiate the data transmission. The file contains a number of position markers to be replaced by current values. I found this solution more practical than creating the XML file on the fly.
    • log.conf : JUL logging configuration. The FileHandler is added automatically; only the ConsoleLogger should be defined here.
    • : Certificate used to authorize access to the public gateway.
  • workdir : Working directory keeping temporary data between the different phases of the data transmission. It is also the place where the log file is kept. This directory is cleaned at the beginning of the first phase, so it is the responsibility of the solution user to back up this directory.
  • publickey : The name of the file with a public key in conf directory.
  • cert : The name of the certificate file (X.509) in conf directory.
  • url : The URL to send InitUpload.xml file, transfer initialization.
  • finish: The URL to signal the transmission completion.
  • get: The URL to receive UPO (Urzędowe Potwierdzenie Odbioru), Official Receipt Confirmation.
Solution structure
The solution is developed as several static API methods and corresponding main methods for the command line application.
Sending financial data to the gateway comprises several steps, described in the manual.
1. Preparing initial data
During this step the InitUpload.xml file is created, and the input financial data are zipped and encrypted. Preparing InitUpload.xml requires several steps, like generating the symmetric key and computing the MD5 and SHA-256 hashes of the symmetric key and the input data. The procedure is described in comments embedded in the source code.
API method : JPK.Prepare method.
Command line: Transform.main method.
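The cryptographic part of this step can be sketched as follows. AES with a 256-bit key and the Base64 representation of the digests are assumptions for illustration; MD5 and SHA-256 are the algorithms mentioned above, and the actual XML assembly is omitted.

```java
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Sketch of the preparation step: generate a symmetric key and compute the
// digests later embedded in InitUpload.xml. AES-256 and Base64 encoding are
// illustrative assumptions, not a statement of the exact JPK requirements.
class PrepareSketch {
    static SecretKey generateKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256); // symmetric key used to encrypt the zipped input data
            return gen.generateKey();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String digestBase64(String algorithm, byte[] data) {
        try {
            return Base64.getEncoder().encodeToString(
                    MessageDigest.getInstance(algorithm).digest(data));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```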
2. Signing Initupload.xml
This step should be done manually. There are several public certification authorities in Poland, and each one provides its own application for signing documents.
3. Uploading data to gateway
This step uses InitUploadSigned, PutBlob and FinishUpload REST API methods.
API method: UPLOAD.upload method.
Command line: Upload.main method.
4. Receiving UPO, Official Receipt Confirmation
This step uses the Status REST API method. The UPO is available for download only after some time, so the method should be launched at intervals until the UPO is received.
API method: UPLOAD.getUpo method.
Command line: GetUPO.main method.
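The polling described above can be sketched like this; StatusCall and checkUpo are hypothetical stand-ins for the project's Status REST call, not its real API.

```java
// Sketch of the polling loop for receiving the UPO. StatusCall / checkUpo
// are hypothetical stand-ins for the Status REST call; the real
// UPLOAD.getUpo API is not reproduced here.
class UpoPoller {
    interface StatusCall {
        String checkUpo(); // returns the UPO document, or null if not ready yet
    }

    static String pollForUpo(StatusCall call, int maxAttempts, long sleepMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            String upo = call.checkUpo();
            if (upo != null)
                return upo;
            try {
                Thread.sleep(sleepMillis); // UPO becomes available only after some time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null; // give up after the allotted attempts
    }
}
```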

Saturday, 30 July 2016

New features in Jython MVP framework

I started playing around with Vaadin GWT Polymer and decided to implement this feature in my Jython MVP framework. The source code is available here; a demo version can be launched at this address (it could take several minutes until the Google App Engine instance is activated). Important: it does not work properly in Firefox; tested with Chrome only.
New features are consumed in two ways:
Polymer implementation

XML definition
In order to activate Polymer rendering, a "polymer" attribute should be added. Widgets implemented so far: menu, combo (PaperDropdownMenu), main panel, editing. A list is not implemented yet. A date picker is implemented by the standard GWT DatePicker widget; I was unable to find a Polymer equivalent.
Dynamic UI Binder
Polymer rendering described above comes with one obstacle. In order to give access to all Polymer properties, it would be necessary to extend the XML definition with a huge number of additional attributes mapping to the appropriate Polymer properties. In GWT this is resolved by UiBinder, which is not applicable to the Jython MVP framework because UiBinder is resolved statically, during compilation. So I decided to develop my own UiBinder implementation, resolved dynamically. In the sample application, I implemented several Paper Elements from the Vaadin GWT Polymer showcase.
The GWT UiBinder format is not mutually intelligible with the MVP Jython UiBinder format, but both share the same logic, and I found it quite easy to adapt a UiBinder file from the Vaadin showcase for the sample application. An example of a UiBinder file for the above dialog is available here. The fieldid attribute binds a UiBinder element with the corresponding field or button element, like the GWT UiBinder ui:field attribute.
The Java code for the dynamic UiBinder is here. Of course, it is not possible to use reflection in a GWT context, so the only solution is a painstaking manual setting of all corresponding attributes.
Next steps
Implement all samples from the GWT Vaadin Polymer showcase.

Sunday, 12 June 2016

We went to the opera

On 23 April we saw Francis Poulenc's opera "The Human Voice" ("Głos ludzki") at the Teatr Wielki, and we liked it very much. "The Human Voice" is a one-act opera, or a monodrama. Its main protagonists are a woman abandoned by her lover and a telephone, and its content is a telephone conversation. The opera premiered in 1959; in a contemporary staging the traditional telephone with receiver and cord is replaced by a mobile phone, but the technical problems of the telephony of that era, such as a stranger's voice on the line or a dropped connection, which play an important role in the piece, had to be kept.
Poulenc created two orchestrations: originally for piano, and later also for full orchestra. The Teatr Wielki production used the piano version.
We do not know who the protagonist or her interlocutor is. We learn that a breakup has taken place, and the telephone is the last thread connecting the heroine with her lover, who has left her for another woman. The opera captures the rhythm of a telephone conversation, where we hear only one person, and moments of silence fall when the interlocutor, inaudible to us, speaks. We know that this conversation is extremely important to the woman; it is the end or the beginning of something very significant. The emotions, that titular human voice, convey the weight of the moment. What exactly happened is not really important, but rather how it affected the mental state of the protagonist. The plot itself falls into two distinct parts: in the first, appearances of normality are maintained, masked even by small lies; in the second, the barrier breaks and we learn of a suicide attempt. This is not mere dejection or sadness caused by the breakup, but a real drama. All the changes of mood are conveyed by the singing and recitation. The accompaniment, the piano, plays an important role: sometimes it only supports the voice, sometimes it accompanies it, sometimes it leads and the voice follows.
Usually "The Human Voice" is staged in line with the composer's original intent: we see a room, a telephone with cord and receiver, and the heroine holding the receiver to her ear. Such a telephone on the one hand gives freedom, since it allows talking to people in a completely different place, but on the other hand it constrains: the speaker is tethered by the length of the cord. Just like the protagonist, who is free because she has broken with her lover, yet at the same time bound to her past, from which she cannot tear herself away.
The production at the Teatr Wielki, directed by Maja Kleczewska, is different. In the first part the action takes place in two spaces: on the stage, and on a screen where we watch the face of a woman driving a car aimlessly. The lights of the big city in the distance only intensify the sense of loneliness. In the second part we see the scenery of a car accident. The heroine is joined by figures as if from a nightmare or a horror film, performing a pantomime. The shock of the accident has torn away the masks and the appearance of normality, triggering a projection of hidden thoughts, memories, and the drama being lived through.
Not everyone has to like this approach. It looks as if the director did not trust Poulenc and did not believe that this peculiar one-act opera for human voice, piano, and a corded telephone, premiered more than half a century ago, could appeal to a modern audience without modernization. As a consequence we get two layers of the production, not necessarily well synchronized with each other: Poulenc's opera, and an additional visualization followed by a pantomime. The vocal and musical content has to compete with the staging; the listener and viewer is forced to shift attention between one and the other, to the detriment of the music and singing.
It is a great pity, because Joanna Woś in the main role carries the whole production very well. It is, after all, a very difficult role, demanding enormous vocal and acting skill, and the accompaniment, played on the piano by Taras Hluszko, sounds equally good. We would gladly listen to Poulenc in precisely this performance, without the unnecessary additions.

Tuesday, 24 May 2016

Oracle to DB2 migration, DBMS_SQL package

Let's take a look at this simple Oracle stored procedure.
  v_refcursor SYS_REFCURSOR;
  v_cmd VARCHAR2(32767 CHAR);
  v_cur INTEGER;
  v_aux NUMBER;
  -- first part of the statement
  -- parameters enhancement
  IF (p_empname IS NOT NULL) THEN
    v_cmd := v_cmd || ' AND ENAME = :p_empname';
  END IF;  
  IF (p_deptname IS NOT NULL) THEN
    v_cmd := v_cmd || ' AND DNAME = :p_deptname';
  IF (p_mgmname IS NOT NULL) THEN
    v_cmd := v_cmd || ' AND MGR IN (SELECT EMPNO FROM EMP WHERE ENAME= :p_mgmname)';
  v_cmd := v_cmd || ' ORDER BY ENAME';
  -- preparation
  v_cur := dbms_sql.open_cursor;
  dbms_sql.parse(c => v_cur, statement => v_cmd, language_flag => dbms_sql.native);
  -- parameter substitution
  IF (p_empname IS NOT NULL) THEN
    dbms_sql.bind_variable(c => v_cur,name => ':p_empname',value => p_empname);
  END IF;  
  IF (p_deptname IS NOT NULL) THEN
    dbms_sql.bind_variable(c => v_cur,name => ':p_deptname',value => p_deptname);
  IF (p_mgmname IS NOT NULL) THEN
    dbms_sql.bind_variable(c => v_cur,name => ':p_mgmname',value => p_mgmname);
  -- final ivocation
  v_aux := dbms_sql.execute(v_cur);
  v_refcursor:= dbms_sql.to_refcursor(cursor_number => v_cur);
  RETURN v_refcursor;
The procedure is executed against the Oracle sample data set. Depending on the parameters, it yields a result set containing a list of employees working in the same department, having the same manager, or the data of the single employee with the specified name. If all parameters are null, the procedure returns the list of all employees.
What is more, we want to launch this procedure from JDBC and Hibernate client code.
Migration to DB2
While migrating to DB2, there are several problems to overcome.
  • The DBMS_SQL package is implemented in DB2. It is only necessary to replace the bind_variable method with the appropriate bind_variable_varchar, bind_variable_number, etc., because DB2 supports routine overloading only by the number of parameters, not by parameter types.
  • The to_refcursor method. This method casts a cursor id used in the DBMS_SQL package to a ref cursor. It is not supported in DB2, and there is no way to work around this. We must forget about the DBMS_SQL implementation in DB2.
  • RETURN SYS_REFCURSOR. In DB2, a SYS_REFCURSOR can be returned from a function. Nevertheless, this is not supported by the DB2 JDBC client; there is no way to pick up this cursor and use it as a standard ResultSet. The solution is to transform EMP_SEARCH into a procedure and return the SYS_REFCURSOR as an OUT parameter.
  • Hibernate. It goes from bad to worse. If a DB2 stored procedure is called through Hibernate, we cannot intercept a result set returned as an OUT parameter. The only way is to call a DB2 stored procedure having a DYNAMIC RESULT SETS clause. But DYNAMIC RESULT SETS is not available in Oracle syntax; PL/SQL stored procedures have this parameter set to 0, meaning that they do not return any result set.
Judging from the arguments above, we are at a dead end. The only solution seems to be to forget about Oracle Compatibility Mode and rewrite the procedure from scratch using native DB2 syntax. That is not a problem for a single, simple procedure like the one above. But what if the procedure is much more complicated, with more parameters and a more complicated SQL statement to generate? And how do we proceed if we have hundreds or thousands of procedures like that? If we do not want to spend the rest of our lives on it, or do not have the resources to hire a Mongolian horde of DB2 SQL developers and testers, we have to find another solution.
Solution highlights
  • Although we cannot avoid modifying the stored procedure body, keep the main logic generating the SQL statement and passing parameters to the statement unmodified. This way the risk of introducing regression errors is minimized.
  • Replace the DBMS_SQL package with a similar package having the same methods and implementing the same logic.
  • The stored procedure launched by the JDBC or Hibernate client should be a native DB2 routine having a DYNAMIC RESULT SETS clause.
Solution, DBMSJ_SQL package
The solution is available here. The JDBC and Hibernate code for Oracle and DB2, together with JUnit tests, is available here.
To avoid a name conflict, the package is named DBMSJ_SQL. It comprises Java code and DB2 SQL code; the Java and DB2 SQL wrapper covers part of the DBMS_SQL package.
  • OPEN_CURSOR Opens a new cursor and returns a cursor id to be reused in later calls.
  • PARSE Unlike the original DBMS_SQL routine, it simply keeps the SQL statement.
  • BIND_VARIABLE Binds a string value to a position marker in the SQL statement.
  • PREPARE_S_JAVA Used internally. Returns the modified SQL statement, ready to use by the DB2 PREPARE command, together with the list of bound values in the order matching the positions of the variable markers in the SQL statement.
  • EXECUTE Prepares the statement and opens the cursor to be returned to the client application. This method should be invoked by a native DB2 stored procedure.
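The marker-rewriting part of this can be sketched as follows; the class and method names, and the exact data structures, are assumptions for illustration, not the package's real internals.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of what PREPARE_S_JAVA is described to do: replace
// named :markers with '?' placeholders for DB2 PREPARE and collect the
// bound values in marker order.
class BindRewriteSketch {
    private static final Pattern MARKER = Pattern.compile(":(\\w+)");

    static String rewrite(String sql, Map<String, String> bindings, List<String> valuesOut) {
        Matcher m = MARKER.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            valuesOut.add(bindings.get(m.group(1))); // value for this marker position
            m.appendReplacement(out, "?");
        }
        m.appendTail(out);
        return out.toString();
    }
}
```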
Stored procedure after migration to DB2
The stored procedure is split into two parts. The first one is the original Oracle SP, almost identical to the Oracle version. The only differences are replacing DBMS_SQL with DBMSJ_SQL and removing the closing DBMS_SQL.EXECUTE and DBMS_SQL.TO_REFCURSOR calls. The main logic is the same.
  v_cmd VARCHAR2(32672);
  v_cur INTEGER;
  v_aux NUMBER;
  -- first part of the statement
  -- parameters enhancement
  IF (p_empname IS NOT NULL) THEN
    v_cmd := v_cmd || ' AND ENAME = :p_empname';
  END IF;  
  IF (p_deptname IS NOT NULL) THEN
    v_cmd := v_cmd || ' AND DNAME = :p_deptname';
  IF (p_mgmname IS NOT NULL) THEN
    v_cmd := v_cmd || ' AND MGR IN (SELECT EMPNO FROM EMP WHERE ENAME= :p_mgmname)';
  v_cmd := v_cmd || ' ORDER BY ENAME';
  -- preparation
  v_cur := dbmsj_sql.open_cursor;
  dbmsj_sql.parse(c => v_cur, statement => v_cmd);
  -- parameter substitution
  IF (p_empname IS NOT NULL) THEN
    dbmsj_sql.bind_variable(c => v_cur,name => ':p_empname',value => p_empname);
  END IF;  
  IF (p_deptname IS NOT NULL) THEN
    dbmsj_sql.bind_variable(c => v_cur,name => ':p_deptname',value => p_deptname);
  IF (p_mgmname IS NOT NULL) THEN
    dbmsj_sql.bind_variable(c => v_cur,name => ':p_mgmname',value => p_mgmname);
  RETURN v_cur;
The second is a native DB2 SQL procedure having a DYNAMIC RESULT SETS clause and the SQL code to prepare and open the SQL cursor. This procedure is called directly by the JDBC or Hibernate client. Its signature is identical to the Oracle procedure's, except for returning SYS_REFCURSOR.

  p_empname VARCHAR2(32672) DEFAULT NULL,
  p_deptname VARCHAR2(32672) DEFAULT NULL,
  p_mgmname VARCHAR2(32672) DEFAULT NULL)
  SET V_CUR = EMP_SEARCH_INT(p_empname,p_deptname,p_mgmname);
  CALL dbmsj_sql.execute(v_cur);
Java JDBC client
The JDBC client code should be modified with respect to picking up the result set.
CallableStatement stmt = connection.prepareCall("{ ? = call EMP_SEARCH ( ?, ?, ?) }");
  int i = 0;
  stmt.registerOutParameter(++i, OracleTypes.CURSOR);
  stmt.setString(++i, empName);
  stmt.setString(++i, depName);
  stmt.setString(++i, mgmName);
  stmt.execute();
  ResultSet rset = (ResultSet) stmt.getObject(1);
CallableStatement stmt = connection.prepareCall("{ call db2inst1.EMP_SEARCH ( ?, ?, ?) } ");
  int i = 0;
  stmt.setString(++i, empName);
  stmt.setString(++i, depName);
  stmt.setString(++i, mgmName);
  ResultSet rset = stmt.executeQuery();

Java Hibernate client 
The only difference is Hibernate mapping file.
        { ? = call EMP_SEARCH( :p_empname, :p_deptname, :p_mgmname )}
        { call EMP_SEARCH( :p_empname, :p_deptname, :p_mgmname )}
JUnit test
A project containing the JUnit tests is available here. The same test suite covers all four versions: Oracle JDBC, Oracle Hibernate, DB2 JDBC and DB2 Hibernate.
Although some modification is necessary, the majority of the Oracle code can be migrated almost as-is. What is more, the main logic of the application, creating the SQL SELECT statement and binding the arguments, is preserved, thereby lowering the risk of introducing regression errors. The wrapping DB2 stored procedure is very simple and can be created almost automatically.
The stored procedure migrated here is very simple, but the same pattern can be applied to much more complicated examples.
Looking back, we can identify several migration milestones:
  1. Enthusiastic. The DBMS_SQL package, the core of the SP migrated here, is supported in DB2 Oracle Compatibility Mode; the migration seems trivial.
  2. Dead end. DBMS_SQL.TO_REFCURSOR is not implemented and there is no way around it. The grim reality seems to lurk: rewrite everything from scratch in native DB2 SQL with gritted teeth.
  3. Realistic. By means of the DBMSJ_SQL package, modifications are necessary but feasible. Also, the client code requires only small adjustments.

Wednesday, 27 April 2016

Oracle, DB2, Hive, data migration

I created a simple program to migrate data from Oracle, MS/SQL and any JDBC-reachable relational database to DB2 and Hive. The source code is available here.
For DB2, the program extracts data from the source database and prepares an SQL script to load the data into the DB2 database. The script should be executed on the DB2 host machine, and all extracted data should be transferred there.
For Hive, the program creates a script with Hive table declarations based on the source database, plus a Hive script to push the extracted data into Hive.
Solution description
The solution is a combination of a Java program and a bash script. It can be executed on a Linux system.
The Eclipse project for the Java part can be downloaded here. The jar file containing the Java code extracting data from a source database is launched by the wrapping bash script; it is not accessible directly.
The solution layout:
  • export Main script
  • export.rc Environment variables
  • {db}.properties Property file referenced by CONNPROP variable in export.rc. Contains connection parameters.
  • jars Directory containing jar exported from Eclipse project with Java code
  • jdbc Directory containing all dependencies, particularly JDBC drivers appropriate for the database used.
  • log Log directory
Configuration, export.rc file
  • EXPORT_DIR Directory where extracted data are downloaded. 
  • LOGIR Log directory
  • LOADSQL File name for load data script created by the solution
  • CONNPROP Property file with JDBC connection details

# EXPORT_DIR : directory where all export file are stored
# log directory
# DB2 load script created
# connection properties
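For orientation, a filled-in export.rc might look like the sketch below; the paths and the oracle.properties name are illustrative, only the variable names and comments come from the solution:

```shell
# EXPORT_DIR : directory where all export files are stored
EXPORT_DIR=/tmp/data
# log directory
LOGIR=/tmp/data/log
# DB2 load script created
LOADSQL=db2load.sql
# connection properties
CONNPROP=oracle.properties
```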

Property file 
  • sourceurl
  • user
  • password
  • drivername
  • sourcedb  db2, mssql or oracle. It is important to specify the oracle value for an Oracle database. For some reason, the Oracle JDBC driver does not properly report Oracle temporary tables. In order to exclude temporary tables from the data extraction process, a specific Oracle catalog view is referenced.
  • destdb  db2 or hive. The default is db2. It is important to specify hive if the extracted data is going to be consumed by Hive later on.
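As a hedged sketch of full connection files, the keys below come from the list above while the URLs, users and passwords are placeholders (the driver class names are the standard vendor ones):

```
# Oracle (placeholder values)
sourceurl=jdbc:oracle:thin:@//dbhost:1521/ORCL
user=scott
password=secret
drivername=oracle.jdbc.OracleDriver
sourcedb=oracle

# DB2 (placeholder values)
sourceurl=jdbc:db2://dbhost:50000/SAMPLE
user=db2inst1
password=secret
drivername=com.ibm.db2.jcc.DB2Driver
sourcedb=db2
```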
Example of property file for MS/SQL

destdb=hive # Data extraction for Hive

Example of property file for Oracle database
Example of property file for DB2

sourcedb=db2

Main script description
./export {action} {parameters related to the action}

listoftables
  • Parameters: list of schemas
  • Creates a list of tables extracted from the source database and schemas. The list is stored in the table.list file. The list is later reused by the extractfromlist action. The list can be reviewed and modified manually.
  • Example: ./export listoftables DATA ACCOUNT
extracttable
  • Parameters: list of tables.
  • Extracts data from the tables specified. More than one table can be provided.
  • Example: ./export extracttable data.invoices data.lines account.customers
extractfromlist
  • No parameters
  • Extracts data from the tables found in the table.list file. The table.list file can be created by the listoftables action.
  • Example: ./export extractfromlist
extractschemas
  • No parameters
  • Extracts the list of schemas from a source database. The list is stored in the schema.list file. This action is not connected with any other action and can be used for data discovery.
  • Example: ./export extractschemas
extracthivetables
  • No parameters
  • Creates the create.list script file containing Hive table CREATE commands. The table.list file contains a list of tables in the source database. Look below for more details.
  • Example: ./export extracthivetables
extracthivetable
  • Parameters: list of tables
  • Extracts data from the tables specified as parameters. The data is going to be consumed by Hive later on. Important: in order to have data extracted properly, the destdb=hive parameter should be specified additionally in the property file.
  • Example: ./export extracthivetable Sales.CountryRegionCurrency Sales.CreditCard Sales.Currency
extracthivefromlist
  • No parameters
  • Extracts data from all tables found in the table.list file. Data is extracted in the format expected by Hive.
  • Example: ./export extracthivefromlist
extractnumberofrecords
  • Parameters: output file name
  • Extracts the number of records for the tables found in the table.list file. A CSV file is created, every line containing information related to a single table. The command can be used as a rough data migration verifier. Look below.
  • Example: ./export extractnumberofrecords number.db2
DB2 data loader
The solution does not have any support for database schema migration. To migrate the database schema, use the free IBM Database Conversion Workbench tool.
Exported data is downloaded to the EXPORT_DIR directory (for instance /tmp/data). Together with data extraction, a LOADSQL script is created with a sequence of DB2 LOAD commands to load the data into a DB2 database. The DB2 command line processor, db2, is used for data consumption. An example of a prepared LOAD command:

LOAD FROM /tmp/data/sales_personal_data.txt
MESSAGES /tmp/data/msg/sales_personal_data.txt

Also, all LOB columns are migrated. If the input table contains any LOB column, an additional /tmp/data/{table name} directory is created; files containing data extracted from the LOB columns are stored in this directory, and the LOAD command contains a clause to load LOB data from that directory.
Before starting the loading process, all extracted data should be moved to the DB2 host machine, otherwise the LOAD command fails.
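Putting the clauses together, a complete generated LOAD command might look like the sketch below; the target table name and the LOB clause are illustrative, the real commands are produced by the tool:

```
LOAD FROM /tmp/data/sales_personal_data.txt OF DEL
  LOBS FROM /tmp/data/sales_personal_data/ MODIFIED BY COLDEL',' LOBSINFILE
  MESSAGES /tmp/data/msg/sales_personal_data.txt
  INSERT INTO sales.personal_data;
```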
A typical scenario for data migration from an Oracle or MS/SQL database to DB2:
  1. Prepare a property file containing connection data for the Oracle database. The property file should contain the sourcedb=oracle parameter.
  2. Extract all schemas from the Oracle database using ./export extractschemas. Identify the schema or schemas containing the application data to be migrated.
  3. Extract the list of tables using the ./export listoftables command. Verify and modify the list if necessary.
  4. Extract data from the source database using the ./export extractfromlist command.
  5. Move the extracted data and the db2load.sql script to the DB2 host machine. Important: extracted data should be moved to the same directory.
  6. Connect to the DB2 instance and execute the command: db2 -tf db2load.sql
  7. Verify the migration using the ./export extractnumberofrecords command. Look below for details.
  8. If necessary, repeat the process for all tables or for a single table using the ./export extracttable command.
HIVE data loader
For Hive, not only data migration is implemented but also schema migration. The mapping between JDBC data types and Hive data types can be checked in the Java source code. Columns not compatible with any Hive data type, for instance LOB columns, are ignored. Because data is extracted in CSV format, it is very important that the Hive table schema is synchronized with the extracted data.
A typical scenario for data migration from MS/SQL to Hive:
  1. Prepare a property file with connection data for the MS/SQL database. The property file should contain the destdb=hive parameter.
  2. Extract all schemas from the MS/SQL database using ./export extractschemas. Identify the schema or schemas containing the application data to be migrated.
  3. Extract the list of tables using the ./export listoftables command. Verify and modify the list if necessary.
  4. Identify all schemas and create the necessary databases/schemas manually in Hive.
  5. Prepare the script creating Hive tables by executing ./export extracthivetables.
  6. Move the script to the Hive host machine and run the beeline -f create.list command.
  7. Extract data from the source database using the ./export extracthivefromlist command.
  8. Move the extracted data and the db2load.sql script to the Hive host machine. Important: extracted data should be moved to the same directory.
  9. Load the data into Hive using the beeline -f db2load.sql command.
  10. If necessary, repeat the process for all tables or for a single table using the ./export extracthivetable command.
Data migration verification
The ./export extractnumberofrecords action can be used as a rough verification that the data migration did not fail. Assume that we have access to the DB2 instance.
  1. Make sure that table.list contains a list of all tables to be migrated.
  2. Execute ./export extractnumberofrecords for the Oracle source database.
  3. Prepare a property file for the DB2 connection. Modify the export.rc configuration file.
  4. Execute ./export extractnumberofrecords number.db2 for the DB2 source database. The command should be executed over the same table.list file.
  5. Execute db2 commands:
  6. db2 "create table oracle_table(tname varchar(2500),nor int)"
  7. db2 load client from $PWD/ of del MODIFIED BY COLDEL',' insert into oracle_table
  8. db2 "create table db2_table like oracle_table"
  9. db2 load client from $PWD/number.db2 of del MODIFIED BY COLDEL',' insert into db2_table
  10.  db2 "select cast (d.tname as char(30)),d.nor as db2,o.nor as oracle from db2_table d,oracle_table o where d.tname = o.tname and d.nor != o.nor"
The last command selects all tables where the number of records does not match and which therefore require additional attention.

Thursday, 21 April 2016

We were at the opera

On 22 March we attended a performance of Gioachino Rossini's Stabat Mater in St. Anne's Church, and we liked it very much. Stabat Mater is one of the works Rossini composed after he stopped writing operas in 1829. He began working on it in 1831, but, after various perturbations, the final version was completed only in 1841. Stabat Mater is a devout Catholic hymn to the Mother of God with a long tradition, an inspiration for many composers over the centuries. In Rossini's Stabat Mater one can clearly hear that the author is an opera composer: it is a dynamic work, full of beautiful melodies, emotion, joy and sadness, holding the attention from beginning to end. The performance in St. Anne's Church by the soloists, choir and orchestra of the Warsaw Chamber Opera brought out the full beauty of this composition.

On 23 March we also attended a performance of Richard Strauss's Salome at the Teatr Wielki. Salome is a work that resists simple judgments, and Mariusz Treliński's unconventional staging deepens this complexity further. The ancient context has disappeared completely; the stage is a contemporary, nouveau-riche house inhabited by a family entangled in toxic relationships, drowning its sorrows and stress in alcohol. The famous dance of the seven veils is not an eroticism-saturated piece of oriental exotica, but a retrospective pantomime in which seven mini-acts unveil a grim family tragedy: the molestation of the child Salome by her stepfather, Herod. As in Hamlet, where the play within the play reveals to King Claudius his own crime. In this pantomime Salome wears a motionless mask; the trauma has carved an indelible wound in her soul, her emotional life has frozen as a result of the harm she suffered, which is meant to explain her obsessive, ultimately fatal striving for closeness with the prophet Jochanaan. The prophet himself does not appear on stage; he is only a voice, and we do not even know for certain whether he is a real person or merely an ominous voice of conscience haunting the main actors of the spectacle. For Salome he materializes as the muscular silhouette of a young man, and later as a severed head; for Herodias he is a recurring nightmare that will not let her forget the sin of her incestuous marriage to Herod; while for the wavering, restless Herod he is the uncertainty whether Jochanaan is a true prophet and a link to the outer world of good, from which Herod drifted away long ago, or merely a slanderer who should be beheaded. The debate of the learned Jews is a projection of Herod's anxiety as he tries to resolve this dilemma by means of a theological dispute of sages. But the debate brings no resolution; it only introduces the announcement of yet another prophet, the true Messiah, who works miracles and raises the dead.
Since neither alcohol nor the learned sages bring peace, Salome's dance is to be Herod's solace, which leads to the tragic end.
The scenes carry us from one grim episode to another; everything is suffused with an unreal, moonlit glow, and the set design dynamically keeps pace with the obsessions of the protagonists. A true court of Herod, where slipping in blood and finding a corpse does not even cause particular surprise, only momentary discomfort. Herod himself is dressed only in a bathrobe and underwear, while Herodias wears a sports tracksuit. In the final scene everyone is covered with bloody stains as in a Tarantino film.
Only the thread of Narraboth's tragic love for Salome gets lost in the production; it passes without an echo.
The staging is matched by very good performances. The central figure who carries the whole show from beginning to end is Salome, and Alex Penda was excellent in this role. Jacek Laszczykowski as the alcohol-dazed, unstable Herod also came off very well. The same can be said of the other performers and the orchestra.
A production certainly very successful in every respect, disquieting and staying in the memory.

Friday, 25 March 2016

Oracle -> DB2 migration, useful awk scripts

Database migration is a painstaking process, but by means of the Oracle Compatibility Feature, migration from Oracle to DB2 can be done quite easily. There is an excellent redbook explaining the process in more detail. Although the complexity is reduced, I created some useful awk scripts to support it.
While working on the migration of SQL code, it is very convenient to split all objects into separate files and, in the case of any trouble, to focus on a single file containing the object under scrutiny.
So I created a set of useful and simple awk scripts which make this easy. The source code is available here. It can be modified and extended according to needs.
Splitting Oracle source file
The following objects can be extracted and split into separate files: tables including indexes and constraints, foreign keys, package headers, package bodies, triggers, views, stored procedures outside packages, UDFs, sequences and types.
The main script is obj.awk.
The launching sequence:
awk -v extracttype={value} -v objtype={value} -f obj.awk {input file(s)}
Two parameters are recognized: extracttype and objtype
objtype can have the following values
  • TABLE, VIEW, TRIGGER, PROCEDURE, FUNCTION, FOREIGN, TYPES, PACKAGE, PACKAGEBODY, SEQUENCE
extracttype can have the following values
  • TABLES, ALLOTHERS, LIST
 The following actions are triggered:
  • extracttype=TABLES, no objtype: all table objects are extracted
  • extracttype=ALLOTHERS, no objtype: all objects but tables are extracted
  • extracttype=LIST, objtype can have a value or be omitted: only object names are listed
  • no extracttype, objtype=value: only the objects selected by objtype are extracted
  awk -v extracttype=TABLES -f obj.awk {input file}
All tables are extracted.

awk -v extracttype=LIST -v objtype=SEQUENCE -f obj.awk {input file}
All sequence names are printed.

awk -v objtype=PACKAGE -f obj.awk {input file}
All package headers are extracted.

Extracted objects are saved in the following directories
  • TABLE, tables
  • VIEW, views
  • TRIGGER, triggers
  • PROCEDURE, procedures
  • FUNCTION, functions
  • FOREIGN, foreign
  • TYPES, types
  • PACKAGE, packages
  • PACKAGEBODY, packagesbody
  • SEQUENCE, sequences
Extracted code modification
Although very little is supposed to be done to have Oracle SQL code running on DB2, sometimes small adjustments can make life easier.
The following awk scripts are executed on the fly against every extracted object.
  • TABLE, modiftable.awk
  • VIEW, modifview.awk
  • TRIGGER, modiftrigger.awk
  • PROCEDURE, (none)
  • FUNCTION, (none)
  • FOREIGN, (none)
  • TYPES, modiftypes.awk
  • PACKAGE, modifpackage.awk
  • PACKAGEBODY, (none)
  • SEQUENCE, modifsequence.awk
An example of a modification script for sequences:
@include "fun.awk"

  line = gensub(/MAXVALUE\ +[0-9]+\ /,"NO MAXVALUE ","g",line)
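gensub is a gawk extension; as a standalone illustration of the same substitution, portable awk's sub() can be tried from the shell (the sample CREATE SEQUENCE statement is made up):

```shell
echo "CREATE SEQUENCE S1 START WITH 1 MAXVALUE 999999 CACHE 20;" |
  awk '{ sub(/MAXVALUE +[0-9]+ /, "NO MAXVALUE "); print }'
# prints: CREATE SEQUENCE S1 START WITH 1 NO MAXVALUE CACHE 20;
```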
The modification scripts can be extended or modified as needed.

Monday, 29 February 2016

BoaTester. Netezza PDA, testing


 IBM PureData System for Analytics, Netezza, is IBM's warehouse solution for huge volumes of data. Netezza comes with a lot of tools; the most commonly used is nzsql, a command line tool. The nzsql command can be used as a typical REPL tool for running ad-hoc queries and performing managerial or administrative tasks, and also as part of a broader solution, an engine running scripts and queries in batch mode. As usual with development, a question arises how to test it and, as a next step, how to create a suite of regression tests to be sure that during maintenance, updates and bug fixing the software still performs correctly.

Assume that we developed a script doing very important stuff inside a big PDA appliance. It could be backing up, synchronizing users and groups with an external user maintenance system like Active Directory or LDAP, vacuuming, refreshing, moving data between databases, or loading and transforming external data. The script covers several use cases and should run over different combinations of data. It should be tested after every fix or new feature. It is a good idea to have this task automated to avoid tedious and mundane manual testing. Every test should start with a predefined precondition and finish with an expected result. If the current result matches the expected result, the test passes; otherwise the test fails. All setup and verification tasks can be done as a single SQL statement or an nzsql script in the case of a more complex setting.
So what I need is a test harness allowing me to execute the following tasks in a simple way:
  • nzsql script
  • nzsql and single SQL statement
  • bash script
  • verification feature to determine whether the test passed or failed
I found that an extension of the BoaTester framework provides a solution to the aforementioned problem. The whole add-on logic is contained in a single Python module. The source code is available here. Once installed and configured, it makes defining and adding new test scenarios very easy.
A technical description and usage example are available here.

Sunday, 31 January 2016

We were at a concert and the opera

On 8 January 2016 we attended a concert at the Royal Castle performed by the orchestra of the Warsaw Chamber Opera, but we left with mixed feelings. The repertoire consisted of baroque music: in the first part works by Vivaldi, and in the second part also music by Georg Philipp Telemann. What caused the feeling of dissatisfaction among listeners of beautiful music in the magnificent interior of the Great Hall of the Warsaw Castle? In the first part the performers visibly struggled with their instruments and the score; the execution left much to be desired. The second part was much better listening, once the artists had warmed up. Telemann's concerto for two violas sounded especially splendid, a timbre heard rather rarely. The bravura Vivaldi concerto in B minor for four violins was also enormously enjoyable. But the uneven level of performance left us with mixed impressions.

On 18 January 2016 we attended a performance of Mozart's little-known opera "La clemenza di Tito" at the Teatr Wielki, and we enjoyed it enormously. The production itself, in a different performance, can be watched on The Opera Platform.
"La clemenza di Tito" has always posed a problem for Mozart lovers; for many it would be best if Mozart had never sat down to this opera at all. Some claim that the composer's craft rose above the static, pathos-dripping text; others believe that Mozart's genius lost the battle with this material, "nec Hercules contra plures". The best way, however, is to form one's own opinion by buying tickets.
Despite the evident shallows of the libretto, whoever bought tickets certainly did not regret it. Musically and vocally it was a delight to listen to. Anna Bonitatibus sounded a little weak in the role of Sesto, although she made up for much of it with her acting expression. The majestic and dignified Charles Workman was excellent in the title role. Anna Bernacka, singing Annio, was also greatly enjoyable.
The staging concept, on the other hand, was hard to decipher, although aesthetically and in its colours it was very pleasant to watch. On stage we looked at the contemporary interior of a wealthy house; the singers read messages on tablets, talk on mobile phones and send text messages. When the conspiracy against Tito is uncovered, detectives appear on stage marking traces with police markers.
However, this contemporary reading of the text added no meaning here; it gave the impression of displaying empty gadgets. The passions tearing at the protagonists, love, desire, betrayal, friendship, nobility, the lust for power and political calculation speaking through Vitellia, all of it sounded pale; the text had no support in the staging.
It would be far better to return to the sources: let Tito remain a Roman emperor, Publio the commander of the Praetorian Guard, the Senate a senate and not the audience of a television show, and let the conspiracy against the emperor's life feature the Capitol, conspirators, daggers and the rebellious Roman mob. In this way the text would have support in the staging and the production would gain colour. The main dramatic axis of the opera, the conflict between the law demanding that a traitor be condemned to death and a mercy recoiling from violence, would resonate much more strongly if these dilemmas tormented an emperor whose signature on a document decides the fate of millions of subjects, rather than a guest of a five-star hotel or the tenant of a luxury apartment.
But the production should certainly be an obligatory item for every music lover in Warsaw, so it was a little surprising that the house was half empty.

Sunday, 3 January 2016

IBM InfoSphere Streams, PowerBI operator

Previously I created a simple Java helper package to get access to the PowerBI REST API. Now the time has come to develop a Streams operator shipping data out to PowerBI. The full source code is available here as a StreamsStudio project.
Operator description, parameters
The operator (Java source code is available here) accepts seven parameters, five mandatory and two optional.
  • oauth_username
  • oauth_password
  • oauth_clientid: These three parameters are required to obtain an access token to PowerBI. More details about the PowerBI REST API and authentication are available here.
  • datasetName
  • tablename: Defines the table in PowerBI namespace.
  • flushsize: optional (default 1). Specifies the buffer size threshold before pushing data to PowerBI. Increasing the value improves performance, but introduces a delay between the time data is received and the time it is stored in PowerBI. With the default of 1, every tuple is sent immediately to PowerBI. The value 0 has a special meaning: data is pushed to PowerBI when a punctuation marker is received.
  • cleanfirstly: optional (default false). A logical parameter; if true, the PowerBI table is truncated at the beginning.
Operator description, data loading
Pushing data is very simple: just create an input stream reflecting the table structure in PowerBI and let the tuples flow. Only primitive data types are allowed (source).

private Map<String, String> createTableSchema(OperatorContext context) throws PowerException {
  StreamSchema sche = context.getStreamingInputs().get(0).getStreamSchema();
  Map<String, String> bischema = new HashMap<String, String>();
  for (String name : sche.getAttributeNames()) {
   Type.MetaType ty = sche.getAttribute(name).getType().getMetaType();
   String biType = null;
   switch (ty) {
   case RSTRING:
   case USTRING:
   case ENUM:
    biType = PowerBI.STRING_TYPE;
    break;
   case INT8:
   case INT16:
   case INT32:
   case INT64:
   case UINT8:
   case UINT16:
   case UINT32:
   case UINT64:
    biType = PowerBI.INT64_TYPE;
    break;
   case DECIMAL32:
   case DECIMAL64:
   case DECIMAL128:
   case FLOAT32:
   case FLOAT64:
    biType = PowerBI.DOUBLE_TYPE;
    break;
   case BOOLEAN:
    biType = PowerBI.BOOL_TYPE;
    break;
   case TIMESTAMP:
    biType = PowerBI.DATETIME_TYPE;
    break;
   default:
    break;
   }
   if (biType == null)
    failure("Attribute " + name + " type " + ty.getLanguageType() + " not supported");
   log.log(TraceLevel.DEBUG, "Attribute " + name + " type " + ty.getLanguageType() + " mapped to " + biType);
   bischema.put(name, biType);
  }
  return bischema;
 }

Usage example

namespace application.test ;

use ;

composite Main
{
 graph
  (stream<rstring name, uint32 num, float64 flo, boolean log, timestamp ti>
    Beacon_1_out0 as O) as Beacon_1 = Beacon()
  {
   logic
    state : mutable int32 i = 0 ;
   param
    iterations : 100 ;
    period : 0.1f ;
   output
    O : name = "Name " +(rstring)(i ++), num =(uint32)(i ++), flo = 1.23,
     log = i / 2 == 0 ? true : false, ti = createTimestamp(1000l, 100u) ;
  }

  () as PowerBI_2 = PowerBI(Beacon_1_out0)
  {
   param
    oauth_clientid : "22dfcddc-d5b4-4e8a-8358-1658fabdad0b" ;
    oauth_password : "Szczuja123" ;
    oauth_username : "" ;
    datasetName : "InfoS6" ;
    tablename : "table6" ;
    flushsize : 10 ;
  }

  () as Custom_3 = Custom(Beacon_1_out0 as inputStream)
  {
   logic
    onTuple inputStream :
    {
     println(inputStream) ;
    }
  }
}


More examples.
Loading modes
Three loading modes are supported, selected by the flushsize parameter described above.

  • flushsize = 1 (default). The incoming tuple is immediately pushed to PowerBI. Data is immediately available but there is a performance penalty.
  • flushsize > 1. Sending tuples to PowerBI is held off until the threshold is exceeded, and then the whole buffer is shipped out in a single operation. It is more efficient, but there is a gap between the time data arrives and the time it is available in PowerBI.
  • flushsize = 0. Tuples are buffered and sent to PowerBI when punctuation marker is received.
In addition to the PowerBI Java package, further dependencies are imposed by this library. They are specified in the pom.xml file. The dependency jars are stored in the imp/lib directory in the operator structure. The directory is defined by the Lib annotation in the operator body.

Additional remarks
  • Only one table can be defined per dataset. I cannot tell whether it is a limitation of PowerBI or a defect in the PowerBI REST API.
  • If a table does not exist, it is created during the initialization of the operator. The table schema mirrors the schema of the operator input stream and cannot be changed later. If the table has already been created, changing the operator input stream schema later will cause a crash.