Netezza and InfoSphere Streams are the part of IBM Big Data offering. Although both are commercial products, good news is that for educational and non-production purpose also free versions are available.
Netezza Software Emulator can be downloaded here.
Non-production InfoSphere Streams package is available here. Although fresh installation of IBM InfoSphere Streams is not complicated it is a good idea to start with ready to use VMware image.
Netezza Software Emulator
Netezza in production requires dedicated hardware but for developing and discovering one can play with VMware machines (Host and SPU) behaving exactly like the real Netezza database. The Netezza Emulator is available only for Windows but it is possible to execute it also under Linux. But keep in mind that IBM supports only Windows version so in case of any problems you are left on your own.
Netezza Software Emulator for Linux
Download and execute INSEfDsetup.exe (it requires at least 13GB free disk space). After downloading transfer unpacked VMware machines (c:\Program Files\IBM\Netezza Software Emulator for Developer\VMS) to the Linux machine. Before starting them create additional vmnet3 network.
This network connects Host and SPU machine. Important: switch off DHCP and set IP address as 10.0.0.1.
Starting Netezza Emulator
Firstly start Host machine. After several minutes log in as nz user (standard password: nz) and execute nzstate. If output is like:
[nz@netezza ~]$ nzstate System state is 'Discovering'. [nz@netezza ~]$ nzhw Description HW ID Location Role State ----------- ----- ---------- ------ ------- SPA 1001 spa1 None Ok Disk 1002 spa1.disk1 Active None Disk 1003 spa1.disk2 Active None Disk 1004 spa1.disk3 Active None Disk 1005 spa1.disk4 Active None Disk 1006 spa1.disk5 Spare None SPU 1007 spa1.spu1 Active Booting [nz@netezza ~]$In case of any problem simply launch nzstart command. Next step is to start SPU machine. The SPU machine should connect to Host and start networked booting. Do not bother about a good number of technical messages flying through the screen, it is as expected and does not mean that something wrong is happening. After some time execute again nzstate command in the Host machine. If the screen is like:
[nz@netezza ~]$ nzhw Description HW ID Location Role State ----------- ----- ---------- ------ ------ SPA 1001 spa1 None Ok Disk 1002 spa1.disk1 Active OkSQL> create table testt (numb integer,name varchar(100)) SQLRowCount returns -1 SQL> select * from testt; +------------+-----------------------------------------------------------------------------------------------------+ | NUMB | NAME | +------------+-----------------------------------------------------------------------------------------------------+ +------------+-----------------------------------------------------------------------------------------------------+ SQLRowCount returns 0 Disk 1003 spa1.disk2 Active Ok Disk 1004 spa1.disk3 Active Ok Disk 1005 spa1.disk4 Active Ok Disk 1006 spa1.disk5 Spare Ok SPU 1007 spa1.spu1 Active Online [nz@netezza ~]$ nzstate System state is 'Online'. [nz@netezza ~]$it means that Netezza is started and ready to handle the requests. Then prepare a test database, a test database and a test user.
nzsql create database test; create user testuser with password 'secret'; grant all admin to testuser; grant all on test to testuser;Check if testuser can log in into test database and execute some basic queries.
nzsql test -U testuser -W secret create table x (x int); insert into x values(1); select * from x; drop table x;Obtain Netezza connection software
This software can be downloaded from this place.Important: this software is available only for IBM customers and business partners.
Install and configure ODBC connection to Netezza on RedHat 6
Installing Netezza client is very simple and straightforward. Just unpack the Netezza client package (something like nz-linuxclient-v7.0.3-P2.tar.gz) and execute two unpack scripts, one in linux directory and the second in linux64 directory. The default installation location is /usr/local/nz.
Firstly test the connection by executing nzsqlodbc utility from /usr/local/nz/bin64 directory (netez is the host name of the HOST VMmachine in my environment).
[root@oc8442647460 bin64]# ./nzodbcsql -h netez test testuser secret NZODBCSQL - program to test Netezza ODBC driver NOT FOR PRODUCTION USE Type 'quit' or '\q' or CTRL-D or ENTER at the prompt to quit NOTE: Max 100 rows are displayed (for selects) Driver version : 'Release 7.0.3 (P-2) [Build 32506]' NPS version : '07.00.0003 Release 7.0.3 (P-2) [Build 32506]' Database : 'TEST' nzodbc >Secondly configure unixODBC connection. That's very simple, just follow instructions in /usr/local/nz/lib64/ODBC_README note. In my environment I replaced host, database name, user name and password with netez,test,testuser and secret. Then test the connection using isql utility.
[sb@oc8442647460 ~]$ rlwrap isql NZSQL +---------------------------------------+ | Connected! | | | | sql-statement | | help [tablename] | | quit | | | +---------------------------------------+ SQL> create table x (x int); SQLRowCount returns -1 SQL> insert into x values(1); SQLRowCount returns 1 SQL> select * from x; +------------+ | X | +------------+ | 1 | +------------+ SQLRowCount returns 1 1 rows fetched SQL>Important: I failed trying to configure unixODBC Netezza connection in Ubuntu (although nzodbcsql utility is working). Probably there are some incompatibilities with unixODBC version installed as default into Ubuntu.
Configure InfoShophere Streams environment to work with Netezza
It is described in detail here.
It looks complicated but it is enough to add three entries to .bashrc file.
export STREAMS_ADAPTERS_ODBC_NETEZZA=NETEZZA export STREAMS_ADAPTERS_ODBC_INCPATH=/usr/include/ export STREAMS_ADAPTERS_ODBC_LIBPATH=/usr/lib64/Important: InfoSphere Streams can work with one database only. So if a connection with Netezza is specified the access to other database (like DB2) is not possible.
Create a simple InfoSphere Streams application to load data to Netezza
Source code (StremsStudio project) is available here. There are two standard methods to load data to Netezza.
- Using ODBCAppend operator.
- Using specialized NetezzaLoad operator.
SQL> create table testt (numb integer,name varchar(100)) SQLRowCount returns -1 SQL> select * from testt; +------------+-----------------------------------------------------------------------------------------------------+ | NUMB | NAME | +------------+-----------------------------------------------------------------------------------------------------+ +------------+-----------------------------------------------------------------------------------------------------+ SQLRowCount returns 0Both methods require preparing connection.xml file. The connection.xml file used in this example is available here.
Hint. While working with connection.xml I found pretty difficult to identify error in this file because the StreamsStudio is not very talkative on this. But you can execute xmlint (XML syntax checker) directly from command line to get more explanatory output. Just copy and past command line invocation from StremsStudio console and remove redirection of error output to null device.
[sb@oc8442647460 NetezzaLoader]$ pwd /home/sb/workspace/testp/NetezzaLoader [sb@oc8442647460 NetezzaLoader]$ xmllint --noout --schema /opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.db/com.ibm.streams.db/Common/connection.xsd ./etc/connections.xml ./etc/connections.xml:29: parser error : Opening and ending tag mismatch: table line 23 and access_specification </access_specification> ^ ./etc/connections.xml:32: parser error : expected '>' </access_specifications> ^ ./etc/connections.xml:34: parser error : Opening and ending tag mismatch: access_specifications line 10 and st:connections </st:connections> ^ ./etc/connections.xml:34: parser error : Premature end of data in tag connections line 1 </st:connections> ^ [sb@oc8442647460 NetezzaLoader]$The application code is available here. It is very simple. Beacon operator produces a series of randomly generated tuples. By modifying the iteration parameter one can increase the number of tuples generated. Removing this parameter for good produce the infinite sequence. Then the output is directed to ODBCAppend operator and and the same output is directed NetezzaPrepareLoad and NetezzaLoad operator putting it to the Netezza database.
Next step is to launch application (as distributed), wait for the moment and verify the content of testt table.
--+ SQLRowCount returns 20 20 rows fetched SQL> select * from testt; +------------+-----------------------------------------------------------------------------------------------------+ | NUMB | NAME | +------------+-----------------------------------------------------------------------------------------------------+ | 31099 | I'm here | | 35 | Good bye, cruel world | | 16515 | Good bye, cruel world | | 31099 | I'm here | | 35 | Good bye, cruel world | | 16515 | Good bye, cruel world | | 5334 | Good bye, cruel world | | 1586 | Good bye, cruel world | | 5334 | Good bye, cruel world | | 1586 | Good bye, cruel world | | 14369 | Hello Kitty | | 14369 | Hello Kitty | | 8129 | Good bye, cruel world | | 4465 | Hello world | | 8129 | Good bye, cruel world | | 4465 | Hello world | | 15072 | I'm here | | 20664 | Good bye, cruel world | | 15072 | I'm here | | 20664 | Good bye, cruel world | +------------+-----------------------------------------------------------------------------------------------------+ SQLRowCount returns 20 20 rows fetched SQL>