Blog for the JavaHotel Open Source project

Monday, July 23, 2012

DB2, HADR and C++

Introduction
HADR works well with CLP (Command Line Processor), but what about applications? Could they also benefit from being connected to an HADR environment? To check, I created a simple C++/ODBC application.
Application
The application is available here. There is a simple class encapsulating access to DB2.

class person {
  friend class QueryCommand;
  person(int pid,const char *pname, const char *pfamilyname) : id(pid),name(pname),familyname(pfamilyname) {}
public:
  int id;
  std::string name;
  std::string familyname;
};

class hadrmodel
{
  friend class SqlCommand;
  friend class LogoutCommand;
  std::string lastActionStatus;
  SQLHANDLE  EnvHandle;
  SQLHANDLE  ConHandle;
public:
  hadrmodel() {
    EnvHandle = ConHandle = 0;
  }
  void connect();
  void disconnect();
  void createPersons();
  void dropPersons();
  void autoCommit(bool on);
  std::vector<person> getListOfPersons();
  void addPerson(const std::string &name, const std::string &familyName);
  SQLHANDLE getConHandle() const { return ConHandle; }
  std::string getConnectionStatus() const;
  std::string getLastActionStatus() const { return lastActionStatus; }
};
And a simple console application making use of the exposed methods.

#include <iostream>

#include "hadrmodel.h"

namespace {
void P(std::string s) {
    std::cout << s << std::endl;
}

void textInterface(hadrmodel &ha) {
    while (1) {
        P("=======================");
        P(ha.getConnectionStatus());
        P("1) Connect");
        P("2) Disconnect");
        P("3) Get list of persons");
        P("4) Add person");
        P("5) Create table PERSONS");
        P("6) Drop table PERSONS");
        P("7) Switch on autocommit");
        P("8) Switch off autocommit");
        P("99) Exit");
        int iChoice;
        std::cout << "Enter:";
        std::cin >> iChoice;
        bool action = true;
        switch (iChoice) {
        case 99:
            return;
        case 1:
            ha.connect();
            break;
        case 2:
            ha.disconnect();
            break;
        case 3:
        {
            std::vector<person> pList = ha.getListOfPersons();
            std::vector<person>::iterator i = pList.begin();
            for (; i != pList.end(); i++) {
                std::cout<< "id:" << i->id << " name: " << i->name << " family name:" << i->familyname << std::endl;
            }
        }
        break;
        case 4:
        {
            std::string name;
            std::string familyName;
            std::cout << "Name: ";
            std::cin >> name;
            std::cout << "Family name: ";
            std::cin >> familyName;
            ha.addPerson(name,familyName);
        }
        break;
        case 5:
          ha.createPersons();
          break;
        case 6:
          ha.dropPersons();
          break;
        case 7:
        case 8:
          ha.autoCommit(iChoice == 7);
          break;
        default:
            action = false;
            break;
        } // switch
        if (!action) {
            P("Invalid action.");
        }
        else {
            P(ha.getLastActionStatus());
        }
    } // while
}

} // namespace

int main(int argc, char **argv) {
    P("Hello, world!");
    hadrmodel ha;
    textInterface(ha);
    P("I'm done");
}
Test
One can replay all the tests described before and confirm that everything also works with this simple C++ application.
Autocommit on/off
There is a difference in behavior only in "autocommit" mode. More information about autocommit mode here. Below is an example where autocommit mode is "on". The HADR takeover was performed at option 4), adding a new person to the person table.
[Screenshot: console session with autocommit "on" during the takeover]
One interesting thing: there is nothing special in this screen. Although the takeover was performed, the new person was added smoothly, without any exception being thrown. So a C++/ODBC application with autocommit "on" is takeover resistant.
But most database applications must run with autocommit "off" and mark transaction boundaries by issuing the "commit" command manually.
So run the same test again, but with autocommit off (option 8).
[Screenshot: the same console session with autocommit "off" during the takeover]
This time an 'SQL30108N' exception is thrown and the transaction is rolled back. But the connection is still alive: we can repeat the last operation, and this time it succeeds.
Conclusion
A C++ application can also benefit from HADR without any changes in the code. What is more, if autocommit is "on", the application is takeover transparent. With autocommit "off" (the mode most applications probably run in), an exception is thrown, exactly as CLP behaves, but the connection still exists. One question remains: is it possible to create a C++/ODBC application that is takeover resistant with autocommit "off" as well?



Sunday, July 8, 2012

JSPWiki in the Cloud and download/upload

Finally I added download and upload options to my JSPWiki cloud implementation. You have to click the "More .." option in the upper-right corner of the page tab: "Download" saves the wiki content as an XML file, and "Upload" is the opposite function. Unfortunately, what seemed quite easy at the beginning ended up in huge refactoring. Testing the "Upload" function fires an additional refresh-references step, and it revealed a circular dependency in the Spring beans being created. It took me a lot of time to understand the dependencies and untangle them. It was necessary to modify the creation pattern for several beans, and I ended up with a bean factory.
Wiki uploading itself is not simple either; there are several possible usage scenarios.

  1. Remove the current wiki content and upload the new one. Something similar to the "backup" and "restore" options in databases: the new wiki content is a clone of the content read from the source file.
  2. Merge the existing content with the new one. But what to do if a page with the same name already exists? Maybe the solution is to add an "overwrite" option, set globally or per case. But there is a risk that the wiki content becomes a chaotic mesh of old and new pages.
  3. Merge the content, and for pages with the same name add the page content read from the source as a new version of the existing page. It sounds much better than before, but there is a risk that the number of versions will double, quadruple, etc. after every upload.
  4. Merge the content, and for pages with the same name also recognize the page version. Overwrite page content at the page-and-version level.
  5. Merge the content and add only the last version from the input.
  6. ....
There could be many scenarios. Finally I decided to implement scenario number 4. I cannot tell for certain whether it is the best one, but for the time being it seems to me the most sound.

The next and very important task is to change the database schema and reduce the number of reads by adding a cache for pages. The current implementation is very inefficient in terms of the number of reads, and the Google App Engine free-access quota is hit very quickly, freezing the application for 24 hours until the quotas are reset.



Tuesday, June 26, 2012

They are not different

Girl with a Pearl Earring
My father had worked with Catholics and told me that they were no different from us. If anything they were less solemn. They liked to eat and drink and sing and game. He said this almost as if he envied them. 

Thursday, June 21, 2012

DB2, HADR and server code

Introduction
So far we have been talking about data. But what about server-side SQL code? Suppose we have deployed a new version of our very important packages, stored procedures, triggers and UDFs, and we have to switch to the secondary server.
Test continued
Let's create a new UDF:
CREATE OR REPLACE FUNCTION MULTIPLY (IN FACTOR1 INTEGER, IN FACTOR2 INTEGER)
 RETURNS INTEGER
 NO EXTERNAL ACTION
F1: BEGIN ATOMIC
  RETURN FACTOR1 * FACTOR2; 
END
@
Then, while connected to the primary server (db2had1), deploy it.
db2 -td@ -sf MULT.sql
And check how it works.
db2 "values(multiply(2,2))"

1          
-----------
          4

  1 record(s) selected
Takeover: switch to the secondary server
Now force a takeover on the secondary server (db2had2) and bring down the primary server (db2had1). The client stays connected all the time. When the takeover is finished, rerun the command.
 db2 "values(multiply(2,2))"
SQL30108N  A connection failed but has been re-established. Special register 
settings might have been replayed. Host name or IP address of the new 
connection: "think". Service name or port number of the new connection: 
"50002".  Reason code: "1".  SQLSTATE=08506
The SQL error code is as expected, and we can rerun the command.
db2 "values(multiply(2,2))"

1          
-----------
          4

  1 record(s) selected.
So after switching to the secondary server, 2 times 2 is 4 all the time. As we can see, SQL server code is also replicated to the secondary server. Now make a little mischief and modify our business procedure slightly.
CREATE OR REPLACE FUNCTION MULTIPLY (IN FACTOR1 INTEGER, IN FACTOR2 INTEGER)
 RETURNS INTEGER
 NO EXTERNAL ACTION
F1: BEGIN ATOMIC
  RETURN FACTOR1 * FACTOR2 + 1;
END
@
And deploy the modified UDF; this time it is deployed to db2had2 acting as the primary server.
db2 -td@ -sf MULT.sql
DB20000I  The SQL command completed successfully.
Switch the roles again: make db2had1 the primary server.
Log in to db2had1 and make it primary:
db2start
db2 activate database sample
db2 takeover hadr on database sample
db2 get db cfg for sample | grep HADR
HADR database role = PRIMARY
The client is connected all the time. Now rerun the command; keep in mind that it throws SQL30108N the first time after the server switch. And the grand finale:
db2 "values(multiply(2,2))"

1          
-----------
          5

  1 record(s) selected.
Summary
So not only data but also SQL server code is duplicated to the secondary server: no need to worry about updating or upgrading server code in an HADR environment. That is the good news. But there is also bad news. DB2 is able to replicate only SQL code, so if we have an external procedure or function (for instance in C++ or Java), the corresponding .jar or .so file is not replicated and has to be copied manually.

Wednesday, June 20, 2012

DB2 and HADR

Introduction
HADR stands for High Availability and Disaster Recovery. The general idea is quite simple: we have two servers, primary and secondary, running in parallel. The client is connected to the primary, and all changes made on the primary server are replicated to the secondary. In case of any failure of the primary server, the client is switched to the secondary and works with it until the primary is restored. Then all changes made on the secondary server are replayed to the primary and we can return to the initial configuration.
It is described in more detail here.
There is also an IBM Redbook covering the topic in every possible way.
How to set up HADR for testing and evaluation purposes
Unfortunately, all that stuff seems very complicated at first glance. But the good news is that it is actually very simple: we can set up HADR on a single machine (using two instances) in no more than 10-20 minutes. Of course, it does not make any sense to run such a configuration in a customer environment, but it is enough to test how our application will behave when connected to an HADR installation.
The detailed procedure for setting up HADR on a single machine for testing purposes is described here. But how do we check that HADR is running and behaving as expected?
Test HADR: do something on the primary server and switch roles, keeping the client connected all the time.
Log in to the client machine, connect to the SAMPLE database installed on the primary (db2had1) server, and do something.
db2 connect to sample user db2had1
db2 "create table testx (int x)"
db2 "insert into testx values(1)"
Now we want to switch off the primary machine for some reason, but, of course, the show must go on.
So log in to the secondary server and switch the roles.
ssh -X db2had2@think
db2 takeover hadr on database sample
Switch off the primary server and start cleaning it. If it is an AIX machine, then probably nobody has touched it for ages and it is covered with clods of dust.
ssh -X db2had1@think
db2 deactivate database sample
db2stop
Note that the client is connected all the time.
Now run a statement from the client machine:
db2 "select * from testx"
The first time this statement is run, an SQL error is thrown.
SQL30108N A connection failed but has been re-established. Special register settings might have been replayed. Host name or IP address of the new connection: "think". Service name or port number of the new connection: "50009". Reason code: "1". SQLSTATE=08506
But it is as expected: it informs us that the roles have been switched, and we can safely repeat the last statement.
db2 "select * from testx"
X          
-----------
          1

  1 record(s) selected.
What has happened?
  • We are now definitely connected to the secondary server; the primary server is stopped.
  • The reconnection took place automatically; the client did not have to connect again.
  • All changes, both DDL (CREATE TABLE) and DML (INSERT INTO), have been replicated to the secondary server; it contains the latest committed version of the primary database.
  • We can continue our business as usual.
Continue the test; the primary server is still not ready.

Finally it is high time to call it a day and go home, so disconnect.
db2 terminate
Tomorrow we start again.
db2 connect to SAMPLE user db2had1
The connection is successful and we can continue. The primary server is not active, so we are actually connected to the standby (now acting as primary). Note that we still connect to db2had1: the connection parameters and credentials are the same regardless of which server is acting as primary. This is very important; it means that no changes to the application are necessary in case of a failover. So now continue our very important work:
db2 "insert into testx values(2)"
Primary server is ready

Assume that our primary server is ready again and we want to come back to the preferred configuration.

So log in to db2had1 and activate the server.
ssh -X db2had1@think
db2start
db2 activate database SAMPLE
Now check the role: is db2had1 primary or secondary?
db2 get db cfg for SAMPLE | grep HADR
HADR database role = STANDBY
So although db2had1 is alive again, it now acts as the secondary server. In order to have it as primary, we have to force a takeover again.
db2 takeover hadr on database sample
and check again
db2 get db cfg for SAMPLE | grep HADR
HADR database role = PRIMARY
So now db2had1 is working as the primary.

But what about the client still connected to the database?
db2 "select * from testx"
The first time, the SQL error SQL30108N is received again. But after rerunning the command:
db2 "select * from testx"
X          
-----------
          1
          2

  2 record(s) selected.
So all changes made on db2had2 while db2had1 was not ready have been replayed, and we can continue our business as usual.

Thursday, June 14, 2012

JSPWiki in the Cloud and security

I've just deployed a new version of JSPWiki in the Cloud (new sources are committed as well) and it seems that authentication is working: users can log in and log out.
But what seemed to be quite easy at the beginning ended up in huge refactoring and rebuilding.
Firstly I had to remove an error which caused me a lot of headache. It worked in the development environment but failed after deploying to the production server, with a nice-looking error message like:

/JSON-RPC
java.lang.IllegalStateException: WRITER
 at org.mortbay.jetty.Response.getOutputStream(Response.java:594)
 at com.metaparadigm.jsonrpc.JSONRPCServlet.service(JSONRPCServlet.java:137)

What made it more difficult, this error popped up only in a specific scenario: the same execution path worked as expected the first time but failed the next time. Because debugging on the production server is not possible, the only way to find the bug was adding trace messages, one after another.
Finally I was able to locate the offending code in WikiJSPFilter.java. It seemed that this code:

response.getWriter().write(r);
tried to write after the response was committed. So the solution was to enclose that code in something like:
if (!response.isCommitted()) {
   ...
  response.getWriter().write(r);
  ...
}
But I still do not understand why this code worked in the main trunk of JSPWiki, and why the execution path worked the first time while the same path failed the next time.

But after overcoming this problem, the next one appeared:

Uncaught exception from servlet
java.lang.IllegalArgumentException: Task size too large
 at com.google.appengine.api.taskqueue.QueueImpl.fillAddRequest(QueueImpl.java:347)

It seemed that the size of data assigned to the session grew too large. The only solution was to drastically reduce the size of data persisted with the session. So I decided to make the WikiEngine.java class (the main culprit) request scoped and the WikiSession.java class (with user credentials attached) session scoped.
The next step was to rebuild WikiEventManager.java, which kept a lot of data in static fields. In a cloud environment dynamic data cannot be kept in statics, because nobody guarantees that the next request will be executed in the same JVM.
So finally I made a lot of classes request scoped and put them under Spring control as beans. But it did not go easily because of mutual dependencies between them, so I had to spend a lot of time trying to understand and untangle these dependencies.
But finally it seems to work. The main problem now is to improve performance, because almost everything is initialized on every request. Another problem is to reduce the number of reads from the datastore (the same data is read several times in one request) by introducing a cache local to one request and shared between requests using Google App Engine memcache.


Wednesday, June 6, 2012

Google App Engine, Spring and session bean

I created a very simple Spring application with one session bean. Sources: the bean class (in the Google App Engine environment it has to be Serializable), the servlet source code and web.xml (the whole project's source code). I expected every page refresh to increase the id attribute in PersonBean. But it works only in the development environment; after deploying to the production server the bean is persisted only the first time (it keeps the number 1) and no subsequent refresh changes it, as if the first version of the bean had been persisted for good.
It changed after setting an additional attribute on the session, one which does not have any meaning.
req.getSession().setAttribute("attrib", "X");
After adding this setting it works as expected in the production environment as well. I can hardly understand this behaviour; it looks as if setting any attribute on the session also triggers the session bean being persisted at the end of the request.