Blog for the Open Source JavaHotel project

Tuesday, June 26, 2012

They are not different

Girl with a Pearl Earring
My father had worked with Catholics and told me that they were no different from us. If anything they were less solemn. They liked to eat and drink and sing and game. He said this almost as if he envied them. 

Thursday, June 21, 2012

DB2, HADR and server code

Introduction
So far we have been talking about data. But what about server SQL code? Assume we have deployed a new version of our very important packages, stored procedures, triggers and UDFs, and now we have to switch to the secondary server.
The test continues
Let's create a new UDF.
CREATE OR REPLACE FUNCTION MULTIPLY (IN FACTOR1 INTEGER, IN FACTOR2 INTEGER)
 RETURNS INTEGER
 NO EXTERNAL ACTION
F1: BEGIN ATOMIC
  RETURN FACTOR1 * FACTOR2; 
END
@
Then, while connected to the primary server (db2had1), deploy it.
db2 -td@ -sf MULT.sql
And check how it works.
db2 "values(multiply(2,2))"

1          
-----------
          4

  1 record(s) selected
Perform a takeover, switch to the secondary server
Now force a takeover on the secondary server (db2had2) and bring down the primary server (db2had1). The client stays connected all the time. When the takeover is finished, rerun the command.
 db2 "values(multiply(2,2))"
SQL30108N  A connection failed but has been re-established. Special register 
settings might have been replayed. Host name or IP address of the new 
connection: "think". Service name or port number of the new connection: 
"50002".  Reason code: "1".  SQLSTATE=08506
The SQL error code is expected, and we can simply rerun the command.
db2 "values(multiply(2,2))"

1          
-----------
          4

  1 record(s) selected.
So after switching to the secondary server, 2 times 2 is still 4. As we can see, server SQL code is also replicated to the secondary server. Now let's make a little mischief and modify our business function a bit.
CREATE OR REPLACE FUNCTION MULTIPLY (IN FACTOR1 INTEGER, IN FACTOR2 INTEGER)
 RETURNS INTEGER
 NO EXTERNAL ACTION
F1: BEGIN ATOMIC
  RETURN FACTOR1 * FACTOR2 + 1;
END
@
And deploy the modified UDF - this time it goes to db2had2, now acting as the primary server.
db2 -td@ -sf MULT.sql
DB20000I  The SQL command completed successfully.
Switch the roles again and make db2had1 the primary server.
Log in to db2had1 and bring it up as primary:
db2start
db2 activate database sample
db2 takeover hadr on database sample
db2 get db cfg for sample | grep HADR
 HADR database role                                                       = PRIMARY
The client stays connected all the time. Now rerun the command - keep in mind that it throws SQL30108N the first time after the server switch. And the grand finale:
db2 "values(multiply(2,2))"

1          
-----------
          5

  1 record(s) selected.
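One remark before the summary: an application sees exactly the same thing through JDBC - the first statement issued after a takeover fails with SQLCODE -30108 and simply has to be resubmitted. Below is a minimal sketch of that retry; the connection URL, port and credentials are made-up example values, and the retry-once logic is only an illustration, not production-grade error handling.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal sketch: call the MULTIPLY UDF and resubmit the statement once
// when the DB2 driver reports SQLCODE -30108 (connection re-established
// after a takeover). URL, port and credentials are made-up example values.
public class MultiplyClient {

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:db2://think:50001/SAMPLE", "db2had1", "password");
             Statement stmt = con.createStatement()) {
            System.out.println("2 * 2 = " + multiply(stmt));
        }
    }

    static int multiply(Statement stmt) throws SQLException {
        try {
            return run(stmt);
        } catch (SQLException e) {
            // -30108: the connection failed but has been re-established;
            // the failed statement has to be resubmitted by the application.
            if (e.getErrorCode() == -30108) {
                return run(stmt);
            }
            throw e;
        }
    }

    static int run(Statement stmt) throws SQLException {
        try (ResultSet rs = stmt.executeQuery("VALUES multiply(2, 2)")) {
            rs.next();
            return rs.getInt(1);
        }
    }
}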
Summary
So not only data but also server SQL code is replicated to the secondary server. There is no need to worry about updating or upgrading server code in an HADR environment. That is the good news. But there is also bad news: DB2 replicates only SQL code. If we have an external procedure or function (for instance in C++ or Java), the corresponding .jar or .so file is not replicated and it has to be deployed manually on both servers.

Wednesday, June 20, 2012

DB2 and HADR

Introduction
HADR stands for High Availability and Disaster Recovery. The general idea is quite simple - we have two servers, primary and secondary, running in parallel. The client is connected to the primary, and all changes made on the primary server are replicated to the secondary server. In case of any failure of the primary server, the client is switched to the secondary and works with it until the primary is restored. Then all changes made on the secondary server are replayed to the primary and we can return to the initial configuration.
It is described in more detail here.
There is also an IBM Redbook covering the topic in every possible way.
How to set up HADR for testing and evaluation purposes.
Unfortunately, all that stuff seems very complicated at first glance. But the good news is that it is actually very simple - we can set up HADR on a single machine (using two instances) in no more than 10-20 minutes. Of course, it does not make any sense to run such a configuration in a customer environment, but it is enough to test how our application will behave when connected to an HADR installation.
The detailed procedure for setting up HADR on a single machine for testing purposes is described here. But how can we check that HADR is running and behaving as expected?
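One quick check from the client side is to query DB2's HADR monitoring data. This is only a sketch, under the assumption that the SYSIBMADM.SNAPHADR administrative view is available on the server; the connection details are made up and the column set may differ between DB2 versions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: print the HADR role and state of the database we are connected to.
// URL, port and credentials are made-up example values; SYSIBMADM.SNAPHADR
// and its columns may vary depending on the DB2 version.
public class HadrStatus {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:db2://think:50001/SAMPLE", "db2had1", "password");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT HADR_ROLE, HADR_STATE FROM SYSIBMADM.SNAPHADR")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " / " + rs.getString(2));
            }
        }
    }
}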
Test HADR - do something on the primary server and switch roles while the client stays connected all the time.
Log in to the client machine, connect to the SAMPLE database on the primary server (db2had1) and do something.
db2 connect to sample user db2had1
db2 "create table testx (int x)"
db2 "insert into testx values(1)"
Now we want to switch off the primary machine for some reason, but - of course - the show must go on.
So log in to the secondary server and switch roles.
ssh -X db2had2@think
db2 takeover hadr on database sample
Switch off the primary server and start cleaning it. If it is an AIX machine then probably nobody has touched it for ages and it is covered with dust clods.
ssh -X db2had1@think
db2 deactivate database sample
db2stop
Note that the client stays connected all the time.
Now run a statement from the client machine:
db2 "select * from testx"
The first time this statement is run, an SQL error is thrown.
SQL30108N A connection failed but has been re-established. Special register settings might have been replayed. Host name or IP address of the new connection: "think". Service name or port number of the new connection: "50009". Reason code: "1". SQLSTATE=08506
But this is expected: it informs us that the role switch has happened, and we can safely repeat the last statement.
db2 "select * from testx"
X
-----------
1
What has happened?
  • We are now, for all practical purposes, connected to the secondary server; the primary server is stopped.
  • The reconnection took place automatically; the client did not have to reconnect (a JDBC configuration sketch follows this list).
  • All changes, both DDL (CREATE TABLE) and DML (INSERT INTO), have been replicated to the secondary server; the secondary contains the latest committed version of the primary database.
  • We can continue our business as usual.
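The post does not show the client-side configuration, so as an illustration only: one common way to get this automatic reroute with the IBM JCC driver is to tell the data source where the alternate server lives. Host names, ports and credentials below are made up; with a real HADR pair the alternate server can also be registered on the database itself and picked up by the client on the first connect.

import java.sql.Connection;
import com.ibm.db2.jcc.DB2SimpleDataSource;

// Sketch: a data source prepared for automatic client reroute.
// Host names, ports and credentials are made-up example values.
public class HadrDataSource {
    public static Connection connect() throws Exception {
        DB2SimpleDataSource ds = new DB2SimpleDataSource();
        ds.setDriverType(4);                       // type 4 JDBC connectivity
        ds.setServerName("think");                 // primary (db2had1) host
        ds.setPortNumber(50001);                   // primary instance port
        ds.setDatabaseName("SAMPLE");
        // Where to reroute when the primary goes down: the standby (db2had2).
        ds.setClientRerouteAlternateServerName("think");
        ds.setClientRerouteAlternatePortNumber(50009);
        return ds.getConnection("db2had1", "password");
    }
}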
Continue the test; the primary server is still not ready.

Finally it is high time to call it a day and go home - so disconnect.
db2 terminate
Tomorrow we start again.
db2 connect to SAMPLE user db2had1
The connection is successful and we can continue. The primary server is not active, so in fact we are connected to the standby (now acting as primary). Note that we still connect as db2had1 - the connection parameters and credentials are the same regardless of which server is acting as primary. This is very important: it means that no application changes are necessary in case of failover. So now continue our very important work:
db2 "insert into testx values(2)"
Primary server is ready

Assume that our primary server is ready again and we want to go back to the preferred configuration.

So log in to db2had1 and activate the server.
ssh -X db2had1@think
db2start
db2 activate database SAMPLE
Now check the role - is db2had1 primary or standby?
db2 get db cfg for SAMPLE | grep HADR
HADR database role = STANDBY
So although db2had1 is alive again, it now acts as the standby server. In order to make it primary we have to force a takeover again.
db2 takeover hadr on database sample
and check again
db2 get db cfg for SAMPLE | grep HADR
HADR database role = PRIMARY
So now db2had1 is working as primary.

But what about the client that is still connected to the database?
db2 "select * from testx"
Again, the first execution gives the SQL error SQL30108N. But after rerunning the command:
db2 "select * from testx"
X
-----------
          1
          2

  2 record(s) selected.
So - all changes made on db2had2 while db2had1 was not ready have been replayed, and we can continue our business as usual.

Thursday, June 14, 2012

JSPWiki in the Cloud and security

I've just deployed a new version of JSPWiki in the Cloud (the new sources are also committed) and it seems that authentication is working. Users can log in and log out.
But what seemed quite easy at the beginning ended up as a huge refactoring and rebuilding effort.
First I had to remove an error which caused me a lot of headaches. The code worked in the development environment but failed after deploying to the production server, with a nice-looking error message like:

/JSON-RPC
java.lang.IllegalStateException: WRITER
 at org.mortbay.jetty.Response.getOutputStream(Response.java:594)
 at com.metaparadigm.jsonrpc.JSONRPCServlet.service(JSONRPCServlet.java:137)

What made it more difficult was that this error popped up only in a specific scenario: the same execution path worked as expected the first time but failed the next time. Because debugging on the production server is not possible, the only way to find the bug was to add more and more trace messages.
Finally I was able to locate the offending code in WikiJSPFilter.java. It seemed that this code:

response.getWriter().write(r);
tried to write after the response had been committed. So the solution was to enclose that code in something like:
if (!response.isCommitted()) {
   ...
  response.getWriter().write(r);
  ...
}
But I still do not understand why this code works in the main trunk of JSPWiki, and why the execution path worked the first time while the very same path failed the next time.

But after overcoming this problem the next one appeared:

Uncaught exception from servlet
java.lang.IllegalArgumentException: Task size too large
 at com.google.appengine.api.taskqueue.QueueImpl.fillAddRequest(QueueImpl.java:347)

It seemed that the amount of data attached to the session grew too large. The only solution was to drastically reduce the size of the data persisted with the session. So I decided to make the WikiEngine.java class (the main culprit) request scoped and the WikiSession.java class (which carries the user credentials) session scoped.
The next step was to rebuild WikiEventManager.java, which kept a lot of data in static fields. In a cloud environment dynamic data cannot be kept in statics, because nobody guarantees that the next request will be executed in the same JVM.
So finally I made a lot of classes request scoped and put them under Spring control as beans. It did not go easily because of mutual dependencies between them, so I had to spend a lot of time trying to understand and untangle these dependencies.
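For illustration, the request and session scoping could look roughly like this under Spring. The class bodies below are made up and much smaller than the real JSPWiki classes; the scoped-proxy setting is what allows a short-lived bean to be injected into longer-lived ones.

import java.io.Serializable;
import org.springframework.context.annotation.Scope;
import org.springframework.context.annotation.ScopedProxyMode;
import org.springframework.stereotype.Component;
import org.springframework.web.context.WebApplicationContext;

// Sketch only: the real WikiEngine and WikiSession classes are far bigger.
@Component
@Scope(value = WebApplicationContext.SCOPE_REQUEST,
       proxyMode = ScopedProxyMode.TARGET_CLASS)
class WikiEngineBean {
    // rebuilt on every request, so nothing has to survive between requests
}

@Component
@Scope(value = WebApplicationContext.SCOPE_SESSION,
       proxyMode = ScopedProxyMode.TARGET_CLASS)
class WikiSessionBean implements Serializable {
    // lives in the HTTP session, so on App Engine it must stay small and Serializable
    private String userName;
    public String getUserName() { return userName; }
    public void setUserName(String userName) { this.userName = userName; }
}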
But finally it seems to be working. The main problem now is to improve performance, because almost everything is initialized on every request. Another task is to reduce the number of reads from the datastore (the same data is read several times in one request) by introducing a cache that is local to a single request and shared between requests through Google App Engine memcache.
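One possible shape of such a cache, as a sketch only: a plain map for values already read during the current request, App Engine memcache for values shared between requests and instances, and a hypothetical Loader callback standing in for the actual datastore read.

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

// Sketch of a two-level read cache: request-local map plus shared memcache.
public class RequestCache {

    private final Map<String, Serializable> local = new HashMap<String, Serializable>();
    private final MemcacheService shared = MemcacheServiceFactory.getMemcacheService();

    public Serializable get(String key, Loader loader) {
        Serializable value = local.get(key);
        if (value == null) {
            value = (Serializable) shared.get(key);   // shared between requests
            if (value == null) {
                value = loader.load(key);             // finally hit the datastore
                shared.put(key, value);
            }
            local.put(key, value);                    // never read twice in one request
        }
        return value;
    }

    // Hypothetical callback that performs the real datastore read.
    public interface Loader {
        Serializable load(String key);
    }
}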


Wednesday, June 6, 2012

Google App Engine, Spring and session bean

I created a very simple Spring application with one session bean. Sources: the bean class (in the Google App Engine environment it has to be Serializable), the servlet source code and web.xml (whole project source code). I expected that every page refresh would increase the id attribute in PersonBean. But it works that way only in the development environment; after deploying to the production server the bean is persisted only the first time (it keeps the number 1) - no later refresh changes it, as if the first version of the bean had been persisted for good.
This changed after setting an additional session attribute which does not carry any meaning.
req.getSession().setAttribute("attrib", "X");
After adding this line it works as expected in the production environment as well. I can hardly understand this behaviour - it looks as if setting any attribute on the session also triggers the session bean being persisted at the end of the request.
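To make the whole experiment concrete, here is a sketch of what is in play. Only PersonBean is a name from the project; the servlet name, the bean lookup style and the assumption that PersonBean is declared as a session scoped Spring bean (with a RequestContextListener registered in web.xml) are mine, and the marked line is the workaround described above.

import java.io.IOException;
import java.io.Serializable;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.context.WebApplicationContext;
import org.springframework.web.context.support.WebApplicationContextUtils;

// Session scoped bean; on App Engine it must be Serializable because the
// HTTP session is persisted between requests (and possibly between JVMs).
class PersonBean implements Serializable {
    private int id = 0;
    public int next() { return ++id; }
}

// Sketch of the servlet; names and lookup style are illustrative only.
public class CounterServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        WebApplicationContext ctx = WebApplicationContextUtils
                .getRequiredWebApplicationContext(getServletContext());
        PersonBean person = ctx.getBean(PersonBean.class);
        // The workaround: without touching the session the bean kept the
        // value 1 forever on the production server.
        req.getSession().setAttribute("attrib", "X");
        resp.getWriter().println("id = " + person.next());
    }
}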