Blog for the Open Source JavaHotel project

Wednesday, 20 June 2012

DB2 and HADR

Introduction
HADR stands for High Availability and Disaster Recovery. The general idea is quite simple: we have two servers, primary and secondary (standby), running in parallel. The client is connected to the primary, and all changes made on the primary server are replicated to the secondary server. If the primary server fails, the client is switched to the secondary and works with it until the primary is restored. Then all changes made on the secondary server are replayed to the primary and we can return to the original configuration.
It is described in more detail here.
There is also an IBM Redbook covering the topic in every possible way.
How to set up HADR for testing and evaluation
Unfortunately, all of this seems very complicated at first glance. The good news is that it is actually quite simple: we can set up HADR on a single machine (using two instances) in no more than 10-20 minutes. Of course, it makes no sense to run such a configuration in a customer environment, but it is enough to test how our application behaves when connected to a HADR installation.
The detailed procedure for setting up HADR on a single machine for testing purposes is described here. But how do we check that HADR is running and behaving as expected?
Test HADR: do something on the primary server and switch roles while the client stays connected
Log in to the client machine, connect to the SAMPLE database installed on the primary (db2had1) server and do something.
db2 connect to sample user db2had1
db2 "create table testx (x int)"
db2 "insert into testx values(1)"
Now we want to switch off the primary machine for some reason but, of course, the show must go on.
So log in to the secondary server and switch roles.
ssh -X db2had2@think
db2 takeover hadr on database sample
Switch off the primary server and start cleaning it. If it is an AIX machine, then probably nobody has touched it for ages and it is covered with clods of dust.
ssh -X db2had1@think
db2 deactivate database sample
db2stop
Note that the client has stayed connected the whole time.
Now run a statement from the client machine.
db2 "select * from testx"
The first time this statement is run, an SQL error is thrown.
SQL30108N A connection failed but has been re-established. Special register settings might have been replayed. Host name or IP address of the new connection: "think". Service name or port number of the new connection: "50009". Reason code: "1". SQLSTATE=08506
But this is expected: it informs us that the roles have been switched, and we can safely repeat the last statement.
db2 "select * from testx"
X
-----------
1
What has happened?
  • We are now really connected to the secondary server; the primary server is stopped.
  • The reconnection took place automatically; the client did not have to connect again.
  • All changes, both DDL (CREATE TABLE) and DML (INSERT INTO), have been replicated to the secondary server, so it contains the latest committed version of the database from the primary server.
  • We can continue our business as usual.
Continue the test: the primary server is still not ready
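The "repeat the statement after SQL30108N" step can be scripted. The following is only a minimal sketch: retry_on_reroute is a hypothetical helper name, not part of DB2, and it assumes that the statement is safe to run a second time (as the SELECT above is). It captures the command output in a temporary file and reruns the command once if the output mentions SQL30108N.

```shell
#!/bin/sh
# retry_on_reroute (hypothetical helper): run the given command and, if its
# output reports SQL30108N (connection re-established), run it once more.
# Assumption: the command is safe to repeat.
retry_on_reroute() {
    log=$(mktemp) || return 1
    # First attempt; remember the exit status without aborting on failure.
    if "$@" > "$log" 2>&1; then status=0; else status=$?; fi
    if grep -q 'SQL30108N' "$log"; then
        # The connection has been re-established, so repeat the command once.
        if "$@" > "$log" 2>&1; then status=0; else status=$?; fi
    fi
    cat "$log"
    rm -f "$log"
    return "$status"
}
```

It could be used from the client session like this: retry_on_reroute db2 "select * from testx"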

Finally, it is high time to call it a day and go home, so disconnect.
db2 terminate
Tomorrow we start again.
db2 connect to SAMPLE user db2had1
The connection is successful and we can continue. The primary server is not active, so we are actually connected to the standby (now acting as primary). Note that we still connect to db2had1: the connection parameters and credentials are the same regardless of which server is acting as primary. This is very important, because it means that no changes are necessary in the application in case of failover. So now we continue our very important work:
db2 "insert into testx values(2)"
Primary server is ready

Assume that our primary server is ready and we want to come back to the preferred configuration.

So log in to db2had1 and activate the server.
ssh -X db2had1@think
db2start
db2 activate database SAMPLE
Now check the role: is db2had1 primary or secondary?
db2 get db cfg for SAMPLE | grep HADR
HADR database role = STANDBY
So although db2had1 is alive again, it now acts as the secondary server. To make it the primary again, we have to force a takeover once more.
db2 takeover hadr on database sample
and check again
db2 get db cfg for SAMPLE | grep HADR
HADR database role = PRIMARY
So now db2had1 is working as primary.
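The role check can also be reduced to a single word, which is handy in scripts. This is a sketch only: hadr_role is a hypothetical helper, and it assumes DB2 prints its messages in English, so that the relevant line reads "HADR database role = ..."; for another message locale the pattern would have to be adapted.

```shell
#!/bin/sh
# hadr_role (hypothetical helper): read "db2 get db cfg" output on stdin and
# print only the value of the HADR role line, e.g. PRIMARY or STANDBY.
# Assumption: English message text ("HADR database role").
hadr_role() {
    awk -F'=' '/HADR database role/ { gsub(/[ \t]/, "", $2); print $2 }'
}
```

For example: db2 get db cfg for SAMPLE | hadr_role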

But what about the client still connected to the database?
db2 "select * from testx"
The first time, the SQL error SQL30108N is received again. But after rerunning the command:
db2 "select * from testx"
X
-----------
1
2
2 record(s) selected.
So all changes made on db2had2 while db2had1 was not ready have been replayed and we can continue our business as usual.
