Blog of the Open Source JavaHotel project

Monday, 14 September 2009

Migration to Google App Engine


Google App Engine for Java (beta version) was released in April 2009. Its JPA (Java Persistence API) interface encourages developers to port existing applications to the Google cloud. It is possible, but it does not come cheap.


Introduction


John Keats "Hyperion"
“But cannot I create?   
“Cannot I form? Cannot I fashion forth   
“Another world, another universe,   
“To overbear and crumble this to nought?   
“Where is another chaos? Where?”

Unfortunately, for the time being we cannot form another world from newly found chaos. To avoid being crumbled we have to follow what the mightier have discovered and settled.

It is very enticing to migrate to the Google cloud and be nearer to heaven. The JPA implementation promises a lot and makes the whole process possible. But it does not come cheap. Below I highlight some problems I found while migrating an open source project (EJB3/JPA) to Google App Engine while keeping backward compatibility.

http://code.google.com/p/javahotel/

Problems and solutions

Problem #1


Google App Engine does not support @MappedSuperclass (open issue), and relationships like @ManyToMany may not work as expected (I will come back to this problem later). The type 'com.google.appengine.api.datastore.Key' is not supported outside Google App Engine. Google App Engine supports only @GeneratedValue(strategy = GenerationType.IDENTITY). It means that it is rather unlikely that any non-trivial entity classes will move to Google App Engine/JPA smoothly.

Solution:
Create two different sets of entity classes (data classes): one for Google App Engine/JPA and one for non-Google App Engine/JPA. It means redundancy and code duplication, but it seems much simpler than trying to achieve 100% source code compatibility. This approach, however, requires a different method for setting up the development environment and two different ways of packaging and deploying the application. It also brings all the problems of duplicated code, as with any other poor programming technique.

Example:

Google App Engine entity beans:
http://code.google.com/p/javahotel/source/browse/#svn/trunk/javahotel/src/hotelgaenginejpa/com/javahotel/db/hotelbase/jpa

Non-Google App Engine entity beans:
http://code.google.com/p/javahotel/source/browse/#svn/trunk/javahotel/src/hotelejb3jpa/com/javahotel/db/hotelbase/jpa


Problem #2

EJB3 annotations like @Local, @Remote and @TransactionAttribute are not supported. Looking up beans through JNDI is not supported either.

Solution:
Very simple: just create your own empty declarations of these annotations. Some substitute for looking up remote EJB interfaces should also be created.

Example:
Empty EJB3 annotations:
http://code.google.com/p/javahotel/source/browse/#svn/trunk/javahotel/src/emptyejb3

Lookup substitute (the "ServiceLocator" pattern proves to be very useful):
http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/hotellocallogin/com/javahotel/loginhelper/
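A minimal sketch of both ideas, not the project's actual code: an empty stand-in annotation and a tiny service locator that replaces the JNDI lookup. The names HotelInfo, HotelInfoBean and ServiceLocator are hypothetical, and in a real port the stand-in annotation would have to live in a package named javax.ejb so that existing imports still resolve.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Empty stand-in for javax.ejb.Local, so that annotated code still compiles.
// Assumption: in a real port this is declared in a package named javax.ejb.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Local {
}

// Hypothetical business interface and bean, used only for illustration.
@Local
interface HotelInfo {
    String hotelName();
}

class HotelInfoBean implements HotelInfo {
    public String hotelName() {
        return "demo hotel";
    }
}

// Replaces the JNDI lookup: interfaces are mapped to plain instances
// that are registered once at application start-up.
final class ServiceLocator {
    private static final Map<Class<?>, Object> SERVICES = new ConcurrentHashMap<>();

    private ServiceLocator() {
    }

    static <T> void register(Class<T> iface, T implementation) {
        SERVICES.put(iface, implementation);
    }

    static <T> T lookup(Class<T> iface) {
        Object service = SERVICES.get(iface);
        if (service == null) {
            throw new IllegalStateException("No service registered for " + iface.getName());
        }
        return iface.cast(service);
    }
}
```

Application code then calls ServiceLocator.lookup(HotelInfo.class) where it previously asked JNDI for the bean. The registered services should be stateless, because the registry exists separately in every JVM.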


Problem #3

This is not a Google App Engine limitation. It simply stems from the fact that the Google App Engine environment is a clustered environment, so the application itself has to be "cloud" aware. For a standard J2EE application it is a matter of choice: the application may or may not run in a clustered environment. For a Google App Engine application there is no choice.

Solution:
Before migration, the application should be analyzed to check whether it is ready to run in a clustered environment. For instance, it cannot be assumed that every request runs in the same JVM, so any code that keeps data between requests anywhere other than the datastore should be changed. Stateful beans are not supported, and there is no substitute for the functionality offered by this type of bean. Making the application "cloud" aware can be very simple, or very complicated if the application relies heavily on stateful beans.

There is a very good presentation on "programming in the cloud".
http://www.infoq.com/presentations/programming-cloud-gregor-hohpe


Problem #4

The Google App Engine implementation of JPA queries comes with a lot of limitations. They are listed in http://code.google.com/appengine/docs/java/datastore/queriesandindexes.html. Also, Google App Engine does not have any SQL engine behind it. It means not only that JPA createNativeQuery is not supported (and would not make any sense), but also that many (probably most) JPA/JPQL createQuery calls will not work as expected.

Solution:
One solution is simply to rewrite all queries: reduce them to the "lowest common denominator" supported by Google App Engine and then resolve each original query by means of multiple queries and filtering in memory. But, of course, nobody will be happy with this "poor man's" policy.
A better approach is to hide all queries behind a facade and then split the code into two parts: one for Google App Engine/JPA and one for non-Google App Engine/JPA. This way the "lowest common denominator" effect can be avoided, but at the cost of redundancy and code duplication.

Example:
Query facade for Google App Engine/JPA:
http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/hotelgaenginejpa/com/javahotel/db/hotelbase/queries/

Query facade for non-Google App Engine/JPA:
http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/hotelejb3jpa/com/javahotel/db/hotelbase/queries/
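The facade idea can be sketched roughly as follows. This is an illustration under assumptions, not the code behind the links above: RoomOffer, RoomOfferQueries and HotelOfferSource are hypothetical names, and HotelOfferSource stands in for the single simple query that the datastore resolves by itself. The example condition uses inequalities on two different properties, which, as far as I know, the datastore cannot evaluate in one query, so the App Engine flavour finishes the job in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data class used only for this illustration.
final class RoomOffer {
    final String hotel;
    final int beds;
    final double price;

    RoomOffer(String hotel, int beds, double price) {
        this.hotel = hotel;
        this.beds = beds;
        this.price = price;
    }
}

// The facade: application code depends on this interface only.
// A non-App Engine implementation could answer it with one JPQL query.
interface RoomOfferQueries {
    List<RoomOffer> findAffordable(String hotel, int minBeds, double maxPrice);
}

// Stand-in for the one simple query (a single equality filter)
// that is delegated to the datastore.
interface HotelOfferSource {
    List<RoomOffer> findByHotel(String hotel);
}

// App Engine flavour: one simple query, remaining conditions checked in memory.
final class InMemoryFilteringQueries implements RoomOfferQueries {
    private final HotelOfferSource source;

    InMemoryFilteringQueries(HotelOfferSource source) {
        this.source = source;
    }

    public List<RoomOffer> findAffordable(String hotel, int minBeds, double maxPrice) {
        List<RoomOffer> result = new ArrayList<>();
        for (RoomOffer offer : source.findByHotel(hotel)) {
            if (offer.beds >= minBeds && offer.price <= maxPrice) {
                result.add(offer);
            }
        }
        return result;
    }
}
```

The callers never learn which implementation is behind RoomOfferQueries, so the duplication stays confined to the two query packages.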

Another problem connected with Google App Engine/JPA queries:
The programmatic approach (multiple queries and filtering in memory) will work but can be very inefficient. Resolving queries in the datastore engine is much more effective, and the more data a query has to read through, the greater the difference between the datastore engine and in-memory filtering. The only solution (suggested by the Google App Engine documentation) is to break the "normalization" dogma and duplicate some properties. But this means changing the entity classes and all the persistence logic, and it makes backward compatibility more difficult.

Problem #5

Google App Engine comes with a different approach to relationships. Instead of the standard terminology based on multiplicity (OneToOne, OneToMany, ...) it uses its own terminology: "owned" and "unowned" relationships (http://code.google.com/appengine/docs/java/datastore/relationships.html). An "owned" relationship is fully supported; an "unowned" relationship is not supported at all. There is no simple mapping between relationships based on multiplicity and "owned/unowned", so it is not possible to decide which relationship falls into which category by looking at its definition. It means that porting existing entity classes while keeping the same persistence logic can be non-trivial.


Solution:
Analyze the existing relationships, not only by looking at the annotations attached but also by analyzing the usage and logic connected with each relationship, and decide whether it is "owned" or "unowned". "Unowned" relationships are not supported, but there is a hint in the Google App Engine documentation (http://code.google.com/intl/pl/appengine/docs/java/datastore/relationships.html#Unowned_Relationships): "The App Engine implementation of JDO does not yet implement this facility, but don't worry, you can still manage these relationships using Key values in place of instances (or Collections of instances) of your model objects". This problem can be resolved in various ways; in each of them some logic has to be executed before persisting and after loading data classes.

Example:

http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/hotelgaenginejpa/com/javahotel/db/hotelbase/jpa/RoomStandard.java
Annotation: @KeyObject(keyField = "hotelId", objectField = "hotel")

(Annotation handling)
http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/hotelgaenginejpa/com/javahotel/db/hotelbase/jpa/AfterBeforeLoadAction.java
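To give an idea of what such annotation handling might look like, here is a rough reflection-based sketch. It is a guess at the technique and not the content of AfterBeforeLoadAction.java: the KeyObject declaration is reconstructed from the usage quoted above, KeyObjectHandler and the simplified Hotel and RoomStandard classes are hypothetical, and the sketch assumes that the referenced object keeps its identifier in a field named "id".

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.function.Function;

// Reconstructed from the usage quoted above; the real definition may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface KeyObject {
    String keyField();
    String objectField();
}

// Simplified, hypothetical data classes (JPA annotations left out).
class Hotel {
    Long id;
    String name;
}

@KeyObject(keyField = "hotelId", objectField = "hotel")
class RoomStandard {
    Long hotelId; // persisted: plain key value of the "unowned" hotel
    Hotel hotel;  // not persisted: resolved again after loading
}

final class KeyObjectHandler {
    private KeyObjectHandler() {
    }

    // Copies the id of the referenced object into the key field.
    // Assumption: the referenced object has a field named "id".
    static void beforePersist(Object entity) {
        KeyObject ann = entity.getClass().getAnnotation(KeyObject.class);
        if (ann == null) {
            return;
        }
        try {
            Object referenced = field(entity, ann.objectField()).get(entity);
            if (referenced != null) {
                Object key = field(referenced, "id").get(referenced);
                field(entity, ann.keyField()).set(entity, key);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Resolves the referenced object from the stored key using the given finder.
    static void afterLoad(Object entity, Function<Object, Object> finder) {
        KeyObject ann = entity.getClass().getAnnotation(KeyObject.class);
        if (ann == null) {
            return;
        }
        try {
            Object key = field(entity, ann.keyField()).get(entity);
            if (key != null) {
                field(entity, ann.objectField()).set(entity, finder.apply(key));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Field field(Object owner, String name) throws NoSuchFieldException {
        Field f = owner.getClass().getDeclaredField(name);
        f.setAccessible(true);
        return f;
    }
}
```

The finder passed to afterLoad would typically be a lookup by key against the datastore; it is kept as a plain function here so that the sketch does not depend on any persistence API.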


Problem #6

Connected with Problem #5: how to avoid littering application code with calls like 'beforePersist' or 'afterLoad' in every place where data is loaded or persisted?

Solution:
Avoid calling JPA directly from application code. Create an additional facade and hide the core JPA calls there. This way all the "dressing" is added in one module.

Example:

http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/hoteldbjpa/com/javahotel/dbjpa/ejb3/JpaEntity.java
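A stripped-down sketch of such a facade is shown below. It is not the JpaEntity class linked above: RawStore is a hypothetical stand-in for the small part of EntityManager being wrapped, and EntityHooks and PersistenceFacade are illustrative names as well.

```java
// Hypothetical stand-in for the part of EntityManager that the facade wraps.
interface RawStore {
    void persist(Object entity);

    <T> T find(Class<T> type, Object id);
}

// Hooks applied around every persistence call (hypothetical names).
interface EntityHooks {
    void beforePersist(Object entity);

    void afterLoad(Object entity);
}

// The only persistence entry point for application code: the "dressing"
// is applied here once instead of at every call site.
final class PersistenceFacade {
    private final RawStore store;
    private final EntityHooks hooks;

    PersistenceFacade(RawStore store, EntityHooks hooks) {
        this.store = store;
        this.hooks = hooks;
    }

    void save(Object entity) {
        hooks.beforePersist(entity);
        store.persist(entity);
    }

    <T> T load(Class<T> type, Object id) {
        T entity = store.find(type, id);
        if (entity != null) {
            hooks.afterLoad(entity);
        }
        return entity;
    }
}
```

On the non-App Engine side the hooks can simply do nothing, which keeps the application code identical in both builds.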


Problem #7

Changing to 'strategy=GenerationType.IDENTITY' (the only strategy supported by Google App Engine) can cause problems. It is not a Google App Engine limitation; it works as expected. But 'strategy=GenerationType.IDENTITY' can be a problem if your application was banking on a sequential or increasing order of keys (GenerationType.SEQUENCE).

Solution:
If the 'GenerationType.SEQUENCE' or 'GenerationType.AUTO' strategy was used before migrating to Google App Engine, it is necessary to analyze the flow and make sure that no specific order of keys is assumed. If it is, then refactoring is needed before migration to get rid of this assumption.

Problem #8

Debugging a Google App Engine application after deploying it to the production environment is difficult. The standard Java debugger cannot be used.

Solution:
Debug and test the application in the local (development) environment, and deploy to production only when all tests pass. But sometimes this is not enough and debugging in the production environment is necessary. The only way is to work in cycles: add more logging messages, deploy, run, analyze the logs, and start again with the next set of logging messages.

Problem #7a

Queries cannot be used freely inside a transaction. Only a specific type of query can be run inside a transaction.

Solution:
Some refactoring is needed. All queries should be placed outside transaction boundaries. It can be easy or complicated, depending on how much the query is entangled with the partial results of the transaction.

Problem #8a

Long cannot be regarded as a general type for identifying and retrieving instances of entity classes (data classes). For instance, a class that is a child in an "owned" relationship should always have a 'Key' type declared as its key. Because it is sometimes necessary to pass the identity key to and fro between the server and the client, this can cause a problem. Additionally, the 'Key' type is not supported on the client side.

Solution:
This problem can be resolved in different ways. Below is an example.

Example:
Create an additional type to pass the entity identifier between the server and the client, and split the code for encoding/decoding it and retrieving the entity class object (data class) into two parts: Google App Engine/JPA and non-Google App Engine/JPA.

Additional type.
http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/hoteltypes/com/javahotel/types/LId.java

Google App Engine/JPA implementation:
http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/#src/hotelgaenginejpa/com/javahotel/db/jtypes

Non-Google App Engine implementation:
http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/#src/hotelejb3jpa/com/javahotel/db/jtypes
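One possible shape of such an identifier type is sketched below. It is only loosely in the spirit of the LId class linked above and does not reproduce it; EntityRef and its methods are hypothetical. The assumption is that the App Engine build converts a 'Key' to a String on the server (KeyFactory.keyToString and KeyFactory.stringToKey can do this), while the other build keeps using a plain Long.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical identifier type that can travel between server and client
// without exposing the App Engine 'Key' class to the client.
final class EntityRef implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Long numericId;    // used by the non-App Engine build
    private final String encodedKey; // used by the App Engine build (Key encoded as String)

    private EntityRef(Long numericId, String encodedKey) {
        this.numericId = numericId;
        this.encodedKey = encodedKey;
    }

    static EntityRef ofLong(long id) {
        return new EntityRef(id, null);
    }

    static EntityRef ofEncodedKey(String encodedKey) {
        return new EntityRef(null, Objects.requireNonNull(encodedKey));
    }

    boolean isNumeric() {
        return numericId != null;
    }

    long asLong() {
        if (numericId == null) {
            throw new IllegalStateException("Identifier holds an encoded key, not a Long");
        }
        return numericId;
    }

    String asEncodedKey() {
        if (encodedKey == null) {
            throw new IllegalStateException("Identifier holds a Long, not an encoded key");
        }
        return encodedKey;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof EntityRef)) {
            return false;
        }
        EntityRef that = (EntityRef) other;
        return Objects.equals(numericId, that.numericId)
                && Objects.equals(encodedKey, that.encodedKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(numericId, encodedKey);
    }
}
```

Only the two server-side implementations know how to turn an EntityRef back into an entity; the client just passes it around. A GWT client may impose further serialization requirements that are not shown here.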

Problem #9

Given an existing suite of EJB3 unit (JUnit) tests, how to run them in Google App Engine? This is not a Google App Engine limitation; it is similar to the problem of how to test a local (not remote) EJB3 interface.

Solution:
This problem can be solved in different ways. Below is an example.

Example:
Deploy junit.jar together with the rest of the application and run the tests on the server side. A simple servlet is needed to start the process, and some simple coding is needed to evaluate the result. The solution implemented is very basic and rough.
Important: this test "harness" can be run only in the local (development) environment. It cannot be run in the production environment because of the 30-second limit per request.

http://code.google.com/p/javahotel/source/browse/trunk/javahotel/src/gaetest/com/javahotel/javatest/

Other ways to test a Google App Engine application:
http://code.google.com/appengine/docs/java/howto/unittesting.html

Assuming that the client is based on GWT (but keep in mind that it is bad practice to test an application only via the UI):
http://code.google.com/intl/pl/webtoolkit/tutorials/1.6/JUnit.html


Problem #10

There is a limit of 1000 entities returned by one query, and a request cannot take longer than 30 seconds. It is not a problem for a demo version or for an application where these limits do not matter. Otherwise it can have a significant impact, because the application will stop working when a limit is hit.
http://code.google.com/appengine/docs/java/datastore/overview.html#Quotas_and_Limits

Solution:
There is no simple, ready-to-use solution. Before migrating an existing application, analyze whether there is any scenario in which these limits could be hit. If so, redesign and refactoring are needed. It can be a simple or a very complicated task.
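One redesign that may apply in some scenarios is to read data in bounded batches ordered by key and to remember where the previous batch ended, so that neither the per-query limit nor the request time limit is hit in one go. The sketch below only illustrates that idea. IdPageSource and BatchReader are hypothetical names, and the page source stands in for a bounded datastore query of the form "ids greater than X, ascending, at most N results".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction of one bounded query:
// "up to `limit` ids greater than `afterId`, in ascending order".
interface IdPageSource {
    List<Long> nextIds(long afterId, int limit);
}

final class BatchReader {
    private BatchReader() {
    }

    // Reads at most maxBatches pages starting after startAfterId.
    // The caller can store the last id returned and resume in a later request.
    static List<Long> readBatches(IdPageSource source, long startAfterId,
                                  int pageSize, int maxBatches) {
        List<Long> collected = new ArrayList<>();
        long lastId = startAfterId;
        for (int batch = 0; batch < maxBatches; batch++) {
            List<Long> page = source.nextIds(lastId, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            collected.addAll(page);
            lastId = page.get(page.size() - 1);
            if (page.size() < pageSize) {
                break; // short page: the end of the data was reached
            }
        }
        return collected;
    }
}
```

Whether this fits depends on the scenario: it helps with bulk reading and processing, but not with a query that genuinely has to see everything at once.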

Problem #11

Google App Engine comes with a specific approach to transactions. Traditionally, J2EE applications use the EntityTransaction interface and its begin and commit/rollback methods to set transaction boundaries (managed manually or by a framework). Google App Engine follows the same pattern, but inside transaction boundaries only objects belonging to the same "entity group" can be manipulated. The first object touched defines the "entity group" for the whole transaction.

Entity group: http://code.google.com/appengine/docs/java/datastore/transactions.html#Entity_Groups

The "entity group" is defined by the primary key of the object. It means that transaction logic should be kept in mind when designing the entity classes. Once an object is persisted, its primary key cannot be changed.
An "optimistic" approach is also used. Two (or more) competing requests are not blocked on the "commit" method; instead the less successful one gets an exception inside the transaction and should have its own rescue plan (for instance: retry the transaction).

Solution:
There is no simple, ready-to-use solution or any obvious rule of thumb. It is unlikely that existing transactions will run successfully after migrating to Google App Engine. A thorough analysis of the existing transaction management code is necessary, followed by changes. It can be simple or very difficult: it may mean not only breaking existing multi-record transactions into smaller, more atomic ones grouped around entity groups, but also changing the entity classes and the whole persistence logic.

http://code.google.com/events/io/sessions/DesignDistributedTransactionLayerAppEngine.html
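The "retry the transaction" rescue plan mentioned above can be sketched as a small helper. This is an illustration under assumptions: TransactionRetry is a hypothetical name, the transaction body is expected to do its own begin, commit and rollback, and the conflict is assumed to surface as java.util.ConcurrentModificationException. Under JPA the conflict may arrive wrapped in a persistence exception, in which case the catch clause has to be adapted.

```java
import java.util.ConcurrentModificationException;
import java.util.function.Supplier;

final class TransactionRetry {
    private TransactionRetry() {
    }

    // Runs the transaction body and repeats it when it loses an optimistic
    // conflict, giving up after maxAttempts tries.
    static <T> T run(Supplier<T> transactionBody, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConcurrentModificationException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return transactionBody.get();
            } catch (ConcurrentModificationException e) {
                last = e; // lost the race: try the whole transaction again
            }
        }
        throw last;
    }
}
```

The body has to be safe to repeat, because everything inside it runs again on every retry.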



Problem #12

Last but not least.
The Google App Engine sandbox comes with a lot of restrictions and limitations.

http://code.google.com/appengine/docs/java/jrewhitelist.html
http://code.google.com/appengine/docs/java/runtime.html#The_Sandbox

The application may also use external tools which are not supported on Google App Engine.

Solution:
There is no simple solution. Before migration the application should be analyzed and refactored (if necessary) to be Google App Engine compatible. The same goes for any external tools or jars used. This process can be very simple, very time consuming and complicated, or not possible at all.

Summary, some advice on migrating.

Before doing any migration task it is necessary to have a sound understanding of Google App Engine concepts such as "entity groups" and "owned relationships". Well-known concepts like "transactions" and "primary keys" are also loaded with additional meaning. Spending some time on learning will save much more time in the future.
It is rather unlikely that 100% source compatibility can be achieved. The better way is to take a more realistic approach and break the "no redundancy" and "no duplication" dogma.

Before migration a thorough analysis and "feasibility study" should be performed. The crucial points are:
  • External tools and jars used (Problem #12)
  • Not being "cluster ready" (Problem #3)
  • Limits on query size and request time (Problem #10)
  • Transactions (Problem #11)


Problem #10 is particularly important because it can easily be neglected at the beginning of the migration process. But neglecting it can have a devastating effect later: out of the blue the application refuses to work, with no workaround available.


Before making any changes, a large and thorough suite of unit tests should be created (if not done already) and passed. The migration process involves a lot of changes and fixes, with a great risk of regression and side effects.

First apply all changes to the existing, non-Google App Engine solution, and make sure that the existing application still works as expected and no regression was introduced. Then start development of the Google App Engine version. After that, it is highly probable that some changes will again be necessary in the non-Google App Engine solution, and this cycle may be repeated several times in a row.

Some general thoughts on Google App Engine for Java.

Google App Engine is not just another J2EE container or another database. So it is unrealistic to expect Google App Engine to be a better JBoss or a better MySQL.

Google App Engine opens a window onto Google's internal infrastructure for scalable, clustered applications, and this is its main advantage, incomparable with other solutions. The purpose of providing the JPA/JDO interface is to make this process easier for Java developers, to make the learning curve less steep.

The Google App Engine JPA/JDO implementation is full of holes, but I don't think that the future of Google App Engine for Java lies in having JPA or JDO well cooked. It is too much influenced by the underlying technology, and it is rather unlikely that, even with JPA/JDO fully implemented, one could forget that it is only a thin abstraction layer over Bigtable.

Google App Engine for Java is in its early childhood; it is an evolving platform. Of course, issues should be resolved and bugs fixed. But the main problem is the lack of good examples, patterns and best practices showing how to make the best use of Bigtable and the advantages it offers: distribution, load balancing, replication. The quality of this solution is tested by millions of users every minute, so how to catch this train?

Also, some lessons should be learned as quickly as possible. For instance, it is rather unlikely that migrating any EJB3/JPA application to Google App Engine makes any sense. Not only because of the problems described above, but simply because I cannot imagine any large-scale business database application without an effective, general-purpose query engine.