Google AppEngine limitations / workarounds

Posted: June 19, 2009 in IT/Dev, Troubleshooting
Tags: 1000 files, AppEngine, cache synchronization, compatibility, Datanucleus, enhancement, Google, Hibernate, Java, JCache, JDO, JPA, limit, limitations, mapping, memcache, offset, restricted, unsupported, workarounds, WTP integration

Here are some limitations to consider when planning to use AppEngine. They most likely stem from cloud constraints and the need for consistent performance. A compatibility list of various J2EE frameworks and supported Java APIs is available.

1000 files limit
Each application is limited to 1000 files. When using Cappuccino we therefore have to use the Press tool (with the flatten option; to be tested with version 0.71, as it was broken in 0.70 beta) or remove the .j files (keeping only the .sj files). For CP2JavaWS, image resources from unused components were also removed (load time should however be better using Press).
As the concept of a physical machine is gone with cloud computing, we cannot write to files, and we can only read files located in WEB-INF (or accessible through the classloader).
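As an illustration of the read-only access described above, here is a minimal sketch that loads a file through the classloader. The class name ResourceReader and its two methods are hypothetical helpers written for this post, not part of the AppEngine SDK or of CP2JavaWS.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads files packaged with the application (read-only).
class ResourceReader {

    /** Reads a whole stream into a UTF-8 string. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Returns the resource content, or null when it is not on the classpath. */
    static String readClasspathResource(String name) throws IOException {
        // getResourceAsStream returns null when the resource is missing;
        // try-with-resources accepts a null resource and simply skips close().
        try (InputStream in = ResourceReader.class.getClassLoader().getResourceAsStream(name)) {
            return in == null ? null : readAll(in);
        }
    }
}
```

A file placed under WEB-INF/classes would typically be reachable this way, since that folder is on the webapp classpath.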

1000 results limit / limited offset for queries
For performance reasons, query results are limited to 1000 records. Moreover the offset cannot go beyond that limit (queries return 0 records once the offset reaches the 1000th position). We therefore cannot browse a table of several thousand rows with limit(offset, count) unless a filter/condition is used.
We could still add a criterion on the index to fetch records in several parts (one per thousand). However this only works when sorting by the index (it does not work when sorting by another column). The same applies when using two queries (one retrieving the indexes, and another selecting where index in (index range) limit count), as the result of the first query is itself limited. The same problem arises with a temporary sort table (the result of the select query used to retrieve the data to insert is limited); moreover that solution requires recomputing the temporary table whenever the sort criterion (or the filter criteria) changes.
One solution would be to define a special key (see the Queries on Keys section of the Python documentation, __key__), managed in memory and without read limit (the corresponding section for Java/JDO does not however include this information). It would require being able to modify this key dynamically (as it depends on the sort column), or defining for each sort column an additional column (set along with each insert), composed of the sort column value and the index value. We could then add a __key__ > previous limit value criterion to the query (the value of the composite column for the last element retrieved during the previous 1000-element fetch). Adding a criterion on the primary key would not work when using another sort column, as the criterion applies before sorting (that problem is worked around if the criterion uses the composite column that corresponds to the sort column).
We can however assume that search criteria have to be refined/tightened if more than 1000 results are expected.
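The composite-column idea can be sketched in plain Java. This is only a simulation under stated assumptions: the table is an in-memory list, the query method stands in for one datastore query capped at 1000 results, and the names CompositeKeyPaging, Row and ageKey are hypothetical. The composite value is the zero-padded sort value followed by the zero-padded index, which assumes both are non-negative so that string order matches numeric order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Simulation only: no datastore API is used here.
class CompositeKeyPaging {

    static final int LIMIT = 1000; // result cap discussed in this section

    static final class Row {
        final long id;       // the index (primary key)
        final int age;       // the sort column
        final String ageKey; // composite column, set along with each insert

        Row(long id, int age) {
            this.id = id;
            this.age = age;
            // Fixed-width padding keeps lexicographic order equal to (age, id) order.
            this.ageKey = String.format("%010d|%019d", age, id);
        }
    }

    /** Stand-in for one capped query: composite > afterKey, sorted by composite. */
    static List<Row> query(List<Row> table, String afterKey) {
        return table.stream()
                .filter(r -> afterKey == null || r.ageKey.compareTo(afterKey) > 0)
                .sorted(Comparator.comparing((Row r) -> r.ageKey))
                .limit(LIMIT)
                .collect(Collectors.toList());
    }

    /** Fetches every row sorted by age, one capped batch at a time. */
    static List<Row> fetchAllSortedByAge(List<Row> table) {
        List<Row> all = new ArrayList<>();
        String last = null;
        while (true) {
            List<Row> batch = query(table, last);
            all.addAll(batch);
            if (batch.size() < LIMIT) {
                return all; // a short batch means nothing is left
            }
            last = batch.get(batch.size() - 1).ageKey; // resume after the last element
        }
    }
}
```

Because the filter and the sort use the same composite column, the criterion no longer conflicts with the sort order, and no offset is ever needed.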

Mapping limitation / caches synchronization
Hibernate isn’t supported as it statically instantiates a UUID generator that uses the InetAddress class (which is among the APIs restricted/unsupported by AppEngine, along with other machine-related features: threads, etc.). A modified version of a Hibernate class is therefore provided in CP2JavaWS (it uses code from the JUG framework instead).
The CP2JavaWS demo uses an HSQLDB in-memory database to ease installation of the example (no database server to create); the table and initial data are created at launch time (from a context listener). The values of the third column (age) are generated randomly, to allow testing the sort feature. These values can therefore change depending on when we access the application (as they are held in memory, they differ from one application instance to another). That isn’t a problem for the demo however (no persistence required).
We could configure a URL to a database server, but it would have to be hosted elsewhere (unless it is the datastore). We also cannot use a local file to persist the database (only reads are allowed).
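To show the general idea of an identifier generator that does not depend on the host address, here is a minimal sketch based on java.util.UUID. This is not the JUG-based code shipped with CP2JavaWS; the class name RandomUuidGenerator is hypothetical, and random (version 4) UUIDs are swapped in for illustration.

```java
import java.util.UUID;

// Hypothetical replacement generator: random UUIDs need no InetAddress lookup.
class RandomUuidGenerator {

    /** Returns a 32-character lowercase hexadecimal identifier. */
    static String generate() {
        // UUID.toString() is 36 characters with 4 dashes; removing them leaves 32.
        return UUID.randomUUID().toString().replace("-", "");
    }
}
```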

The main concern is memory synchronization between cloud nodes (application instances), as mapping solutions use two object caches. The first-level cache (one per PersistenceManager/mapping session, typically per user session) allows comparing a working copy of an object with the corresponding original (field locking), and the second-level cache (one per PersistenceManagerFactory/SessionFactory) allows comparing the original object from a PersistenceManager with the current corresponding object in the central cache (which is necessary for optimistic locking). Direct access (back door) to the database that bypasses the PersistenceManagerFactory (generally retrieved from JNDI) is forbidden, in order to keep integrity.
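The role of the central cache in optimistic locking can be illustrated with a simplified sketch. It assumes a version counter per object, which is only one possible way to compare the original with the current central copy; the class OptimisticLockSketch and its methods are hypothetical and do not reflect the DataNucleus or Hibernate internals.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model: one central cache shared by every session of one instance.
class OptimisticLockSketch {

    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Stand-in for the central (second-level) cache. */
    private final Map<String, Versioned> central = new HashMap<>();

    synchronized void put(String key, String value) {
        central.put(key, new Versioned(value, 1));
    }

    /** A session keeps what it read here as its "original". */
    synchronized Versioned read(String key) {
        return central.get(key);
    }

    /** Commits only if the central copy is still the version the session read. */
    synchronized boolean commit(String key, Versioned original, String newValue) {
        Versioned current = central.get(key);
        if (current == null || current.version != original.version) {
            return false; // another session committed first
        }
        central.put(key, new Versioned(newValue, current.version + 1));
        return true;
    }
}
```

If each application instance held its own unsynchronized copy of this map, two instances could both accept conflicting commits, which is why the second-level cache has to be replicated between nodes.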

JDO/JPA access to the GAE datastore is based on the DataNucleus mapping framework (which implements JDO and JPA). By default the level 2 cache is off in DataNucleus; by activating it (through settings) we can choose among various implementations: EHCache, Oracle Coherence, memcached, etc. In that list only Oracle Coherence and memcached can work in a distributed environment (second-level cache replication); this is also the case for more recent versions of EHCache. Thanks to the plugin architecture of DataNucleus we can develop extensions to use another cache framework that supports distributed mode: JBoss Cache, OSCache, Terracotta, etc.
The Google datastore uses a proprietary implementation of the JCache specification (JSR 107) to allow a distributed mode, and manages replication automatically. DataNucleus provides a manual API to manage datastore replication (for example the JDOReplicationManager to synchronize PersistenceManagerFactory instances). The AppEngine SDK also provides Memcache APIs to (manually) manage replication of custom objects if needed.

Replacing DataNucleus with Hibernate (which can use distributed caches like JBoss Cache, OSCache, Coherence and more recent versions of EHCache) in a GAE application would require hooks into the replication process (to synchronize the Hibernate SessionFactory). The (static) configuration of distributed caches would however require knowing the exact hosts of the nodes, and GAE doesn’t provide such information.

In the end we are tied to the Google datastore, with the following restrictions (which do not come from DataNucleus limitations): no aggregate queries, no polymorphic queries, limited filters, limited joins, limited support for many-to-many relations, etc.
As BigTable isn’t relational, JDO looks interesting as it isn’t restricted to relational datastores (unlike JPA). DataNucleus handles various datastore types, and an extension (plugin) for BigTable had to be developed by Google (notably to manage access through the JPA interface). Although the GAE datastore sits at a higher level of abstraction than BigTable, some limitations seem directly tied to that implementation choice, whose goal is to provide consistent response times whatever the query (hence the above limitations).

No support for SOAP webservices
This shouldn’t be a problem as recent RDA solutions (GWT and Cappuccino/CP2JavaWS) use JSON (enhanced with proprietary fields).

Inter-applications communication
Applications have to use the URL Fetch APIs from the AppEngine SDK to communicate.
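A minimal sketch of such a call, using only the standard java.net classes (which AppEngine backs with URL Fetch). The class name AppToAppClient, the timeouts and the UTF-8 assumption are illustrative choices, not values taken from the SDK documentation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical client used by one application to call another over HTTP.
class AppToAppClient {

    /** Performs a GET and returns the body, each line terminated by '\n'. */
    static String get(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // illustrative timeouts, in milliseconds
        conn.setReadTimeout(5000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

The target address would be the public URL of the other application; since JSON is used by the client frameworks mentioned above, the returned body would typically be parsed as JSON.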

Tools and WTP integration
The Eclipse plugin allows creating a web project (with the AppEngine webapp configuration file and the required jars, which have to be added manually to the build path), but doesn’t provide integration with WTP (and there is no means to stop the AppEngine server once started; we have to use the WTP stop button). We can in fact deploy an AppEngine project from a WTP project, by renaming WebContent to war and adding the appengine-web.xml file to the WEB-INF folder of the WTP project. However we then have no support for class enhancement (required by the datastore JDO and JPA implementations). We could probably add this automatic task by adding the appropriate builder to the .project file.
The enhancement step required for the mapping comes on top of the earlier constraint of client code generation when using GWT. Some JDO solutions like LIDO still allowed skipping the enhancement step if needed (at the cost of some performance, as expected).

Comments

Antonio says:

April 20, 2010 at 11:25 am

Perfect idea about using HSQLDB+MemCache. Thanks!

Jerome Denanot says:

June 25, 2010 at 10:37 am

Some links about mapping a custom domain to an AppEngine application : http://stackoverflow.com/questions/184541/how-are-people-using-google-app-engine-apps-with-their-own-domains
http://stackoverflow.com/questions/1990041/many-custom-domains-for-appengine-instance
http://stackoverflow.com/questions/976127/
http://code.google.com/intl/fr/appengine/docs/domain.html
http://blog.charlvn.com/2009/03/custom-domains-on-google-app-engine.html
