Posts Tagged ‘AppEngine’

Here are some limitations to consider when planning to use AppEngine. They most likely stem from cloud constraints and the need for consistent performance. A compatibility list of various J2EE frameworks and of the supported Java APIs is available.

1000 files limit
Each application is limited to 1000 files. So when using Cappuccino we have to use the Press tool (with the flatten option; to be tested with version 0.71, as it was broken in 0.70 beta) or remove the .j files (keeping only the .sj files). For CP2JavaWS, image resources from unused components were also removed (load time should however be better using Press).
As the concept of a physical machine disappears with cloud computing, we cannot write to files, and we can only read files located in WEB-INF (or accessible through the classloader).

1000 results limit / limited offset for queries
For performance reasons, query results are limited to 1000 records. Moreover the offset cannot go any higher (queries return 0 records once the offset reaches the 1000th position). So we cannot browse a table of several thousand rows with limit(offset, count) unless we use a filter/condition.
We could still add a criterion on the index to fetch records in several parts (one per thousand). However this would only work if we sort by the index (it wouldn’t work when sorting by another column). The same goes for using two queries (one retrieving the indexes and another selecting where index in (index range) limit count), as the result of the first query is itself limited. A temporary sort table has the same problem (the result of the select query used to retrieve the data to insert is limited); moreover that solution requires recomputing the temporary table whenever the sort criterion (or the filter criteria) changes.
One solution would be to define a special key (see the Python section Queries on Keys, __key__), managed in memory and without a read limit (the same section for Java/JDO however doesn’t include this information). It would require being able to modify this key dynamically (as it depends on the sort column), or defining for each sort column an additional column (set along with each insert) composed of the sort column value and the index value. We could then add to the query a __key__ > previous limit value criterion (the value of the composite column for the last element retrieved during the previous 1000-element fetch). Adding a criterion on the primary key wouldn’t work when using another sort column, as the criterion applies before sorting (that problem is worked around if the criterion uses the composite column that corresponds to the sort column).
We can however assume that search criteria have to be refined/tightened if more than 1000 results are expected.
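The composite column idea can be sketched as follows. This is only an illustration under stated assumptions: the table is a plain in-memory list standing in for the datastore, the 1000-row cap is simulated, and every name (KeysetPagingSketch, Person, sortKey, query) is hypothetical rather than part of the AppEngine or JDO APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch only: an in-memory stand-in for a datastore that caps each query at 1000 rows. */
class KeysetPagingSketch {

    static final int MAX_RESULTS = 1000; // simulated per-query cap

    /** Hypothetical entity; sortKey is the extra column written along with each insert. */
    static final class Person {
        final long id;
        final int age;
        final String sortKey;

        Person(long id, int age) {
            this.id = id;
            this.age = age;
            // Zero-padding keeps string order identical to (age, id) numeric order.
            this.sortKey = String.format("%03d:%010d", age, id);
        }
    }

    /** Simulated query: WHERE sortKey > after ORDER BY sortKey LIMIT 1000. */
    static List<Person> query(List<Person> table, String after) {
        return table.stream()
                .filter(p -> after == null || p.sortKey.compareTo(after) > 0)
                .sorted(Comparator.comparing((Person p) -> p.sortKey))
                .limit(MAX_RESULTS)
                .collect(Collectors.toList());
    }

    /** Walks the whole table sorted by age, one capped query at a time. */
    static List<Person> fetchAllSortedByAge(List<Person> table) {
        List<Person> all = new ArrayList<>();
        String cursor = null; // composite value of the last row already fetched
        while (true) {
            List<Person> page = query(table, cursor);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            cursor = page.get(page.size() - 1).sortKey;
        }
        return all;
    }

    /** Builds test data with ages spread over 0..99. */
    static List<Person> sampleTable(int rows) {
        List<Person> table = new ArrayList<>();
        for (long id = 1; id <= rows; id++) {
            table.add(new Person(id, (int) ((id * 37) % 100)));
        }
        return table;
    }

    public static void main(String[] args) {
        List<Person> sorted = fetchAllSortedByAge(sampleTable(2500));
        System.out.println(sorted.size() + " rows fetched");
    }
}
```

Because the filter and the sort both use the same composite column, no offset is ever needed, and appending the index value keeps rows with equal ages in a stable, unambiguous order.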

Mapping limitation / caches synchronization
Hibernate isn’t supported as it statically instantiates a UUID generator that uses the InetAddress class (which is among the APIs restricted/unsupported by AppEngine, as are other machine-related features: threads, etc.). So a modified version of a Hibernate class is provided in CP2JavaWS (it uses code from the JUG framework instead).
The CP2JavaWS demo uses an HSQLDB in-memory database to make the example easier to install (no database server to set up), and the table and initial data are created at launch time (from a context listener). The values of the third column (age) are generated randomly, to allow testing of the sort feature. These values can therefore change depending on when we access the application (as they are held in memory, they differ from one application instance to another). That isn’t a problem for this demo however (no persistence required).
We could configure a URL to a database server, however it would have to be hosted elsewhere (if not using the datastore). We also cannot use a local file to persist the database (only reads are allowed).

The main concern is memory synchronization between cloud nodes (application instances), as mapping solutions use two object caches. The first-level cache (one per PersistenceManager/MappingSession, typically per user session) allows comparing a working copy of an object with the corresponding original (field locking), and the second-level cache (one per PersistenceManagerFactory/SessionFactory) allows comparing the original object held by a PersistenceManager with the current corresponding object in the central cache. That comparison is necessary for optimistic locking. Direct "back door" access to the database, bypassing the PersistenceManagerFactory (which is generally retrieved from JNDI), is forbidden in order to keep integrity.
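To make the optimistic locking check more concrete, here is a toy sketch. It is not DataNucleus or Hibernate code: the version counter and all names (OptimisticLockSketch, CentralCache, Snapshot, commit) are assumptions chosen for illustration, and real frameworks may compare field values rather than a version number.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a toy version check, not code from any mapping framework. */
class OptimisticLockSketch {

    /** Immutable snapshot of a row: its value plus a version counter. */
    static final class Snapshot {
        final String value;
        final int version;

        Snapshot(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Stand-in for the second-level (factory-wide) cache. */
    static final class CentralCache {
        private final Map<Long, Snapshot> rows = new HashMap<>();

        void put(long id, String value) {
            rows.put(id, new Snapshot(value, 1));
        }

        Snapshot get(long id) {
            return rows.get(id);
        }

        /** Commits only if the row is unchanged since 'original' was read. */
        boolean commit(long id, Snapshot original, String newValue) {
            Snapshot current = rows.get(id);
            if (current == null || current.version != original.version) {
                return false; // stale original: another session committed first
            }
            rows.put(id, new Snapshot(newValue, current.version + 1));
            return true;
        }
    }

    public static void main(String[] args) {
        CentralCache cache = new CentralCache();
        cache.put(1L, "Alice");
        Snapshot readByA = cache.get(1L); // session A keeps its original
        Snapshot readByB = cache.get(1L); // session B keeps its original
        System.out.println(cache.commit(1L, readByA, "Alicia")); // true
        System.out.println(cache.commit(1L, readByB, "Alison")); // false
    }
}
```

The check only works when both sessions talk to the same central cache. If each cloud node kept its own unsynchronized copy, session B's stale commit would go undetected, which is why cache replication matters here.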

Access to the GAE datastore is based on the DataNucleus mapping framework (which implements JDO and JPA). By default the level 2 cache is off in DataNucleus, however by activating it (through settings) we can choose among various implementations: EHCache, Oracle Coherence, memcached, etc. In that list only Oracle Coherence and memcached can work in a distributed environment (second-level cache replication); this is also the case for more recent versions of EHCache. Thanks to the plugin architecture of DataNucleus we can develop extensions to use another cache framework that manages distributed mode: JBoss Cache, OSCache, Terracotta, etc.
The Google datastore uses a proprietary implementation of the JCache specification (JSR 107) to allow a distributed mode, and manages replication automatically. DataNucleus provides a manual API to manage datastore replication (for example the JDOReplicationManager to synchronize PersistenceManagerFactory instances). The AppEngine SDK also provides Memcache APIs, to manage replication of custom objects manually if needed.

Replacing DataNucleus with Hibernate (which can use distributed caches like JBoss Cache, OSCache, Coherence and more recent versions of EHCache) in a GAE application would require hooks into the replication process (to synchronize the Hibernate SessionFactory). The (static) configuration of distributed caches would however require knowing the exact node hosts, and GAE doesn’t provide such information.

In the end we are tied to the Google datastore, with the following restrictions (which do not come from DataNucleus limitations): no aggregate queries, no polymorphic queries, limited filters, limited joins, limited support for many-to-many relations, etc.
As BigTable isn’t relational, JDO looks interesting as it isn’t restricted to relational datastores (unlike JPA). DataNucleus manages various datastore types, and an extension (plugin) for BigTable had to be developed by Google (notably to manage access through the JPA interface). Although the GAE datastore sits at a higher level of abstraction than BigTable, some limitations seem directly tied to that implementation choice, whose goal is to provide a consistent response time whatever the query (hence the above limitations).

No support for SOAP webservices
This shouldn’t be a problem as recent RDA solutions (GWT and Cappuccino/CP2JavaWS) use JSON (enhanced with proprietary fields).

Inter-applications communication
Applications have to use the URL Fetch APIs from the AppEngine SDK to communicate with each other.

Tools and WTP integration
The Eclipse plugin allows creating a web project (with the AppEngine webapp configuration file and the required jars, which have to be added manually to the build path), but doesn’t provide integration with WTP (and offers no means to stop the AppEngine server once started; we have to use the WTP stop button). We can in fact deploy an AppEngine project from a WTP project, by renaming WebContent to war and adding the appengine-web.xml file to the WEB-INF folder of the WTP project. However we then have no support for class enhancement (required by the datastore JDO and JPA implementations). We could probably add this automatic task by adding the appropriate builder to the .project file.
The enhancement step required for the mapping comes on top of the existing constraint of client code generation when using GWT. Some JDO solutions like LIDO did allow skipping the enhancement step if needed (at the cost of some performance hit, as expected).

A new version of CP2JavaWS is available on SourceForge:

Note: it now uses the json2lib stringify and parse methods from the latest Cappuccino main branch (and the future 0.71 version). It therefore does not work with version 0.7 (unless you manually replace these methods with those from CPValue). These two new methods should give better performance overall.

– Modified the server-side demo HabilitationServiceImpl to return true for the genericDAOService methods.

– CP2JavaWSEndpoint init had a bug when testing for the presence of the sameDomain argument: replaced if(aSameDomain) with if (aSameDomain!=nil).

– CP2JavaWSTableViewDelegate: in sendSynchRequest, added the test for sameDomain (it was already correct in sendAsynchRequest) to use JSONP mode when not on the same domain.

– Modified the Hibernate config .hbm file to use the assigned generator for the id, as AppEngine does not provide the InetAddress class.
However Hibernate’s SessionFactoryImpl still holds a static field of type UUIDHexGenerator that is always initialized, and that class extends AbstractUUIDGenerator, which uses InetAddress in its static initializer.
So AbstractUUIDGenerator had to be redefined (placed under the same original package name in the demo project’s src folder): when deploying to AppEngine we have to comment out the line that uses InetAddress and uncomment the line that uses a randomly generated address (based on the JUG project, also licensed under the LGPL).
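The idea of replacing the machine address with a random one can be sketched as below. This is neither Hibernate's nor JUG's actual source; the class and member names (RandomNodeAddressSketch, NODE_ADDRESS, randomAddress) are hypothetical, and the use of SecureRandom is an assumption about how such a value might be produced.

```java
import java.security.SecureRandom;

/** Sketch only: mimics the role of a node-address field without using InetAddress. */
class RandomNodeAddressSketch {

    /** Packs the first 4 bytes into an int, most significant byte first. */
    static int toInt(byte[] bytes) {
        int result = 0;
        for (int i = 0; i < 4; i++) {
            result = (result << 8) | (bytes[i] & 0xFF);
        }
        return result;
    }

    /** Generates a pseudo "IP address" without touching java.net.InetAddress. */
    static int randomAddress() {
        byte[] fake = new byte[4];
        new SecureRandom().nextBytes(fake);
        return toInt(fake);
    }

    // Computed once per JVM, as a static initializer would do.
    static final int NODE_ADDRESS = randomAddress();

    public static void main(String[] args) {
        System.out.println(String.format("%08x", NODE_ADDRESS));
    }
}
```

As the address is drawn once per JVM, each application instance gets its own value, which keeps the generated identifiers distinct between instances with high probability rather than by guarantee.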

A working demo is now available on AppEngine (same code as the included webapp example, except for the endpoint’s URL and the use of the randomly generated address in the redefined AbstractUUIDGenerator class):

http://cp2javaws.appspot.com

(Tested on Safari 4 and Firefox 3.0.x. It is much faster on Safari, however performance should improve with the upcoming Firefox 3.5 release.)

Google added support for Java to its free AppEngine hosting platform. Among the additional services provided is the BigTable database, which is optimized for scalability. Using the datastore (BigTable is accessed through the open source DataNucleus Access Platform framework) is really easy (it complies with the JDO spec and also supports JPA).

To deploy a war we have to add an appengine-web.xml file to the WEB-INF folder (that file contains the required application id and some other optional parameters).

Google also provides an Eclipse plugin, which brings a GWT project wizard and a test server (it simulates AppEngine, including the services and the datastore)!

Details for uploading a war can be found here. The scalability features of the solution are great, for both the database and static files.

See also : If you elect to use the free appspot.com domain name, the full URL for the application will be http://application-id.appspot.com/. You can also purchase a top-level domain name for your app, or use one that you have already registered.
There is no way to delete an application in App Engine. You can register up to 10 application IDs per Google account. If you do not want to use one of your allotted application IDs for the tutorial, you can just read this section, and refer to it later when you are ready to upload your first application.

It is stated that the offer is for now restricted to some users, however I did succeed in registering my first application id using my Google account (an account verification step through SMS is required).

We can now easily host a Cappuccino application: the client part in the war’s web root (index.html entry point, the Cappuccino application’s .js file optimized with the Press tool, the objective-j.js file), the services code (Java classes), the Spring jars and configuration, and the CP2JavaWS jars and configuration.
Downloading the application’s js files can be optimized (they can be served from separate servers, other than the main server that serves dynamic resources such as servlets/JSP) if they are marked as static in the application configuration file (via the include/exclude parameters). By default JSP files and resources residing in the WEB-INF folder are considered dynamic.