Thursday, November 4, 2010

MapReduce to be built into the Google App Engine environment

If you read the release notes for the latest Google App Engine Python SDK (1.3.8), you might have noticed that the set of declarations available in the app.yaml application configuration file has grown. There are some interesting additions.

The include directive allows the app.yaml file to be broken apart for better reuse. Within a single project this might not make much sense, but you could imagine entire utility modules becoming far more portable between projects and easy to add to an existing application.

The builtin directive, built on top of the include functionality, provides easy inclusion of built-in handlers, including appstats, the administration console, the remote API and a 'datastore_admin'.

The 'datastore_admin' builtin seems to be built on the appengine-mapreduce project. As I have previously posted, MapReduce is a technique for dividing a large task into smaller pieces of work that can be performed in parallel, before the partial results are reduced to a single outcome.
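To make the idea concrete, here is a minimal sketch of that divide, map and reduce flow in plain Python. It is only an illustration of the general technique, not the appengine-mapreduce API: the word-count task and every function name below (split_into_chunks, map_chunk, merge_counts, word_count) are my own hypothetical choices, and a thread pool stands in for the parallel workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Illustrative only: these helpers are hypothetical and are NOT part of
# appengine-mapreduce. They just show the shape of a map/reduce job.


def split_into_chunks(items, chunk_size):
    """Divide the large task into smaller, independent pieces of work."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def map_chunk(lines):
    """Map step: count the words in one chunk without looking at the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, chunk_size=2, workers=4):
    """Run the map step over all chunks in parallel, then reduce the results."""
    chunks = split_into_chunks(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the end"]
    print(word_count(sample))
```

Because each chunk is mapped independently, the same structure carries over to work spread across many machines; only the way the chunks are dispatched changes.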

I am excited about the new builtin: ever since I discovered appengine-mapreduce, I have considered it essential for maintaining a large set of data in the datastore.

If you were hoping to try out the new MapReduce functionality, you will need to wait a little longer, until the next release of the SDK at least. Oddly enough, the documentation lists 'datastore_admin' as a valid builtin option and the libraries are now included in the SDK, but some import errors (namely for simplejson and graphy.backends) prevent it from being used.

As this discussion thread on the Google Group shows, it isn't quite ready for prime time yet. Hopefully the next release will tidy things up.
