Tuesday, August 25, 2009

Paging Resultsets in Google App Engine

The datastore used by Google App Engine is not your typical relational database engine, and because of this some typical operations - such as splitting your query results into pages - should not be approached the traditional way.

On first glance of the datastore API you would be forgiven for thinking that out of the box Google App Engine has excellent support for paging. When executing the fetch() method you can pass in a limit and offset.

For example, to get the third set of 10 results you could use the following code:

myquery.fetch(10,20)

But there are several reasons why this is not an appropriate solution for paging. As the Google article Paging Through Datasets describes this has several fundamental limitations. The datastore still retrieves the offset results, so from a performance point of view the above code has the same cost as retrieving 30 results, not the 10 results that you actually use. The fetch() method also has a maximum limit of 1000. That is, you can never retrieve more than 1000 objects at a time, so the following code would return no results at all:

myquery.fetch(10,1000)

There are many use cases where a query might need to return with more than 1000 items, so this limit will force you to consider changing the way you code your queries, even if the performance aspect does not.

What to do? The Paging Through Datasets article presents a couple of solutions, and they all revolve around changing your original query to only return the set of results for the current page. This is relatively easy if your entities have a unique property already ordered in the sort direction you require. You use the last value of this property in your resultset as a 'bookmark' for the next page's result set. You know all following values of that property will be higher, so you can add a condition to your query that will filter on the property, selecting only higher values.

If your model does not have a property which is unique and ordered for your sort order you might need to create one. It is not ideal to change your entity model to overcome datastore limitations, but this is sometimes a recurring theme when using Google App Engine.

This solution will not scale very well if you would like sort order to be determined dynamically, however. Should you created a special sort order field for each one? Worse, some desired sort orders have no sensible unique values that will be known at runtime.

The good news is that Google donated an algorithm for using a the entity key as a subordinate sort order which, in combination with a non-unique sort field, can be used to create unique bookmarks. Rodrigo Moraes did some hard work in implementing a Query class that implemented the algorithm and posting a Google Cookbook Recipe describing how it is used. The source code is being maintained at bitbucket and the comments are reasonably clear about how it should be used.

This solution works well, although it is unfortunate that Google does not provide a native approach to tackling this problem.

The hints Googlers have left in their article and Google Group postings imply that support for paging will receive some attention in a future Google App Engine release. There is no time line for this but it could be included with dataset cursor, currently being worked on as per the App Engine Roadmap. In the meatime, the Rodrigo Moraes impementation works well enough for me that I can wait for Google to catch up.

1 comment:

  1. So bummed the latest release didn't include the built in cursors.

    ReplyDelete