Wednesday, June 16, 2010

PrefetchingQuery: Prefetch Reference Properties and Parents in App Engine

Back in January, Nick Johnson from Google wrote about the benefits of prefetching reference properties for a collection of App Engine datastore entities. Prefetching can dramatically reduce the number of RPC calls* to the datastore by ensuring each entity referenced by a property is retrieved only once.
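To make the saving concrete, here is a minimal sketch of the idea using a plain in-memory stand-in for the datastore rather than the real App Engine API. FakeDatastore, get_multi and the dictionary-based "entities" are hypothetical names made up for this illustration; on App Engine the batched lookup would be a db.get() on a list of keys, as in Nick's post.

```python
class FakeDatastore(object):
    """Hypothetical in-memory stand-in for the datastore; counts round trips."""

    def __init__(self, records):
        self.records = records   # maps key -> stored entity
        self.rpc_count = 0       # number of simulated round trips

    def get(self, key):
        # One simulated RPC per single-key lookup.
        self.rpc_count += 1
        return self.records[key]

    def get_multi(self, keys):
        # One simulated RPC for the whole batch of keys.
        self.rpc_count += 1
        return dict((key, self.records[key]) for key in keys)


def naive_resolve(store, posts):
    # Dereference each post's author separately: one RPC per post.
    for post in posts:
        post["author"] = store.get(post["author_key"])
    return posts


def prefetch_resolve(store, posts):
    # Collect the distinct referenced keys, fetch them in one batch,
    # then attach the fetched entity to every post that refers to it.
    wanted = set(post["author_key"] for post in posts)
    fetched = store.get_multi(wanted)
    for post in posts:
        post["author"] = fetched[post["author_key"]]
    return posts


def make_posts():
    # Four posts that refer to only two distinct authors.
    return [{"author_key": k} for k in ("alice", "bob", "alice", "alice")]


authors = {"alice": {"name": "Alice"}, "bob": {"name": "Bob"}}

naive_store = FakeDatastore(authors)
naive_resolve(naive_store, make_posts())

batch_store = FakeDatastore(authors)
resolved = prefetch_resolve(batch_store, make_posts())

print(naive_store.rpc_count)   # one lookup per post
print(batch_store.rpc_count)   # a single batched lookup
```

The more entities share the same referenced entity, the bigger the gap between the two counters becomes.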

It has taken me a while, but I finally got a chance to go back to Nick's blog entry and build some infrastructure around his code to make prefetching reference properties easier. I call the effort PrefetchingQuery, and you can find it over at he3-appengine-lib in performing.py.

Like PagedQuery, PrefetchingQuery is a facade for a normal db.Query or db.GqlQuery object. You can configure the PrefetchingQuery with the properties to prefetch in its constructor, with a class attribute on your model entity, or let it default to prefetching all reference properties. For example:

# create a prefetching query
myPrefetchingQuery = PrefetchingQuery(MyEntity.all(),
                                      (MyEntity.myReferenceProp1,))

configures the PrefetchingQuery to prefetch the myReferenceProp1 property. Actual prefetching occurs when you call fetch() on the query.
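The three ways of choosing which properties to prefetch can be pictured as a simple fallback chain. The sketch below is only an illustration of that chain, not the code in performing.py: ReferenceProp and StringProp are hypothetical stand-ins for the db property classes, prefetch_props is a made-up name for the class attribute, and the precedence (constructor argument first, then class attribute, then all reference properties) is my assumption about a sensible order.

```python
class ReferenceProp(object):
    """Hypothetical stand-in for db.ReferenceProperty."""

    def __init__(self, name):
        self.name = name


class StringProp(object):
    """Hypothetical stand-in for a non-reference property."""

    def __init__(self, name):
        self.name = name


def resolve_prefetch_props(model_class, explicit=None):
    # 1. Properties passed in explicitly (as to a constructor) win.
    if explicit is not None:
        return tuple(explicit)
    # 2. Otherwise use a class-level declaration, if the model has one.
    #    'prefetch_props' is a hypothetical attribute name.
    declared = getattr(model_class, "prefetch_props", None)
    if declared is not None:
        return tuple(declared)
    # 3. Otherwise fall back to every reference property on the class.
    return tuple(value for value in vars(model_class).values()
                 if isinstance(value, ReferenceProp))


class Post(object):
    author = ReferenceProp("author")
    editor = ReferenceProp("editor")
    title = StringProp("title")


class Comment(object):
    author = ReferenceProp("author")
    post = ReferenceProp("post")
    prefetch_props = (author,)   # class-level choice: only prefetch author


def names(props):
    return sorted(prop.name for prop in props)


print(names(resolve_prefetch_props(Post)))                    # default
print(names(resolve_prefetch_props(Comment)))                 # class attribute
print(names(resolve_prefetch_props(Comment, [Comment.post]))) # explicit
```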

Still with me? Here are some additional details.

After you have instantiated your shiny new PrefetchingQuery, you can use it like the underlying db.Query or db.GqlQuery object. This means you can call filter(), order() or ancestor() for db.Query, and count() or cursor() for either db.Query or db.GqlQuery.
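If you are wondering how a facade can behave like the query it wraps, one common Python approach is to forward unknown attributes to the wrapped object and override only fetch(). The following is a rough sketch of that pattern under my own assumptions, not a copy of the library code: FakeQuery is a hypothetical list-backed stand-in for db.Query with a simplified filter(), and PrefetchingFacade and mark_prefetched are names invented for the example.

```python
class FakeQuery(object):
    """Hypothetical stand-in for db.Query, backed by a plain list."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, name, value):
        # Simplified: keeps rows whose field equals value, returns nothing.
        self.rows = [row for row in self.rows if row.get(name) == value]

    def count(self):
        return len(self.rows)

    def fetch(self, limit, offset=0):
        return self.rows[offset:offset + limit]


class PrefetchingFacade(object):
    """Illustrative wrapper: forwards query calls, post-processes fetch()."""

    def __init__(self, query, prefetch):
        self._query = query
        self._prefetch = prefetch   # callable applied to fetched results

    def fetch(self, limit, offset=0):
        # The only method with extra behaviour: fetch, then prefetch.
        return self._prefetch(self._query.fetch(limit, offset))

    def __getattr__(self, name):
        # Only called for attributes not found on the facade itself,
        # so filter(), count() and friends go to the wrapped query.
        return getattr(self._query, name)


def mark_prefetched(rows):
    # Placeholder for the real prefetching step.
    for row in rows:
        row["prefetched"] = True
    return rows


query = PrefetchingFacade(
    FakeQuery([{"kind": "a"}, {"kind": "b"}, {"kind": "a"}]),
    mark_prefetched)
query.filter("kind", "a")    # forwarded to the wrapped query
total = query.count()        # forwarded to the wrapped query
results = query.fetch(10)    # handled by the facade itself
```

Because only fetch() is intercepted, everything else keeps whatever behaviour the wrapped query already has.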

You can use PrefetchingQuery with PagedQuery if you like, but you need to wrap your original query first with PrefetchingQuery and then with PagedQuery, like this:

# create a paged, prefetching query
myPagedPrefetchingQuery = PagedQuery(
    PrefetchingQuery(MyEntity.all(), (MyEntity.myReferenceProp1,)), 10)

PrefetchingQuery can also prefetch the parent entity of your models. You request this by adding 'parent' (i.e. the string) to your list of reference properties to prefetch. But be warned: setting the parent of an entity manually uses the private, undocumented attribute _parent and may break in future App Engine platform or SDK releases. See this discussion for background information.

Thanks and credit for the important code behind PrefetchingQuery go to Nick Johnson, whose code I only had to modify slightly for my packaging. Ubaldo Huerta and his discussion with Nick about parent properties also helped out.

Further documentation about prefetching will appear in due course on he3-appengine-lib, but for now you should peruse the comments at the top of performing.py if you want to find out more.

I welcome any bug reports or feature suggestions. Give it a spin.

* The size of the performance benefit will depend on your situation and programming goals. Past performance is no guarantee of future performance benefits.

2 comments:

  1. Hi,

    Thanks a lot for simplifying the prefetching stuff. Appreciate your work. Saved us a lot of time in writing those things on our own.

    Your work on PagedQuery is also fantastic.

    Thanks again from,
    www.geoleaks.com Team
