Wednesday, August 12, 2009

When to use fetch (or not) on Google App Engine

When you are working with the Google App Engine datastore, stay aware of whether you are holding a Query object or a result set (a list of Model objects). As a newcomer to both Python and Google App Engine, I have been discovering the hard way that this makes a difference, even when the code appears to behave the same either way.

The Google App Engine db package can return sets of model data in a number of ways, such as the all() class method on a Model and collection properties. The return type is Query. (db.get() is different: it looks entities up by key and returns model instances directly.) My understanding is that a Query is essentially a datastore query definition and not a result set. This means that once you have created a query, you can further filter, sort, or otherwise refine it before any results are read. For example:

my_query = mymodel.SomeModel.all()
my_query.filter('my_property =', 10)


(Query has a sister class, GqlQuery, created by the GQL interface. It is conceptually the same but lacks Query's methods for filtering, sorting and otherwise changing the query definition.)

It is important to note that creating or filtering a query retrieves no data. Data is only read when the query is run, most explicitly by calling fetch(). A fetch essentially runs the query and returns the requested results as a simple Python list of model instances. This code fetch()es up to 1000 results from the query created above:


my_query.fetch(1000)
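
To make the distinction concrete, here is a minimal sketch in plain Python. ToyQuery is a hypothetical stand-in I made up for illustration, not the App Engine Query class, and its filter() takes a bare property name rather than a 'property operator' string. It only mimics the idea that filtering builds up a definition and nothing is read until fetch() is called:

```python
class ToyQuery(object):
    """Hypothetical stand-in for a query definition; not the App Engine class."""

    def __init__(self, rows):
        self._rows = rows      # pretend this is the datastore
        self._filters = []     # the query definition built so far
        self.reads = 0         # how many times the "datastore" was read

    def filter(self, prop, value):
        # Only records the condition; no data is touched here.
        self._filters.append((prop, value))
        return self

    def fetch(self, limit):
        # Running the query: apply the recorded filters, return a plain list.
        self.reads += 1
        matches = [row for row in self._rows
                   if all(row.get(p) == v for p, v in self._filters)]
        return matches[:limit]


rows = [{'my_property': 10}, {'my_property': 3}, {'my_property': 10}]
my_query = ToyQuery(rows).filter('my_property', 10)
reads_before_fetch = my_query.reads   # nothing has been read yet
results = my_query.fetch(1000)        # now the data is read
```

After the last line, results is an ordinary list and my_query is still just the definition that produced it.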


This all sounds very straightforward, right? I think my problem in understanding it was that Query objects can transparently run themselves in a few different circumstances, which lets you perform operations you would normally perform on a result set. For example, you can iterate over a Query to process each member of the results. Many operations work as expected on a Query as well as on a result set; they just might not be as efficient.

Adding to the confusion, there are some methods you can call on both Queries and result sets that do not behave the same way. For example, both lists and Query objects have a count() method. On a Query object, count(x) returns the number of items, up to a limit of x, that the query would return. On a list, count(x) returns the number of items in the list that are equal to x. Consider these two cases, which behave quite differently:


# case 1: count() on a Query
my_query = MyModel.MyObject.all()
if my_query.count(1):
    # do something if there is at least one object
    pass

# case 2: count() on a list
my_query = MyModel.MyObject.all()
my_results = my_query.fetch(1000)
if my_results.count(1):
    # do something if any of the model objects == 1
    # (which they never will)
    # We probably meant 'if len(my_results)' instead
    pass
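
The list half of this can be checked in plain Python without the datastore. In this small sketch the list of strings is just a stand-in for whatever fetch() returned:

```python
my_results = ['a', 'b', 'a']         # stands in for a list returned by fetch()

equal_to_a = my_results.count('a')   # list.count(x): how many items equal x
equal_to_1 = my_results.count(1)     # no item equals 1
how_many = len(my_results)           # the number of items in the list

has_results = bool(my_results)       # an empty list is falsy, a non-empty one truthy
```

So for a fetched list, 'if my_results:' or 'if len(my_results):' asks "is there anything here?", while 'if my_results.count(1):' asks a different question entirely.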


Because of this, it is important to keep in mind whether you are working with a Query or a result. Knowing when to use fetch() to convert a query into a result is also important, as is knowing which Query methods transparently trigger a fetch() (or another datastore operation).

The best way to place your fetch()es will vary from situation to situation, but I think a good rule of thumb, where the number of results is unknown, is:

  1. Create the Query or GqlQuery object

  2. (Query only) Narrow it down as far as possible (using filters)

  3. fetch() the query

  4. Do something with the results



Within a single block of code, this is easy to keep an eye on. Where it might become more challenging is where your controllers (i.e. request handlers) are asking the Model for data in many places. I am trying to stick to the rule that nothing outside of my Model knows about Queries, and the Model never returns them. If I need a query for a specific purpose, I will expose a method on the model that achieves the same effect. So far this is working, but it is still early days.
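
Here is a rough sketch of what I mean, again in plain Python. Everything in it is hypothetical: _ToyQuery and _ROWS stand in for the datastore, and objects_with_size() and handler() are made-up names. Unlike the toy, the real Query's filter() changes the query in place. The point is only where the four steps live:

```python
class _ToyQuery(object):
    """Hypothetical stand-in for a datastore query; not the App Engine class."""

    def __init__(self, rows):
        self._rows = list(rows)

    def filter(self, prop, value):
        # The toy returns a new, narrower query.
        return _ToyQuery(r for r in self._rows if r.get(prop) == value)

    def fetch(self, limit):
        return self._rows[:limit]


_ROWS = [{'name': 'a', 'size': 10}, {'name': 'b', 'size': 3}]  # pretend datastore


def objects_with_size(size, limit=1000):
    """Model-layer helper: steps 1 to 3 happen here; callers only see a list."""
    query = _ToyQuery(_ROWS)             # 1. create the query
    query = query.filter('size', size)   # 2. narrow it down
    return query.fetch(limit)            # 3. fetch


def handler():
    """Controller code: step 4 only, with no Query in sight."""
    results = objects_with_size(10)
    return [row['name'] for row in results]
```

Because the helper always returns a list, code in the handler can safely use len() and other list operations without wondering what kind of object it has.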

My knowledge of all the above is new and may be imperfect. If you have any comments, questions or, even better, corrections, please let me know.
