Wednesday, August 12, 2009

Maintaining ReferenceProperties Pt 2 - Efficiency (Google App Engine)

In a previous post I talked about how to override the delete() method on a Model object to ensure the referential integrity of Reference Properties was maintained.

A Google engineer responded to the Google Group discussion and gave me some pointers on making my datastore usage much more efficient (thanks, Nick!). This inspired my previous post about when to use fetch(), but it also affects my previous post on maintaining Reference Properties in the datastore.

This was the code I suggested:

from google.appengine.ext import db

class Thought(db.Model):
    # some normal props here

    def delete(self):
        # Clear the back-reference on every Action before deleting this Thought.
        for action in self.r_actions:
            action.thought = None
            action.save()
        db.Model.delete(self)

class Action(db.Model):
    thought = db.ReferenceProperty(Thought, collection_name="r_actions")


Nick the Google Engineer had two very good suggestions for my Thought.delete() method.

  1. Accessing a collection returns a Query object. If you iterate over a Query, it internally calls fetch(20) to supply results for the iteration. With, say, one hundred items in the collection, that means 5 separate calls to the datastore. It is much more efficient to convert the query into a result set (i.e. a list of model instances) yourself by calling fetch(100), so that the datastore retrieval happens only once.
  2. Similarly, do not save model instances to the datastore individually within the loop; use the db package functions delete() and put(), both of which accept an ordinary list of model objects. Calling them once, outside the loop, results in far fewer calls to the datastore.
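The arithmetic behind both suggestions can be sketched in a few lines of plain Python. This is only a toy model, not App Engine code: the function names are made up for illustration, and the batch size of 20 is simply the figure quoted above.

```python
def iteration_calls(item_count, batch_size=20):
    """Datastore calls made when results arrive batch_size at a time."""
    # Ceiling division; even an empty result costs one call.
    return max(1, (item_count + batch_size - 1) // batch_size)


def total_calls(item_count, bulk):
    """Reads plus writes needed to clear the reference on every item."""
    if bulk:
        # One fetch() for all results, one db.put() for the whole list.
        return 1 + 1
    # Batched iteration for the reads, then one save() per item.
    return iteration_calls(item_count) + item_count


print(iteration_calls(100))          # 5
print(total_calls(100, bulk=False))  # 105
print(total_calls(100, bulk=True))   # 2
```

The final delete of the Thought itself is left out of the count, since it costs one call either way.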


My new Thought.delete() method now reads:

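def delete(self):
    # One datastore call to fetch up to 1000 referencing actions.
    actions = self.r_actions.fetch(1000)
    for action in actions:
        action.thought = None
    # One batched write instead of one save() per action.
    db.put(actions)
    db.Model.delete(self)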


If you need to filter the results before iterating over them, filter the query returned by the collection, fetch it, iterate over the results, and then perform your updates. For example, if I only wanted to update incomplete actions, I might write:


def delete(self):
    action_query = self.r_actions
    # Note the space between the property name and the operator.
    action_query.filter('isComplete =', False)
    actions = action_query.fetch(1000)
    for action in actions:
        # no test to perform, only incomplete actions are iterated across
        action.thought = None
    db.put(actions)
    db.Model.delete(self)
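If you want to try the filter-then-bulk-write pattern without a datastore, here is a minimal stand-in in plain Python. FakeStore, FakeAction and clear_incomplete are hypothetical names invented for this sketch; the list comprehension plays the role of filter() plus fetch(), and FakeStore.put() only counts how often it is called.

```python
class FakeStore(object):
    """Hypothetical in-memory stand-in for the datastore; counts write calls."""

    def __init__(self):
        self.put_calls = 0

    def put(self, entities):
        # Like db.put(), takes a whole list and costs one call.
        self.put_calls += 1


class FakeAction(object):
    """Hypothetical stand-in for the Action model."""

    def __init__(self, thought, is_complete):
        self.thought = thought
        self.isComplete = is_complete


def clear_incomplete(store, actions):
    """Clear the reference on incomplete actions and write them in one call."""
    # Stands in for filter('isComplete =', False) followed by fetch().
    incomplete = [a for a in actions if not a.isComplete]
    for action in incomplete:
        action.thought = None
    if incomplete:
        store.put(incomplete)
    return incomplete


store = FakeStore()
actions = [FakeAction("some thought", i % 2 == 0) for i in range(10)]
changed = clear_incomplete(store, actions)
print(len(changed), store.put_calls)
```

Completed actions keep their reference, and however many incomplete actions there are, the store sees a single write.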


With only a small set of data these changes do not make a huge difference to my application, but it is good to know that if or when it becomes more widely used the application model has all of the easy optimisations performed.

If you have any comments or questions, let me know.
