Tuesday, September 22, 2009

Transactions in Google App Engine

Recently I had to retrofit transactional support into my Google App Engine application My Web Brain, and I thought I would share some of that experience. Traditional RDBMS products offer strong transactional support, not only through explicit transactions but also through their implementation of formal relationships, constraints, cascade-updates and cascade-deletes. The Google App Engine datastore, as I am frequently reminded, is not a traditional database.

Google imposes tight restrictions on the amount of time a datastore operation can run before it times out. Combine that with the datastore's common latency spikes and any datastore operation may fail. Without the niceties above (formal relationships, constraints, and cascades) and without explicit transactions, your application data may be left in an inconsistent state.

In a previous post I described how the use of ReferenceProperty properties required explicit coding from the developer to clean references to objects when they are deleted. This is a good example of code which definitely needs to run within a transaction to make it safe.

We last saw our code like this:

from google.appengine.ext import db

class Thought(db.Model):
    # some normal props here

    def delete(self):
        # Clear the reference on every action that points at this thought
        actions = self.r_actions.fetch(1000)
        for action in actions:
            action.thought = None
        db.put(actions)
        db.Model.delete(self)

class Action(db.Model):
    thought = db.ReferenceProperty(Thought, collection_name="r_actions")


By overriding the model.delete() instance method in our Thought object, we can clean up all references to the thought in our Action objects. But this badly needs to be encapsulated in a transaction.

To encapsulate datastore operations in transactions we need to use db.run_in_transaction(). This function is passed a reference to another function that will perform the work considered part of the transaction. The other arguments are the parameters required by the function. Our delete implementation could be changed to use transactions like this:

    def delete(self):
        # Pass the function and then its argument (the thought itself)
        db.run_in_transaction(Thought._txn_delete, self)

    def _txn_delete(self):
        actions = self.r_actions.fetch(1000)
        for action in actions:
            action.thought = None
        db.put(actions)
        db.Model.delete(self)

The datastore may try a number of times to run the transaction successfully, so the code running within the transaction should have no side effects apart from datastore changes. If all of the attempts by the datastore to run the transaction fail, a TransactionFailedError will be raised.
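
To see why side effects matter, here is a minimal pure-Python sketch of a retrying runner. It is not the App Engine implementation: run_with_retries, FakeCollision and TransactionFailed are hypothetical stand-ins I made up for illustration, and the retry count of 3 is an assumption.

```python
class FakeCollision(Exception):
    """Hypothetical stand-in for a datastore contention failure."""


class TransactionFailed(Exception):
    """Hypothetical stand-in for db.TransactionFailedError."""


def run_with_retries(func, *args, **kwargs):
    """Call func(*args) until it succeeds, up to `retries` attempts."""
    retries = kwargs.pop("retries", 3)  # assumed retry count
    for _ in range(retries):
        try:
            return func(*args, **kwargs)
        except FakeCollision:
            continue  # a real datastore would roll back before retrying
    raise TransactionFailed("gave up after %d attempts" % retries)


emails_sent = []  # a side effect that no rollback can undo
attempts = {"count": 0}


def delete_and_notify(name):
    attempts["count"] += 1
    emails_sent.append("deleted " + name)  # runs on every attempt
    if attempts["count"] < 3:
        raise FakeCollision()  # simulate two failed commits
    return "ok"


result = run_with_retries(delete_and_notify, "thought-1")
```

In this sketch the delete succeeds only once, but the notification is recorded on every attempt. That is the kind of surprise you avoid by keeping non-datastore work outside the transaction function.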

Seems pretty simple, right? It is, but there are additional limitations we have not touched on. All objects written to in a transaction must have a common ancestor (in the simplest case, a common parent). All queries performed within a transaction must have an ancestor filter.
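
The common-ancestor rule can be pictured with a small pure-Python model. This is only a sketch under my own assumptions: a key is modelled as a tuple of (kind, name) pairs with the root first, and root_of and same_entity_group are hypothetical helpers, not datastore API calls.

```python
def root_of(path):
    """Return the root (kind, name) pair of a key path.

    A path is modelled as a tuple of (kind, name) pairs, root first.
    """
    return path[0]


def same_entity_group(*paths):
    """True if every path shares the same root, i.e. one entity group."""
    return len(set(root_of(p) for p in paths)) == 1


account = (("UserAccount", "alice"),)
thought = account + (("Thought", "t1"),)
action = account + (("Action", "a1"),)
orphan = (("Action", "a2"),)  # created without a parent: it is its own root

ok = same_entity_group(account, thought, action)  # all rooted at the account
bad = same_entity_group(thought, orphan)  # different roots
```

Here the thought and the action can be written in one transaction because both hang off the same account, while the parentless action cannot join them.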

If you have not been providing parent values for your model objects as you create them, you will find you need to both start assigning a parent during object creation and re-write existing data with the appropriate ancestry.
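
Re-writing existing data is more than a property update, because an entity's parent is part of its key and cannot be changed after creation. The rough shape of the migration is "copy under the new parent, then delete the original". Below is a pure-Python sketch of that idea, using a plain dict as a pretend datastore. The reparent helper and the key-path tuples are hypothetical, for illustration only.

```python
def reparent(store, old_path, parent_path):
    """Copy the entity at old_path under parent_path and drop the original."""
    new_path = parent_path + (old_path[-1],)
    store[new_path] = dict(store[old_path])  # copy the properties
    del store[old_path]  # the old key no longer exists
    return new_path


# Pretend datastore: key path -> properties
store = {
    (("UserAccount", "alice"),): {},
    (("Thought", "t1"),): {"text": "buy milk"},
}

new_path = reparent(store, (("Thought", "t1"),), (("UserAccount", "alice"),))
```

Because the key changes, anything that referenced the old key (such as an Action's thought property) has to be pointed at the new entity as part of the same migration.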

I am not overwhelmingly comfortable yet with the concepts of parents and ancestors in the datastore. Suffice to say they are clearly used to partition the data into smaller, useful sets that are easier to control (from Google's perspective) across the distributed environment. Google recommends that you choose parents for your objects in such a way as to divide each user's data into its own set.

In my situation, it was a good opportunity to define a UserAccount object:

class UserAccount(db.Model):
    # some inconsequential props here
    pass

We also need to modify our transaction method to comply with the second rule above (that all queries have an ancestor filter):

    def delete(self):
        db.run_in_transaction(Thought._txn_delete, self)

    def _txn_delete(self):
        # The parent UserAccount is the root of this thought's entity group
        user_account = self.parent()
        # Restrict the query to that entity group with an ancestor filter
        actions = self.r_actions.ancestor(user_account).fetch(1000)
        for action in actions:
            action.thought = None
        db.put(actions)
        db.Model.delete(self)

The following code snippet shows how objects would be created and then deleted, using our transaction code:

def do_stuff(user_account):
    # Create and save a thought (irrelevant props not set)
    new_thought = Thought(parent=user_account)
    new_thought.put()

    # Create and save an action (irrelevant props not set)
    new_action = Action(parent=user_account)
    new_action.thought = new_thought
    new_action.put()

    # Delete our thought, using the transactional code from above
    new_thought.delete()

I hope this example is useful. I underestimated the time it would take to retrofit transactional support into my application. Next time I will consider it earlier in the piece.

You can read more about transactions in the Google App Engine transaction documentation. If you have any comments or questions, please let me know.

1 comment:

  1. How should I model a simple project management app with the following entities?

    - Project
    - Project members (project has members)
    - Task (project has tasks)
    - Task Notes (task has notes)
    - Audit trail (Need trail for project and tasks)
    - User

    Deleting project should delete the project members, tasks, task notes. This means task should have parent as project, task notes should have parent as task.

    Multiple users will be updating the project, tasks and task notes. How much should I worry about the 1 sec write limit in entity group?

    Or should I rethink about the model?
