Sunday, August 9, 2009

GAE Maintaining Reference Properties on Deletion

When using Python with Google App Engine and the Datastore, a common problem is maintaining Reference Properties on the Many-side of a relationship when the one-side changes (especially when it gets deleted). For example, consider the following simple model for an application that allows you to group Contacts into Categories:


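from google.appengine.ext import db

class Category(db.Model):
    name = db.StringProperty()
    # some other props here

class Contact(db.Model):
    name = db.StringProperty()
    # collection_name gives each Category a 'contacts' back-reference
    category = db.ReferenceProperty(Category, collection_name='contacts')
    email_address = db.EmailProperty()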


In this simple model, a Contact has a ReferenceProperty pointing to a single Category. What happens when you need to delete the Category? You might want to delete the related Contacts as well, but more likely you simply want to clear their association with the Category. Deleting the Category itself is easy: fetch it and call delete():


category_to_delete = db.get(some_category_key)
category_to_delete.delete()

Easy, right? Not quite. The reference to the deleted Category still exists in each Contact that was in that Category. If you try to dereference it, an exception is raised. For example:


category_name_of_contact = some_contact.category.name

Causes an error. The Google documentation suggests testing the ReferenceProperty first:


if some_contact.category:
    contact_category_name = some_contact.category.name
else:
    contact_category_name = 'No Category Assigned'

But for a number of reasons this might not be ideal. I am not sure what overhead this approach carries, but checking every reference purely for display purposes, every time it is shown (on whatever output page), does not sit well with me. Nor does leaving bad references around that must be constantly checked. For this reason I asked the Google App Engine - Python Google Group what the recommended approaches to this problem are.

Some recommended the approach I originally took: Clean up the references in your controller code at the same time you delete the potentially referenced object:


for contact in category_to_delete.contacts:
    contact.category = None
    contact.save()
category_to_delete.delete()

This cleans up the datastore when the delete is performed. If a Category has many Contacts, this might be a very expensive operation (possibly it can be improved; I am a Google App Engine padawan, after all). On the other hand, deleting a Category is probably a much rarer occurrence than, say, outputting a Contact, so the economics of page request performance may make it worthwhile.
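One possible improvement, sketched here rather than measured, is to collect the modified Contacts and write them back in batches instead of saving them one at a time. The helper below is hypothetical and written in plain Python so it can run anywhere: `clear_category` and `put_many` are names invented for illustration, `put_many` stands in for a batch write such as `db.put` (which accepts a list of entities), and the batch size of 100 is an assumed value, not a documented limit.

```python
def clear_category(contacts, put_many, batch_size=100):
    """Clear the category on each contact and write them back in batches.

    put_many is any callable that stores a list of entities in one call
    (on App Engine, db.put is the assumed candidate). batch_size is an
    illustrative value. Returns the number of contacts updated.
    """
    batch = []
    updated = 0
    for contact in contacts:
        contact.category = None
        batch.append(contact)
        if len(batch) >= batch_size:
            put_many(batch)
            updated += len(batch)
            batch = []
    if batch:  # write whatever is left over
        put_many(batch)
        updated += len(batch)
    return updated
```

In the controller this might be called as clear_category(category_to_delete.contacts, db.put) just before category_to_delete.delete(). Whether it is actually faster would need to be measured.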

But this approach still has the problem that you need to perform it every time you delete a Category. Depending on your model and business rules, this might happen in only one location or in several. A cleaner way to encapsulate the deletion business rule, suggested by Vince Stross on the Google Group, is to override the delete() method on your model object. Here is how the updated model looks:


class Category(db.Model):
    name = db.StringProperty()
    # some other props here

    def delete(self):
        # Clear the reference on every Contact in this Category first
        for contact in self.contacts:
            contact.category = None
            contact.save()
        # Then hand over to the original delete()
        db.Model.delete(self)

class Contact(db.Model):
    name = db.StringProperty()
    category = db.ReferenceProperty(Category, collection_name='contacts')
    email_address = db.EmailProperty()

Using the overridden method is easy:

category_to_delete.delete()


Behind the scenes, your method cleans up the references to the Category from Contact items, and then passes control across to the original delete() method defined by Google.
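To see the pattern in isolation, here is a minimal sketch that runs without the App Engine SDK. FakeModel is a hypothetical in-memory stand-in for db.Model (it only records save and delete calls), and the contacts list is maintained by hand to mimic the back-reference created by collection_name='contacts'. It illustrates the order of operations only, not how the Datastore behaves.

```python
class FakeModel(object):
    """Hypothetical stand-in for db.Model: records saves and deletion."""
    def __init__(self):
        self.deleted = False
        self.saved = 0

    def save(self):
        self.saved += 1

    def delete(self):
        self.deleted = True


class Category(FakeModel):
    def __init__(self, name):
        FakeModel.__init__(self)
        self.name = name
        self.contacts = []  # mimics the 'contacts' back-reference

    def delete(self):
        # Clear the references first, then defer to the base class delete.
        for contact in self.contacts:
            contact.category = None
            contact.save()
        FakeModel.delete(self)


class Contact(FakeModel):
    def __init__(self, name, category=None):
        FakeModel.__init__(self)
        self.name = name
        self.category = category
        if category is not None:
            category.contacts.append(self)


friends = Category('Friends')
alice = Contact('Alice', friends)
bob = Contact('Bob', friends)

friends.delete()  # one call: clears both references, then deletes
print(alice.category, bob.category, friends.deleted)
```

The caller never needs to know about the cleanup, which is the point of putting the rule on the model rather than in the controller.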

I like this approach; it is much cleaner and, barring any performance improvements that can be made to the loop, it is the approach I will take.

I have asked the Google Group membership to consider Vince's approach with my sample code, and I will wait to see what they say. Any thoughts here?
