Sunday, November 29, 2009

Support Timezones in Google App Engine Pt. 0 - The Starting Point

As I mentioned in a previous post, I recently implemented timezone support in my GTD web application. I am writing a series on how I approached this, on the off chance it will be useful to someone or might encourage someone to come forward with a better solution.

I thought I would start off examining the starting point, including:
  • The relevant parts of the datetime module, which ships with Python and is available on Google App Engine (GAE)
  • Using DateTimeProperty, not DateProperty, for timezone sensitive dates in the GAE datastore
  • How the GAE datastore behaves with respect to timezones

Vanilla Python includes a number of types for dates and times, including timezone support. The datetime module includes two classes that can be timezone aware: datetime and time. If you do not supply a tzinfo object (timezone info: information about how a timezone is named and how it relates to UTC) during creation, your datetime or time is considered 'naive', meaning it is not timezone aware.

import datetime
# est_tz is assumed to be a tzinfo object for US Eastern time
naive_date = datetime.datetime(day=30, month=11, year=2009)
tz_date = datetime.datetime(day=30, month=11, year=2009, tzinfo=est_tz)

datetime and time objects can be used in arithmetic and comparisons fairly naturally, but this only works when the objects involved are either all naive or all timezone aware. Mixing the two in an ordering comparison or a subtraction raises a TypeError. This makes sense, since a naive value does not identify a unique instant. Some objects in the datetime module are not timezone sensitive at all, like timedelta.

import datetime
# est_tz and pst_tz are assumed to be tzinfo objects for US Eastern and Pacific time
naive_date = datetime.datetime(day=30, month=11, year=2009)
tz_date_est = datetime.datetime(day=30, month=11, year=2009, tzinfo=est_tz)
tz_date_pst = datetime.datetime(day=30, month=11, year=2009, tzinfo=pst_tz)

# this does not work: ordering a naive datetime against an aware one raises TypeError
is_earlier = naive_date < tz_date_est

# this does
is_earlier = tz_date_est < tz_date_pst

# these both work
later = naive_date + datetime.timedelta(days=1)
earlier = tz_date_est - datetime.timedelta(hours=5)
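To make the behaviour above concrete, here is a minimal runnable sketch. The FixedOffset class is my own stand-in for real tzinfo objects (it assumes fixed offsets of -5 and -8 hours and ignores daylight saving), so treat it as an illustration rather than something to deploy.

```python
import datetime


class FixedOffset(datetime.tzinfo):
    """Illustrative fixed-offset timezone (no daylight saving rules)."""

    def __init__(self, hours, name):
        self._offset = datetime.timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        return datetime.timedelta(0)


est_tz = FixedOffset(-5, "EST")
pst_tz = FixedOffset(-8, "PST")

naive_date = datetime.datetime(2009, 11, 30)
tz_date_est = datetime.datetime(2009, 11, 30, tzinfo=est_tz)
tz_date_pst = datetime.datetime(2009, 11, 30, tzinfo=pst_tz)

# Ordering a naive datetime against an aware one is rejected outright.
try:
    naive_date < tz_date_est
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

# Aware datetimes compare as instants: midnight EST (05:00 UTC)
# comes before midnight PST (08:00 UTC).
est_is_earlier = tz_date_est < tz_date_pst
gap = tz_date_pst - tz_date_est
```

Note that the two aware values show the same wall-clock time, yet they are three hours apart once the offsets are taken into account.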

You can change a naive datetime into a timezone aware datetime using the datetime replace() method. replace() never converts: if you use it on a datetime that already has a timezone set, the timezone is swapped and the clock time stays the same. To properly convert times between timezones, use the datetime astimezone() method instead.

naive_date = datetime.datetime(hour=1, day=30, month=11, year=2009)
tz_date_est = naive_date.replace(tzinfo=est_tz)
# replaces the timezone information in the datetime without conversion
# tz_date_est is now 1:00 am, 30th of November (EST)

tz_date_pst = tz_date_est.astimezone(pst_tz)
# converts from the datetime's own timezone to PST
# tz_date_pst is now 10:00 pm, 29th of November (PST)

The missing link is the tzinfo (timezone information) objects. Python does not ship with them, and the expectation appears to be that you or others will write your own. Creating your own tzinfo is easy, but only if you know the exact rules for the timezone (for example, when daylight saving is in effect), have only a limited number to create, and are prepared to maintain them as the rules change. For an up-to-date and maintained list of timezones you can use pytz, which I will cover in more depth in a future post.
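As a rough idea of what writing your own involves, here is a minimal sketch of a hand-rolled tzinfo subclass. The class name FixedOffset and the offsets used are my own assumptions, and it deliberately ignores daylight saving, which is exactly the part that makes real timezones hard to maintain.

```python
import datetime


class FixedOffset(datetime.tzinfo):
    """Illustrative timezone with a constant UTC offset and no DST."""

    def __init__(self, hours, name):
        self._offset = datetime.timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        # How far local time is from UTC.
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        # Must not return None, or astimezone() cannot convert into this zone.
        return datetime.timedelta(0)


est_tz = FixedOffset(-5, "EST")
pst_tz = FixedOffset(-8, "PST")

naive_date = datetime.datetime(2009, 11, 30, 1, 0)

# replace() only attaches the timezone; the clock reading is unchanged.
tz_date_est = naive_date.replace(tzinfo=est_tz)

# astimezone() keeps the instant and changes the clock reading.
tz_date_pst = tz_date_est.astimezone(pst_tz)
```

With these fixed offsets, tz_date_pst reads 10:00 pm on the 29th, and it compares equal to tz_date_est because both describe the same instant.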

It is nice to have these timezone aware objects to work with rather than reinventing the wheel. To be useful in most cases, though, you need to persist the data, and that is where the Google App Engine datastore becomes involved. How does the datastore handle the persistence of timezone aware datetime objects?

Datetimes are persisted as part of the application model using the DateTimeProperty. All of Google App Engine runs on UTC time (i.e. asking for the current time returns the current UTC time). It is natural then (and in most cases best practice) that the datastore is optimised to work in UTC. The interface is not completely consistent though:
  • If you persist a naive datetime, a naive datetime will be returned (as you might expect)
  • If you persist a datetime in UTC, a naive datetime will be returned (not quite as you would expect)
  • If you persist a datetime in some other timezone, it will be converted to UTC and still returned as a naive datetime
There is an obvious pattern here: no matter what datetime you persist, a naive datetime will always be read back from the datastore. It is not obvious to the reader what the original timezone was, whether one was originally specified, or even whether the datastore performed a conversion at the time of persistence. This means you need to be careful about how timezones are treated at both the writing and reading stages of your application. The simplest approach would be to:
  • Use the datastore for UTC datetimes only. In practice this means ensuring every value assigned to a DateTimeProperty has a timezone set wherever it can differ from UTC. The datastore will kindly perform the necessary conversion, but it will not complain about naive datetimes (so be careful).
  • When retrieving datetimes, they will be naive, but provided you ensure all datetimes entering the system are marked with the correct timezone, you can assume they are UTC. You should probably set the timezone information explicitly before using them.
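The two rules above can be sketched in plain Python, without touching the datastore API at all. The helper names to_stored_utc and from_stored_utc and the FixedOffset class are hypothetical, made up for this illustration; they only mimic the conversion described above and are not my UtcDateTimeProperty.

```python
import datetime


class FixedOffset(datetime.tzinfo):
    """Illustrative fixed-offset timezone (no daylight saving rules)."""

    def __init__(self, hours, name):
        self._offset = datetime.timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        return datetime.timedelta(0)


utc = FixedOffset(0, "UTC")
est_tz = FixedOffset(-5, "EST")


def to_stored_utc(value):
    """Writing side: insist on a timezone, then reduce to naive UTC."""
    if value.tzinfo is None:
        raise ValueError("naive datetime: set a timezone before storing")
    return value.astimezone(utc).replace(tzinfo=None)


def from_stored_utc(value):
    """Reading side: label the naive value that comes back as UTC."""
    return value.replace(tzinfo=utc)


due = datetime.datetime(2009, 11, 30, 1, 0, tzinfo=est_tz)
stored = to_stored_utc(due)         # naive, 06:00 on 30 November
restored = from_stored_utc(stored)  # aware again, same instant as due
```

Raising on naive input is a design choice on my part: it is stricter than the datastore, which accepts naive values silently.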
In a future post I will show my UtcDateTimeProperty custom model class, which does little except ensure timezones are made explicit in both the datastore reads and writes. It should be quite doable to craft a TzDateTimeProperty which also stores the original timezone and converts datetimes back to the correct timezone when they are read.

In the meantime I have one rather obvious piece of advice: if you are using a date (only) in your model and you know it will be timezone sensitive, use a DateTimeProperty instead of the simple DateProperty. The underlying date object is always naive, and there is no reliable way to work out afterwards which timezone it was meant in.

In my application, I have a property duedate. On behalf of my users, I am not interested in the time of day a due date might correspond to, but in order to properly serve timezone sensitive due dates (where a due date starts and ends in the correct timezone), a simple DateProperty, my original naive choice (that's a pun, by the way), was not sufficient. Even knowing the timezone, a single date may correspond to more than one date in other timezones.
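A small sketch shows the problem. The FixedOffset class, the offsets (-8 for PST, +11 for Sydney in summer) and the helper day_bounds_utc are assumptions of mine for illustration only.

```python
import datetime


class FixedOffset(datetime.tzinfo):
    """Illustrative fixed-offset timezone (no daylight saving rules)."""

    def __init__(self, hours, name):
        self._offset = datetime.timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        return datetime.timedelta(0)


utc = FixedOffset(0, "UTC")
pst_tz = FixedOffset(-8, "PST")
sydney_tz = FixedOffset(11, "AEDT")

# One instant, two different calendar dates.
instant = datetime.datetime(2009, 11, 30, 2, 0, tzinfo=utc)
date_in_pst = instant.astimezone(pst_tz).date()
date_in_sydney = instant.astimezone(sydney_tz).date()


def day_bounds_utc(day, tz):
    """Return the UTC start and end of a calendar day as seen in tz."""
    midnight = datetime.datetime.combine(day, datetime.time(0, 0))
    start = midnight.replace(tzinfo=tz)
    end = start + datetime.timedelta(days=1)
    return start.astimezone(utc), end.astimezone(utc)


due = datetime.date(2009, 11, 30)
pst_start, pst_end = day_bounds_utc(due, pst_tz)
syd_start, syd_end = day_bounds_utc(due, sydney_tz)
```

Here the 30th of November in Sydney is already over in UTC terms several hours before the 30th ends for a user on Pacific time, so a bare date cannot tell you when a due date actually starts or ends.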

Because of this, if you are preparing to add timezone support for date-only properties, you should convert them to DateTimeProperty properties. As with all model changes, you need to be careful to convert existing data to the correct type. The following code checks to see if an entity possesses data for a property duedate. If it does, the data type is checked and converted:

if entity.duedate:
    if not isinstance(entity.duedate, datetime.datetime):
        entity.duedate = datetime.datetime.combine(entity.duedate,
            datetime.time(hour=0, minute=0, second=0))

Note we use the combine() method of datetime.datetime in order to combine a date with a time. The default time you use will vary depending on your application. The above example could be improved perhaps by using replace() to put the result into the user's timezone, so that it logically represents a real date from their perspective.
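One possible shape for that improvement is sketched below. The function upgrade_duedate, the user_tz argument and the FixedOffset class are hypothetical names of my own, and midnight is only an assumed default time.

```python
import datetime


class FixedOffset(datetime.tzinfo):
    """Illustrative fixed-offset timezone (no daylight saving rules)."""

    def __init__(self, hours, name):
        self._offset = datetime.timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        return datetime.timedelta(0)


def upgrade_duedate(value, user_tz):
    """Turn a plain date into midnight in the user's timezone."""
    # datetime is a subclass of date, so test for datetime first.
    if value is None or isinstance(value, datetime.datetime):
        return value
    midnight = datetime.datetime.combine(value, datetime.time(0, 0))
    # replace() attaches the timezone without shifting the clock time.
    return midnight.replace(tzinfo=user_tz)


est_tz = FixedOffset(-5, "EST")
old_value = datetime.date(2009, 11, 30)
new_value = upgrade_duedate(old_value, est_tz)
```

The result is an aware datetime, so under the behaviour described earlier the datastore would convert it to UTC when it is written.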

That concludes the topics I wanted to cover in this post. I plan for the next post in this series to cover the pytz module and how to get your hands on prewritten and maintained tzinfo objects.
