Troubles with timezones

In the last few days I have spent an extraordinary amount of time thinking hard about 2:30am on March 25th, 2012, and about 2:30am on October 30th, 2011. If you winced in sympathy on seeing these dates, you’ve written code that worked with daylight savings time transitions before. (I suspect vice versa is probably also true.)

If you haven’t had this dubious pleasure, you might not realise what is special about these times. It is this: the first one never happened, while the second one happened twice.

DST: despair, sadness, and terror

At 2am on the last Sunday of March, the Netherlands (along with large chunks of Europe) puts its clocks forward for Daylight Savings Time. That means that 1:59:59 is followed by 3:00:00, skipping 2 o’clock entirely. Then at 3am on the last Sunday of October we put our clocks back again: 2:59:59 CEST (Central European Summer Time) is followed by 2:00:00 CET (Central European Time), so clocks pass 2:30 twice that morning.

This becomes relevant for me because in some of our products at Buzzcapture we show time histograms: the number of tweets per day, or week or month, about some topic. These histograms are generated by a search engine but processed in Python, and it turns out Python has some idiosyncratic ways of dealing with these complications.

Python: in search of faith

If you search on StackOverflow for “increment date python”, the first result (at time of writing) tells you to use datetime.timedelta(days=1). Unfortunately, that’s not very helpful once DST enters the picture; for instance, it will tell you that one day after 11:30pm on March 24th is 12:30am on March 26th (at least in 2012). The reason is that timedelta objects are just a convenient way to specify a number of seconds: timedelta(days=1) is shorthand for 60*60*24 seconds, and the day that DST kicks in is fewer seconds long than that (it’s 23 hours, not 24).¹ For the same reason, “one day later than” October 30th can still be October 30th if calculated with timedelta (in 2011 that day was 25 hours long).
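To make the arithmetic concrete, here is a minimal sketch using only the standard library. The CET and CEST objects are hand-written fixed offsets that I am assuming purely for illustration (they know nothing about DST rules); the point is only that adding timedelta(days=1) moves you exactly 24 hours along, which lands on a different clock reading once the clocks have gone forward.

```python
from datetime import datetime, timedelta, timezone

# Hand-written fixed offsets, assumed for illustration only (no DST rules attached).
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")

# A timedelta of one day is exactly 86400 seconds, nothing more.
assert timedelta(days=1) == timedelta(seconds=60 * 60 * 24)

before = datetime(2012, 3, 24, 23, 30, tzinfo=CET)
after = before + timedelta(days=1)      # still labelled CET: 23:30 on the 25th
on_the_clock = after.astimezone(CEST)   # what an Amsterdam clock shows by then

print(after.isoformat())         # 2012-03-25T23:30:00+01:00
print(on_the_clock.isoformat())  # 2012-03-26T00:30:00+02:00
```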

`pytz`: our saviour?

What we need is a timezones library that handles all this stuff for us; enter pytz. This is an interface to the Olson timezone database, which allows you to specify “political” timezones such as “Europe/Amsterdam”, which specify both summer and winter time, and the rules for when to transition between them.

from datetime import datetime, timedelta  # used in the examples below
import pytz
amsterdam = pytz.timezone('Europe/Amsterdam')

`localize()`: our prayers answered

Using this timezone object we can “localize”² a Python datetime object that is not timezone-aware, and the right offset from UTC will be applied (I’ve broken lines in the output to make them readable in the narrow column):³

>>> amsterdam.localize(datetime(2012, 3, 20, 9, 0, 0)) # not DST yet
datetime.datetime(2012, 3, 20, 9, 0,
                   tzinfo=<DstTzInfo 'Europe/Amsterdam' CET+1:00:00 STD>)
>>> amsterdam.localize(datetime(2012, 3, 30, 9, 0, 0)) # now we're in DST
datetime.datetime(2012, 3, 30, 9, 0,
                   tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>)

Think of it like this: if you’re in Amsterdam and you see a clock/calendar displaying datetime(2012, 3, 20, 9, 0, 0) (i.e., 09:00 on 20/3/2012), localize() figures out what that means as a fully timezone-specified instant.

Serpent in Paradise

So here is an exercise for the alert reader: what do you expect from amsterdam.localize(datetime(2011, 10, 30, 2, 30, 0))?

If you’re in Amsterdam and you see 02:30 30/10/2011 on the clock/calendar, what does it mean? You don’t know. You can’t tell, from the clock/calendar alone, if you’re seeing the first 2:30 of the day (still in CEST) or the second (an hour later, after the transition back to CET). On this reasoning, you might expect localize() to raise an exception (pytz.AmbiguousTimeError is mentioned in the documentation).

Sadly, you would be wrong.

`is_dst`: Temptation

The localize() method takes an optional argument is_dst, which despite the name does not always specify that the result should be in or out of DST (for instance amsterdam.localize(datetime(2012, 3, 20), is_dst=True) will produce a non-DST result). The argument does sometimes specify DST-ness; in particular, when the input is ambiguous then the argument says whether localize() should choose the DST or the not-DST option.

This argument defaults to False.

>>> amsterdam.localize(datetime(2011, 10, 30, 2, 30))
datetime.datetime(2011, 10, 30, 2, 30, 
                   tzinfo=<DstTzInfo 'Europe/Amsterdam' CET+1:00:00 STD>)

The good news is, if you pass it is_dst=None then it will raise that AmbiguousTimeError in this case. I can’t call this an intuitive interface, but I also can’t say it’s wrong.
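For comparison, here is a short sketch of all three settings of is_dst applied to the same ambiguous clock reading. It assumes pytz is installed; the variable names (ambiguous, first, second, raised) are my own.

```python
from datetime import datetime, timedelta
import pytz

amsterdam = pytz.timezone('Europe/Amsterdam')
ambiguous = datetime(2011, 10, 30, 2, 30)

first = amsterdam.localize(ambiguous, is_dst=True)    # the first 2:30, still CEST
second = amsterdam.localize(ambiguous, is_dst=False)  # the second 2:30, now CET

# Same clock reading, different UTC offsets: the two instants are an hour apart.
print(first.utcoffset(), second.utcoffset())
print(second - first)

# is_dst=None refuses to guess.
try:
    amsterdam.localize(ambiguous, is_dst=None)
    raised = False
except pytz.AmbiguousTimeError:
    raised = True
print(raised)
```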

The wrong bit comes at the other end.

`is_dst`: Fall

Armed with what we know about localizing at the October transition, what do you expect from amsterdam.localize(datetime(2012, 3, 25, 2, 30))?

If you were in Amsterdam and saw a clock/calendar showing 02:30 25/3/2012, you would know the clock/calendar was broken. There is no 2:30 on that day in CET or in CEST. Surely that calls for an exception?

>>> amsterdam.localize(datetime(2012, 3, 25, 2, 30))
datetime.datetime(2012, 3, 25, 2, 30, 
                   tzinfo=<DstTzInfo 'Europe/Amsterdam' CET+1:00:00 STD>)

Yes, localize() cheerfully reports a clock time that simply does not exist. The only small consolation is that specifying is_dst=None does force the exception that we expected (not AmbiguousTimeError but NonExistentTimeError):

>>> amsterdam.localize(datetime(2012, 3, 25, 2, 30), is_dst=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tikitu/tmp/py/lib/python2.6/site-packages/pytz/tzinfo.py", line 327, in localize
    raise NonExistentTimeError(dt)
pytz.exceptions.NonExistentTimeError: 2012-03-25 02:30:00
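Since is_dst=None is the only setting that reports both problem cases, one way to use it is as a probe. The sketch below is a hypothetical helper of my own (classify_wall_time is not part of pytz) that tells you whether a naive clock reading happened once, twice, or never in a given timezone.

```python
from datetime import datetime
import pytz

amsterdam = pytz.timezone('Europe/Amsterdam')

def classify_wall_time(tz, naive):
    """Hypothetical helper: report how a naive clock reading maps onto tz."""
    try:
        tz.localize(naive, is_dst=None)
    except pytz.AmbiguousTimeError:
        return 'ambiguous'    # the reading happened twice
    except pytz.NonExistentTimeError:
        return 'nonexistent'  # the reading never happened
    return 'ok'

print(classify_wall_time(amsterdam, datetime(2012, 3, 20, 9, 0)))
print(classify_wall_time(amsterdam, datetime(2011, 10, 30, 2, 30)))
print(classify_wall_time(amsterdam, datetime(2012, 3, 25, 2, 30)))
```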

`normalize()`: a shot at redemption

There is one more feature of pytz that we can apply to clean up this mess a little: the normalize() method on a “political timezone” such as amsterdam. This is for converting non-standard names for instants to standard names. Here is what I mean:

Take the description “12:00 on March 26th 2012, CET”. You can take this to unambiguously refer to a particular instant: CET is a one-hour offset from UTC, so it refers to the same instant as “11:00 on March 26th 2012, UTC” which is perfectly well-defined. But it is a non-standard name for that instant, because everywhere that uses CET is on summer time on March 26th: the standard name would be “13:00 on March 26th 2012, CEST”. This is exactly what amsterdam.normalize() does: it takes the instant unambiguously referred to by a datetime with timezone, and recasts it in the timezone that applies at that instant (perhaps applying a DST shift in the process) without changing the actual moment in time that is being referred to.

Where is that useful? Well, the intended use, to judge by the documentation, is to deal with naïve datetime arithmetic: adding a timedelta to a datetime with a timezone will produce a new datetime in the original timezone, so if you crossed a DST boundary the result of your operation is now a non-standard name for the instant it describes. So amsterdam.localize(datetime(2012, 3, 25, 1, 0, 0)) (a CET instant) plus timedelta(hours=2) is “3am on 25/3/2012 in CET” which is a nonstandard name for “4am on 25/3/2012 CEST”:

>>> amsterdam.localize(datetime(2012, 3, 25, 1, 0, 0))
datetime.datetime(2012, 3, 25, 1, 0,
                   tzinfo=<DstTzInfo 'Europe/Amsterdam' CET+1:00:00 STD>)
>>> amsterdam.localize(datetime(2012, 3, 25, 1, 0, 0)) + timedelta(hours=2)
datetime.datetime(2012, 3, 25, 3, 0,
                   tzinfo=<DstTzInfo 'Europe/Amsterdam' CET+1:00:00 STD>)
>>> amsterdam.normalize(amsterdam.localize(datetime(2012, 3, 25, 1, 0, 0))
...                      + timedelta(hours=2))
datetime.datetime(2012, 3, 25, 4, 0,
                   tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>)
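Note that normalize() after adding a timedelta gives you “exactly that many seconds later”. If what you want is “the same clock reading on the next calendar day” (which is what a per-day histogram needs), one possible approach is to do the arithmetic on the naive datetime and only then localize. The helper below is a sketch of my own under that assumption, not something pytz provides:

```python
from datetime import datetime, timedelta
import pytz

amsterdam = pytz.timezone('Europe/Amsterdam')

def next_wall_clock_day(tz, aware):
    """Hypothetical helper: the same clock reading on the following calendar day."""
    naive = aware.replace(tzinfo=None) + timedelta(days=1)
    # is_dst=None: raise rather than guess if that reading is ambiguous or missing.
    return tz.localize(naive, is_dst=None)

start = amsterdam.localize(datetime(2012, 3, 24, 23, 30), is_dst=None)
end = next_wall_clock_day(amsterdam, start)
elapsed = end - start

print(end.isoformat())  # 2012-03-25T23:30:00+02:00
print(elapsed)          # 23:00:00, the short day
```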

Death-bed conversion

But there’s one other place this comes in handy: in correcting for the bug in the localize() method at the March DST boundary:

>>> amsterdam.normalize(amsterdam.localize(datetime(2012, 3, 25, 2, 30)))
datetime.datetime(2012, 3, 25, 3, 30, 
                   tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>)

Because of course when you see that clock reading 2:30 25/3/2012 what you really think is not “It’s broken” but “Someone forgot to switch it to DST”, and also “So the actual time must be 3:30”.

No, I’m kidding, localize() without is_dst=None is just pure evil and should be avoided if you want to keep your soul.

Notes:

¹ There is a terminological difficulty in describing this stuff precisely. As a geophysical phenomenon the day is of course no shorter than normal: the span from sunrise on the 24th to sunrise on the 25th is just as long as that from sunrise on the 25th to sunrise on the 26th. (Disregarding that the days are also getting longer in that part of the year.) But the span from midnight on the 25th to midnight on the 26th is an hour shorter: the clock time moves sideways against the geophysical background.
² I follow the library’s American spelling for its own components; I follow my own idiosyncratic hodgepodge of American, British, and Wrong for everything else.
³ This is not a datetime tutorial, so I’m assuming you already know how these objects are put together. If not, there are two things you need to know just to be able to read the examples. Firstly, Python datetime objects are constructed with datetime(year, month, day, hour, minute, second, microsecond) where everything after day can be left off to default to zero. Secondly, there is also a tzinfo argument for specifying an offset from UTC, which you can’t use if there is DST involved because the correct offset depends on where in the year the datetime falls: that’s exactly the problem localize() is designed to solve. A datetime without a tzinfo is called “naïve” in the Python docs.
