Validating Time Zones in PostgreSQL

I recently needed to validate that a value stored in a TEXT column was a valid time zone identifier. Why? Because I was using its value inside the database to convert timestamp columns from UTC to that zone. So I set about writing a function I could use in a constraint.

It turns out that PostgreSQL has a pretty nice view that lists all of the time zones that it recognizes. It’s called pg_timezone_names:

try=# select * from pg_timezone_names limit 5;
        name        | abbrev | utc_offset | is_dst 
--------------------+--------+------------+--------
 Africa/Abidjan     | GMT    | 00:00:00   | f
 Africa/Accra       | GMT    | 00:00:00   | f
 Africa/Addis_Ababa | EAT    | 03:00:00   | f
 Africa/Algiers     | CET    | 01:00:00   | f
 Africa/Asmara      | EAT    | 03:00:00   | f
(5 rows)

Cool. So all I had to do was to look up the value in this view. My first stab at creating a time zone validation function therefore looked like this:

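CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
  bool BOOLEAN;
BEGIN
  -- Case-insensitive match on either the full zone name or its abbreviation.
  SELECT TRUE INTO bool
    FROM pg_timezone_names
   WHERE LOWER(name) = LOWER(tz)
      OR LOWER(abbrev) = LOWER(tz);
  -- FOUND is true if the SELECT INTO returned a row.
  RETURN FOUND;
END;
$$ language plpgsql STABLE;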

This should pretty well cover anything that PostgreSQL considers valid. So does it work? You bet:

sandy_development=# \timing
Timing is on.
sandy_development=# select is_timezone('America/Los_Angeles');
 is_timezone 
-------------
 t
(1 row)

Time: 457.096 ms
sandy_development=# select is_timezone('Foo/Bar');
 is_timezone 
-------------
 f
(1 row)

Time: 472.752 ms

Perfect! Well, except for just one thing: performance is abysmal. A half second per shot? Not very useful for constraints. And since pg_timezone_names is a view (and, under that, a function), I can’t create indexes.

But then I did something dangerous: I started thinking. I realized that I needed this function when our app started getting errors like this:

try=# select now() at time zone 'Foo/Bar';
ERROR:  time zone "Foo/Bar" not recognized

So the underlying C code throws an error when a time zone is invalid. What if I could just trap the error? Well, PL/pgSQL conveniently has exception handling, so I could do just that. But there was only one problem. PL/pgSQL’s exception handling syntax requires that you specify an error condition. Here’s what the documentation has:

EXCEPTION
    WHEN condition [ OR condition ... ] THEN
        handler_statements
    [ WHEN condition [ OR condition ... ] THEN
          handler_statements
      ... ]
END;

Conditions are error codes. But which one corresponds to the invalid time zone error? I tried a few, but couldn’t figure out which one. (Anyone know how to map errors you see in psql to the error codes listed in Appendix A? Let me know!) But really, my function just needed to do one thing. Couldn’t I just trap any old error?
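As it happens, the error can answer that question itself: trap it, return its SQLSTATE, and look that five-character code up in Appendix A. Here’s a minimal sketch using a hypothetical helper, tz_error_code(), and the catch-all OTHERS condition covered next. On current PostgreSQL releases an unrecognized zone should come back as 22023 (invalid_parameter_value), but it’s worth running it against your own server to be sure.

```sql
-- Hypothetical helper: returns the SQLSTATE raised when tz is used as a
-- time zone, or NULL if PostgreSQL accepts it.
CREATE OR REPLACE FUNCTION tz_error_code( tz TEXT ) RETURNS TEXT AS $$
BEGIN
  -- PERFORM evaluates the expression and discards the result.
  PERFORM now() AT TIME ZONE tz;
  RETURN NULL;
EXCEPTION WHEN OTHERS THEN
  -- Inside a handler, SQLSTATE holds the code of the trapped error.
  RETURN SQLSTATE;
END;
$$ LANGUAGE plpgsql STABLE;

SELECT tz_error_code('America/Los_Angeles') AS valid_zone,
       tz_error_code('Foo/Bar')             AS bogus_zone;
```

In psql itself, running \set VERBOSITY verbose makes error reports include the SQLSTATE code, which answers the question without writing any PL/pgSQL at all.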

A careful re-read of the PL/pgSQL documentation reveals that, yes, you can. Use the condition OTHERS, and you can catch almost anything. With this information in hand, I quickly wrote:

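CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
  date TIMESTAMP;  -- AT TIME ZONE on a timestamptz yields a plain timestamp
BEGIN
  -- Raises an error if tz is not a time zone PostgreSQL recognizes.
  date := now() AT TIME ZONE tz;
  RETURN TRUE;
EXCEPTION WHEN OTHERS THEN
  -- Any error here means tz could not be used as a time zone.
  RETURN FALSE;
END;
$$ language plpgsql STABLE;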
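Because OTHERS traps nearly every error, it can also mask unrelated problems. If you’d rather let those propagate, a minimal variant catches only the invalid-parameter error. This sketch assumes an unrecognized zone is reported as invalid_parameter_value (SQLSTATE 22023); confirm that on your version before relying on it.

```sql
-- Variant of is_timezone() that traps only the error raised for an
-- unrecognized zone (assumed here to be invalid_parameter_value, 22023).
-- Any other error propagates to the caller instead of returning FALSE.
CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN AS $$
DECLARE
  ts TIMESTAMP;  -- AT TIME ZONE on a timestamptz yields a plain timestamp
BEGIN
  ts := now() AT TIME ZONE tz;
  RETURN TRUE;
EXCEPTION WHEN invalid_parameter_value THEN
  RETURN FALSE;
END;
$$ LANGUAGE plpgsql STABLE;
```

Like the OTHERS version, it returns TRUE for a NULL argument, because now() AT TIME ZONE NULL is simply NULL rather than an error, so add NOT NULL wherever a zone is required.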

And how well does this one work?

sandy_development=# select is_timezone('America/Los_Angeles');
 is_timezone 
-------------
 t
(1 row)

Time: 3.009 ms
sandy_development=# select is_timezone('Foo/Bar');
 is_timezone 
-------------
 f
(1 row)

Time: 1.224 ms

Yes, I’ll take 1-3 ms over 400-500 ms any day! I might even create a domain for this and be done with it:

CREATE DOMAIN timezone AS TEXT
CHECK ( is_timezone( value ) );

Enjoy!

Need Help Reducing View Calculations

I could use some advice and suggestions for how to solve a performance problem caused by the highly redundant calculation of values in a view. Sorry for the longish explanation. I wanted to make sure that I omitted no details in describing the problem.

In order to support recurring events in an application I’m working on, we have a lookup table that maps dates to their daily, weekly, monthly, and yearly recurrences. It looks something like this:

try=# \d recurrence_dates
   Table "public.recurrence_dates"
   Column   |    Type    | Modifiers 
------------+------------+-----------
 date       | date       | not null
 recurrence | recurrence | not null
 next_date  | date       | not null
Indexes:
    "recurrence_dates_pkey" PRIMARY KEY, btree (date, recurrence, next_date)
    "index_recurrence_dates_on_date_and_recurrence" btree (date, recurrence)

try=# select * from recurrence_dates
try-# where date = '2007-11-04'
try-# order by recurrence, next_date;
    date    | recurrence | next_date  
------------+------------+------------
 2007-11-04 | daily      | 2007-11-04
 2007-11-04 | daily      | 2007-11-05
 2007-11-04 | daily      | 2007-11-06
 2007-11-04 | weekly     | 2007-11-04
 2007-11-04 | weekly     | 2007-11-11
 2007-11-04 | weekly     | 2007-11-18
 2007-11-04 | monthly    | 2007-11-04
 2007-11-04 | monthly    | 2007-12-04
 2007-11-04 | monthly    | 2008-01-04
 2007-11-04 | annually   | 2007-11-04
 2007-11-04 | annually   | 2008-11-04
 2007-11-04 | annually   | 2009-11-04
 2007-11-04 | none       | 2007-11-04
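For the curious, rows like these can be generated with generate_series(). Here’s a minimal sketch, not necessarily our actual loader, that rebuilds the sample rows above for 2007-11-04. It treats the recurrence names as plain text (the real recurrence column has its own type, so inserting these would need a cast) and stops at three occurrences per recurrence to match the sample.

```sql
-- Sketch: rebuild the sample rows for 2007-11-04, three occurrences each.
-- date + integer yields a date; date + interval yields a timestamp, hence ::date.
WITH base AS (SELECT DATE '2007-11-04' AS d),
     n    AS (SELECT generate_series(0, 2) AS i)
SELECT d AS date, 'daily' AS recurrence, d + i AS next_date FROM base, n
UNION ALL
SELECT d, 'weekly',   d + 7 * i                          FROM base, n
UNION ALL
SELECT d, 'monthly',  (d + i * INTERVAL '1 month')::date FROM base, n
UNION ALL
SELECT d, 'annually', (d + i * INTERVAL '1 year')::date  FROM base, n
UNION ALL
SELECT d, 'none',     d                                  FROM base
ORDER BY 2, 3;
```

Because ORDER BY sorts the recurrence names alphabetically here, the rows come out in a different order from the listing above. Also note that adding a month to a date like January 31 clamps to the end of February, which a real table would have to decide how to handle.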

To get all of the permutations of recurring events, we simply select from a view rather than from the events table that contains the actual event data. The view joins the events table to the recurrence_dates table like so:

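CREATE OR REPLACE VIEW recurring_events AS
SELECT id, name, user_id, duration,
       -- Combine the recurrence date with the event's local start time, then
       -- interpret that as local time in start_tz to get an absolute timestamptz.
       (rd.next_date || ' ' ||
       (starts_at::timestamptz at time zone start_tz)::time)::timestamp
       at time zone start_tz AS starts_at,
       start_tz
  -- Match on the event's start date as seen in its own time zone.
  FROM events LEFT JOIN recurrence_dates rd
    ON (events.starts_at::timestamptz at time zone events.start_tz)::date = rd.date
   AND events.recurrence = rd.recurrence;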

Then, to get all of the recurrences of events for a user within a week, we do something like this in the client code:

SELECT *
  FROM recurring_events
 WHERE user_id = 2
   AND starts_at BETWEEN '2007-11-04 07:00:00' AND '2007-11-10 07:59:59';

This works perfectly, as all of our dates and times are stored in UTC in timestamp columns. We pass the query the UTC times that bound the user’s local week (Pacific Time in this example) and, because the view correctly maps each event’s starts_at time to its proper time zone, we get all of the events within the date range, including recurrences of earlier events, with their times properly set.

The trouble we’re having, however, is all of those conversions. Until last week, the view just kept everything in UTC and left it to the client to convert to the zone in the start_tz column. But that didn’t work so well when an event’s starts_at was during daylight saving time and its recurrences were in standard time: the standard time recurrences were all an hour off! So I added the repeated conversions, events.starts_at::timestamptz at time zone events.start_tz. But now the view is really slow.
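To make the daylight saving problem concrete, here’s a worked example with a hypothetical event at 9:00 am Pacific on Thursday, 2007-11-01, stored in UTC as 16:00 because Pacific Daylight Time is UTC-7. DST ended on 2007-11-04, so the next weekly occurrence falls in standard time (UTC-8). These queries say AT TIME ZONE 'UTC' explicitly rather than using the view’s ::timestamptz cast, which reads the value in the session’s TimeZone setting and so only means UTC when the session is set to UTC.

```sql
-- Old behavior: reuse the stored UTC time of day (16:00) for the 2007-11-08
-- recurrence. In Pacific time that is now 8:00 am, an hour early.
SELECT ('2007-11-08 16:00:00'::timestamp AT TIME ZONE 'UTC')
       AT TIME ZONE 'America/Los_Angeles' AS naive_local;
-- 2007-11-08 08:00:00

-- Approach the view takes: reuse the *local* time of day (09:00) and convert
-- it back to UTC. The recurrence lands at 17:00 UTC, still 9:00 am local.
SELECT ('2007-11-08 09:00:00'::timestamp AT TIME ZONE 'America/Los_Angeles')
       AT TIME ZONE 'UTC' AS correct_utc;
-- 2007-11-08 17:00:00
```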

Since the only thing that has changed is the addition of the time zone conversions, I believe that the performance penalty is because of them. The calculation executes multiple times per row: once for the join and once again for the starts_at column. We can have an awful lot of events for a given user, and an awful lot of recurrences of a given event. If, for example, an event recurs daily for 2 years, there will be around 730 rows for that one event. And the calculation has to be executed for every one of them before the WHERE clause can be properly evaluated. Ouch! Worse still, we actually have three columns that do this in our application, not just one as in the example here.

So what I need is a way to execute that calculation just once for each row in the events table, rather than once for each row in the recurring_events view. I figure 1 calculation will be a heck of a lot faster than 730! So the question is, how do I do this? How do I get the view to execute the conversion of the starts_at to the start_tz time zone only once for each row in events, regardless of how many rows it ends up generating in the recurring_events view?

Suggestions warmly welcomed. This is a bit of a tickler for me, and since the query performance on these views is killing us, I need to get this adjusted posthaste!

Meanwhile, tomorrow I’ll post a cool hack I came up with for validating time zones in the database. Something to look forward to as you ponder my little puzzle, eh?

Update 2008-01-30: Thanks to help from depesz, I figured out what the underlying problem was and solved it much more elegantly using PL/pgSQL. I’ve now written up the basic recipe. Enjoy!
