home :: computers :: databases :: postgresql

CITEXT Patch Submitted to PostgreSQL Contrib

On Friday, I submitted a patch to add a locale-aware case-insensitive text type as a PostgreSQL contrib module. This has been among my top requests as a feature for PostgreSQL ever since I started using it. And as I started work on yet another application recently, I decided to look into what it would take to just make it happen myself. I’m hopeful that everyone will be able to benefit from this bit of yak shaving.

I started out by trying to use the citext project on pgFoundry, but immediately identified two issues with it:

  1. It does not install properly on PostgreSQL 8.3 (it uses a lot of casts that were removed in 8.3); and
  2. It only case-insensitively compared ASCII characters. So accented multibyte characters work just as they do in the text type.

So I set about trying to create my own, new type, originally called “lctext”, since what it does is not true case-insensitive comparisons, but lowercases text and then compares, just as millions of us developers already do by using LOWER() on both sides of a query:

SELECT *
  FROM tab
 WHERE lower(col) = LOWER(?);

I just finally got fed up with this. The last straw for me was wanting to create a primary key that would be stored case-insensitively, which would have required that I create two indexes for it: One created for the primary key by default, the other a functional UNIQUE INDEX on LOWER(col), which would just be stupid.

So this patch is the culmination of my work to make a locale-aware case-insensitive text type. It’s locale-aware in that it uses the same locale-aware string comparison code as that used for the text type, and it uses the same C function as LOWER() uses. The nice thing is that it works just as if you had used LOWER() in all your SQL, but now you don’t have to.

So while this is not a true case-insensitive text type, in the sense that it doesn’t do a case-insensitive comparison, but changes the cases and then compares, it is likely more efficient than the LOWER() workaround that we’ve all been using for years, and it neater, too. Using this type, it will now be much easier to create, e.g, an email domain, like so:

CREATE OR REPLACE FUNCTION is_email(text)
RETURNS BOOLEAN
AS $$
    use Email::Valid;
    return TRUE if Email::Valid->address( $_[0] );
    return FALSE;
$$ LANGUAGE ‘plperlu’ STRICT IMMUTABLE;

CREATE DOMAIN email AS CITEXT CHECK ( is_email( value ) );

No more nasty workarounds to account for the lack of case-insensitive comparisons for text types. It works great for time zones and other data types that are defined to compare case-insensitively:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
BEGIN
  PERFORM now() AT TIME ZONE tz;
  RETURN TRUE;
EXCEPTION WHEN invalid_parameter_value THEN
  RETURN FALSE;
END;
$$ language plpgsql STABLE;

CREATE DOMAIN timezone AS CITEXT
CHECK ( is_timezone( value ) );

And that should just work!

I’m hoping that this is accepted during the July CommitFest. Of course I will welcome suggestions for how to improve it. Since I sent the patch, for example, I’ve been thinking that I should suggest in the documentation that it is best used for short text entries (say, up to 256 characters), rather than longer entries (like email bodies or journal articles), and that for longer entries, one should really make use of tsearch2, instead. There are other notes and caveats in the patch submission. Please do let me know what you think.

pgTAP pgFoundry Project and Lightning Talk

A couple of quick announcements:

pgTAP on pgFoundry

First, the PostgreSQL community approved my project, so now there is a pgTAP project page, including a couple of mail lists, a bug tracker, and downloads. I uploaded a new version shortly after the project was approved, and 0.03 should be there soon, as well.

pgTAP YAPC::NA Lightning Talk

I gave a Lightning talk at YAPC::NA 2008 in Chicago this afternoon. I’ve now posted the slides for your enjoyment.

Care to help me with development? Want to add your own test functions or make it all integrate better with standard PostgreSQL regression tests? Want to help me get Module::Build or Module::Install to run Perl and PostgreSQL and whatever tests side-by-side, all at once? Join the pgtap-users mail list and join the fun!

Introducing pgTAP

So I started working on a new PostgreSQL data type this week. More on that soon; in the meantime, I wanted to create a test suite for it, and wasn’t sure where to go. The only PostgreSQL tests I’ve seen are those distributed with Elein Mustain’s tests for the email data type she created in a PostgreSQL General Bits posting from a couple of years ago. I used the same approach myself for my GTIN data type, but it was rather hard to use: I had to pay very close attention to what was output in order to tell the description output from the test output. It was quite a PITA, actually.

This time, I started down the same path, then then started thinking about Perl testing, where each unit test, or assertion, in the xUnit parlance, triggers output of a single line of information indicating whether or not a test succeeded. It occurred to me that I could just run a bunch of queries that returned booleans to do my testing. So my first stab looked something like this:

\set ON_ERROR_STOP 1
\set AUTOCOMMIT off
\pset format unaligned
\pset tuples_only
\pset pager
\pset null '[NULL]'

SELECT foo() = 'bar';
SELECT foo(1) = 'baz';
SELECT foo(2) = 'foo';

The output looked like this:

% psql try -f ~/Desktop/try.sql
t
t
t

Once I started down that path, and had written ten or so tests, It suddenly dawned on me that the Perl Test::More module and its core ok() subroutine worked just like that. It essentially just depends on a boolean value and outputs text based on that value. A couple minutes of hacking and I had this:

CREATE TEMP SEQUENCE __tc__;
CREATE OR REPLACE FUNCTION ok ( boolean, text ) RETURNS TEXT AS $$
    SELECT (CASE $1 WHEN TRUE THEN '' ELSE 'not ' END) || 'ok'
        || ' ' || NEXTVAL('__tc__')
        || CASE $2 WHEN '' THEN '' ELSE COALESCE( ' - ' || $2, '' ) END;
$$ LANGUAGE SQL;

I then rewrote my test queries like so:

\echo 1..3
SELECT ok( foo() = 'bar'   'foo() should return "bar"' );
SELECT ok( foo(1) = 'baz', 'foo(1) should return "baz"' );
SELECT ok( foo(2) = 'foo', 'foo(2) should return "foo"' );

Running these tests, I now got:

% psql try -f ~/Desktop/try.sql
1..3
ok 1 - foo() should return "bar"
ok 2 - foo(1) should return "baz"
ok 3 - foo(2) should return "foo"

And, BAM! I had the beginning of a test framework that emits pure TAP output.

Well, I was so excited about this that I put aside my data type for a few hours and banged out the rest of the framework. Why was this exciting to me? Because now I can use a standard test harness to run the tests, even mix them in with other TAP tests on any project I might work on. Just now, I quickly hacked together a quick script to run the tests:

use TAP::Harness;

my $harness = TAP::Harness->new({
    timer   => $opts->{timer},
    exec    => [qw( psql try -f )],
});

$harness->runtests( @ARGV );

Now I’m able to run the tests like so:

% try ~/Desktop/try.sql        
/Users/david/Desktop/try........ok   
All tests successful.
Files=1, Tests=3,  0 wallclock secs ( 0.00 usr  0.00 sys +  0.01 cusr  0.00 csys =  0.01 CPU)
Result: PASS

Pretty damn cool! And lest you wonder whether such a suite of TAP-emitting test functions is suitable for testing SQL, here are a few exmples of tests I’ve written:

-- Plan the tests.
SELECT plan(4);

-- Emit a diagnostic message for users of different locales.
SELECT diag(
    E'These tests expect LC_COLLATE to be en_US.UTF-8,\n'
  || 'but yours is set to ' || setting || E'.\n'
  || 'As a result, some tests may fail. YMMV.'
)
  FROM pg_settings
 WHERE name = 'lc_collate'
   AND setting <> 'en_US.UTF-8';

SELECT is( 'a', 'a', '"a" should = "a"' );
SELECT is( 'B', 'B', '"B" should = "B"' );

CREATE TEMP TABLE try (
    name lctext PRIMARY KEY
);

INSERT INTO try (name)
VALUES ('a'), ('ab'), ('â'), ('aba'), ('b'), ('ba'), ('bab'), ('AZ');

SELECT ok( 'a' = name, 'We should be able to select the value' )
  FROM try
 WHERE name = 'a';

SELECT throws_ok(
    'INSERT INTO try (name) VALUES (''a'')',
    '23505',
    'We should get an error inserting a lowercase letter'
);

-- Finish the tests and clean up.
SELECT * FROM finish();

As you can see, it’s just SQL. And yes, I have ported most of the test functions from Test::More, as well as a couple from Test::Exception.

So, without further ado, I’d like to introduce pgTAP, a lightweight test framework for PostgreSQL implemented in PL/pgSQL and PL/SQL. I’ll be hacking on it more in the coming days, mostly to get a proper client for running tests hacked together. Then I think I’ll see if pgFoundry is interested in it.

Whaddya think? Is this something you could use? I can see many uses, myself, not only for testing a custom data type as I develop it, but also custom functions in PL/pgSQL or PL/Perl, and, heck, just regular schema stuff. I’ve had to write a lot of Perl tests to test my database schema (triggers, rules, functions, etc.), all using the DBI and being very verbose. Being able to do it all in a single psql script seems so much cleaner. And if I cand end up mixing the output of those scripts in with the rest of my unit tests, so much the better!

Anyway, feedback welcome. Leave your comments, suggestions, complaints, patches, etc., below. Thanks!

How to Generate Recurring Events in the Database

This is a followup to my request for help fixing the performance of a database view that generated recurrences for events. This was an essential feature of Sandy, and thus important to get right. The idea when I started was simple:

  • Add a recurrence domain to the database that supports a number of different values, including daily, weekly, and monthly.
  • Add a recurrence column to the events table that identify how an event recurs.
  • Add a recurrence_dates table that contains a pre-generated list of recurrences for a given date. I’d populated this table with five years of dates, each one mapped to five years worth of recurrence dates (see the originalblog entry for more on the format of this table.
  • Create a view that maps each events row to its date and recurrence in the recurrence_dates table.
  • Profit.

It was this last bullet point that didn’t quite work out: although the data was perfectly accurate, queries for a lot of events in the view were very expensive. I mean, the query could run for 3-4 minutes. It was just crazy! I couldn’t figure out the problem, so I posted my request for help. It was through discussions that followed with depesz that I finally figured out what the problem was: Although I was usually selecting only a week’s or months worth of events, the view was calculating rows for all five years worth of data for all of the events for a given user. Um, not efficient.

So here I finally document how, with a lot of help and example code from depesz, I solved the problem. The trick was to use a function instead of a view to generate the recurring event rows, and to limit it only to the dates we’re interested in. For convenience sake, I broke this down into two PL/pgSQL functions: one to generate recurring dates and one to return the recurring event rows. But first, here’s the recurrence domain and the events table, both of which are unchanged from the original approach:

CREATE DOMAIN recurrence AS TEXT
CHECK ( VALUE IN ( 'none', 'daily', 'weekly', 'monthly' ) );

CREATE TABLE events (
    id         SERIAL     PRIMARY KEY,
    user_id    INTEGER    NOT NULL,
    starts_at  TIMESTAMP  NOT NULL,
    start_tz   TEXT       NOT NULL,
    ends_at    TIMESTAMP,
    end_tz     TEXT       NOT NULL,
    recurrence RECURRENCE NOT NULL DEFAULT 'none'
);

Just assume the user_id is a foreign key. Now let’s populate this table with some data. For the purposes of this demonstration, I’m going to create one event per day for 1000 days, evenly divided between daily, weekly, monthly, and no recurrences, as well as five different times of day and six different durations:

INSERT INTO events (user_id, starts_at, start_tz, ends_at, end_tz, recurrence)
SELECT 1,
       ts::timestamp,
       'PST8PDT',
       ts::timestamp + dur::interval,
       'PST8PDT',
       recur
  FROM (
    SELECT '2007-12-19'::date + i || ' ' || CASE i % 5
               WHEN 0 THEN '06:00'
               WHEN 1 THEN '10:00'
               WHEN 2 THEN '14:00'
               WHEN 3 THEN '18:00'
               ELSE        '22:30'
               END,
           CASE i % 6
               WHEN 0 THEN '2 hours'
               WHEN 1 THEN '1 hour'
               WHEN 2 THEN '45 minutes'
               WHEN 3 THEN '3.5 hours'
               WHEN 4 THEN '15 minutes'
               ELSE        '30 minutes'
               END,
           CASE i % 4
               WHEN 0 THEN 'daily'
               WHEN 1 THEN 'weekly'
               WHEN 2 THEN 'monthly'
               ELSE        'none'
               END
    FROM generate_series(1, 1000) as gen(i)
  ) AS ser( ts, dur, recur);

This gives us some nicely distributed data:

try=# select * from events limit 10;
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
------+---------+---------------------+----------+---------------------+---------+------------
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 11:00:00 | PST8PDT | weekly
    2 |       1 | 2007-12-21 14:00:00 | PST8PDT  | 2007-12-21 14:45:00 | PST8PDT | monthly
    3 |       1 | 2007-12-22 18:00:00 | PST8PDT  | 2007-12-22 21:30:00 | PST8PDT | none
    4 |       1 | 2007-12-23 22:30:00 | PST8PDT  | 2007-12-23 22:45:00 | PST8PDT | daily
    5 |       1 | 2007-12-24 06:00:00 | PST8PDT  | 2007-12-24 06:30:00 | PST8PDT | weekly
    6 |       1 | 2007-12-25 10:00:00 | PST8PDT  | 2007-12-25 12:00:00 | PST8PDT | monthly
    7 |       1 | 2007-12-26 14:00:00 | PST8PDT  | 2007-12-26 15:00:00 | PST8PDT | none
    8 |       1 | 2007-12-27 18:00:00 | PST8PDT  | 2007-12-27 18:45:00 | PST8PDT | daily
    9 |       1 | 2007-12-28 22:30:00 | PST8PDT  | 2007-12-29 02:00:00 | PST8PDT | weekly
   10 |       1 | 2007-12-29 06:00:00 | PST8PDT  | 2007-12-29 06:15:00 | PST8PDT | monthly
(10 rows)

Now let’s get to the recurring date function:

CREATE OR REPLACE FUNCTION  generate_recurrences(
    recurs RECURRENCE, 
    start_date DATE,
    end_date DATE
)
    RETURNS setof DATE
    LANGUAGE plpgsql IMMUTABLE
    AS $BODY$
DECLARE
    next_date DATE := start_date;
    duration  INTERVAL;
    day       INTERVAL;
    check     TEXT;
BEGIN
    IF recurs = 'none' THEN
        -- Only one date ever.
        RETURN next next_date;
    ELSIF recurs = 'weekly' THEN
        duration := '7 days'::interval;
        WHILE next_date <= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
        END LOOP;
    ELSIF recurs = 'daily' THEN
        duration := '1 day'::interval;
        WHILE next_date <= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
        END LOOP;
    ELSIF recurs = 'monthly' THEN
        duration := '27 days'::interval;
        day      := '1 day'::interval;
        check    := to_char(start_date, 'DD');
        WHILE next_date <= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
            WHILE to_char(next_date, 'DD') <> check LOOP
                next_date := next_date + day;
            END LOOP;
        END LOOP;
    ELSE
        -- Someone needs to update this function, methinks.
        RAISE EXCEPTION 'Recurrence % not supported by generate_recurrences()', recurs;
    END IF;
END;
$BODY$;

The idea here is pretty simple: pass in a recurrence (daily, weekly, or monthly), a start date, and an end date, and get back a set of all the recurrence dates between the start and end dates:

try=# \timing
Timing is on.
try=# select * from  generate_recurrences('daily', '2008-01-29', '2008-02-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-01-30
 2008-01-31
 2008-02-01
 2008-02-02
 2008-02-03
 2008-02-04
 2008-02-05
(8 rows)

Time: 0.548 ms
try=# select * from  generate_recurrences('weekly', '2008-01-29', '2008-03-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-02-05
 2008-02-12
 2008-02-19
 2008-02-26
 2008-03-04
(6 rows)

Time: 0.670 ms
try=# select * from  generate_recurrences('monthly', '2008-01-29', '2008-05-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-02-29
 2008-03-29
 2008-04-29
(4 rows)

Time: 0.644 ms

Not bad, eh? And PostgreSQL’s date and interval calculation operators are wicked fast. Check out how long it dates to generate two years worth of daily recurrence dates:

try=# select * from  generate_recurrences('daily', '2008-01-29', '2010-02-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-01-30
 2008-01-31
...
 2010-02-03
 2010-02-04
 2010-02-05
(739 rows)

Time: 4.982 ms

Awesome. And the great thing about this function is that any time I need to add new recurrences (yearly, biweekly, quarterly, weekends, weekdays, etc.), I just modify the domain and this function and we’re ready to go.

And now, part two: the recurring event function:

CREATE OR REPLACE FUNCTION recurring_events_for(
   for_user_id INTEGER,
   range_start TIMESTAMP,
   range_end   TIMESTAMP
)
   RETURNS SETOF events
   LANGUAGE plpgsql STABLE
   AS $BODY$
DECLARE
   event events;
   start_date TIMESTAMPTZ;
   start_time TEXT;
   ends_at    TIMESTAMPTZ;
   next_date  DATE;
   recurs_at  TIMESTAMPTZ;
BEGIN
   FOR event IN 
       SELECT *
         FROM events
        WHERE user_id = for_user_id
          AND (
                  recurrence <> 'none'
              OR  (
                     recurrence = 'none'
                 AND starts_at BETWEEN range_start AND range_end
              )
          )
    LOOP
       IF event.recurrence = 'none' THEN
         RETURN NEXT event;
         CONTINUE;
       END IF;

       start_date := event.starts_at::timestamptz AT TIME ZONE event.start_tz;
       start_time := start_date::time::text;
       ends_at    := event.ends_at::timestamptz AT TIME ZONE event.end_tz;

       FOR next_date IN
           SELECT *
             FROM generate_recurrences(
                      event.recurrence,
                      start_date::date,
                      (range_end AT TIME ZONE event.start_tz)::date
             )
       LOOP
           recurs_at := (next_date || ' ' || start_time)::timestamp
               AT TIME ZONE event.start_tz;
           EXIT WHEN recurs_at > range_end;
           CONTINUE WHEN recurs_at < range_start AND ends_at < range_start;
           event.starts_at := recurs_at;
           event.ends_at   := ends_at;
           RETURN NEXT event;
       END LOOP;
   END LOOP;
   RETURN;
END;
$BODY$;

The idea here is to select the appropriate events for a given user between two dates, and for each event iterate over all of the recurrences between the two dates and return a row for each one. So the lines starting with FOR event IN and ending with LOOP select the original events, looking for either recurring events or non-recurring events that are between the two dates. Note that if you needed to, you could easily refine this query for your particular application, or even use PL/pgSQL’s EXECUTE operator to dynamically generate queries to suit particular application needs.

Next, the block starting with IF event.recurrence = 'none' THEN simply returns any non-recurring events. Although the next block already handles this case, adding this optimization eliminates a fair bit of calculation for the common case of non-recurring events.

Then the lines starting with FOR next_date IN and ending with LOOP select all of the dates for the recurrence in question, using the generate_recurrences() function created earlier. From LOOP to END LOOP;, the function generates the start and end timestamps, exiting the loop when the start date falls after the range or when it falls before the range and the end date falls after the range. There are many other tweaks one could make here to modify which recurrences are included and which are excluded. For example, if you had a column in the events table such as exclude_dates TIMESTAMP[] NOT NULL DEFAULT ‘{}’ that stored an array of dates to ignore when generating recurrences, you could just add this line to go ahead and exclude them from the results returned by the function:

           CONTINUE WHEN recurs_at = ANY( exclude_dates );

But enough of the details: let’s see how it works! Here’s a query for a week’s worth of data:

try=# select * from recurring_events_for(1, '2007-12-19', '2007-12-26');
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
------+---------+---------------------+----------+---------------------+---------+------------
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    2 |       1 | 2007-12-21 14:00:00 | PST8PDT  | 2007-12-21 06:45:00 | PST8PDT | monthly
    3 |       1 | 2007-12-22 18:00:00 | PST8PDT  | 2007-12-22 21:30:00 | PST8PDT | none
    4 |       1 | 2007-12-23 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    4 |       1 | 2007-12-24 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    4 |       1 | 2007-12-25 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    5 |       1 | 2007-12-24 06:00:00 | PST8PDT  | 2007-12-23 22:30:00 | PST8PDT | weekly
    6 |       1 | 2007-12-25 10:00:00 | PST8PDT  | 2007-12-25 04:00:00 | PST8PDT | monthly
(8 rows)

Time: 51.890 ms

Note the time it took to execute this query. 52 ms is a hell of a lot faster than the several minutes it took to run a similar query using the old view. Plus, I’m not limited to just the recurrence dates I’ve pre-calculated in the old recurrence_dates table. Now we can use whatever dates are supported by PostgreSQL. It’s even fast when we look at a year’s worth of data:

try=# select * from recurring_events_for(1, '2007-12-19', '2008-12-19');
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
------+---------+---------------------+----------+---------------------+---------+------------
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2007-12-27 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-03 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-10 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-17 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-24 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-31 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
...
  364 |       1 | 2008-12-17 22:30:00 | PST8PDT  | 2008-12-17 14:45:00 | PST8PDT | daily
  364 |       1 | 2008-12-18 22:30:00 | PST8PDT  | 2008-12-17 14:45:00 | PST8PDT | daily
  365 |       1 | 2008-12-18 06:00:00 | PST8PDT  | 2008-12-17 22:30:00 | PST8PDT | weekly
(19691 rows)

Time: 837.759 ms

Not stellar, but still respectable. Given that for a typical application, a user will be looking at only a day’s or a week’s or a month’s events at a time, this seems to be an acceptable trade-off. I mean, how often will your users need to see a list of 20,000 events? And even if a user waslooking at a year’s worth of data, it’s unlikely that 75% of them would be recurring as in the example data here.

I was fucking pumped with this solution, and Sandy has hummed along nicely since we put it into production. If you’re interested in trying it for yourself, I’ve you can get all the SQL from this blog entry here.

The only thing I would like to have been able to do differently was to encapsulate the recurring_events_for() function in a view. Such would have made it much easier to actually use this solution in Rails. If you know how to do that, please do leave a comment. As for how I hacked Rails to use the function, well, that’s a blog post for another day.

Validating Time Zones in PostgreSQL

I recently needed to validate that a value stored in a TEXTcolumn was a valid time zone identifier. Why? Because I was using its value inside the database to convert timestamp columns from UTC to a valid zone. So I set about writing a function I could use in a constraint.

It turns out that PostgreSQL has a pretty nice view that lists all of the time zones that it recognizes. It’s called pg_timezone_names:

try=# select * from pg_timezone_names limit 5;
        name        | abbrev | utc_offset | is_dst 
--------------------+--------+------------+--------
 Africa/Abidjan     | GMT    | 00:00:00   | f
 Africa/Accra       | GMT    | 00:00:00   | f
 Africa/Addis_Ababa | EAT    | 03:00:00   | f
 Africa/Algiers     | CET    | 01:00:00   | f
 Africa/Asmara      | EAT    | 03:00:00   | f
(5 rows)

Cool. So all I had to do was to look up the value in this view. My first stab at creating a time zone validation function therefore looked like this:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
  bool BOOLEAN;
BEGIN
  SELECT TRUE INTO bool
    FROM pg_timezone_names
   WHERE LOWER(name) = LOWER(tz)
      OR LOWER(abbrev) = LOWER(tz);
  RETURN FOUND;
END;
$$ language plpgsql STABLE;

This should pretty well cover anything that PostgreSQL considers valid. So does it work? You bet:

sandy_development=# \timing
Timing is on.
sandy_development=# select is_timezone(’America/Los_Angeles’);
 is_timezone 
-------------
 t
(1 row)

Time: 457.096 ms
sandy_development=# select is_timezone(’Foo/Bar’);
 is_timezone 
-------------
 f
(1 row)

Time: 472.752 ms

Perfect! Well, except for just one thing: performance is abysmal. A half second per shot? Not very useful for constraints. And since pg_timezone_names is a view (and, under that, a function), I can’t create indexes.

But then I did something dangerous: I started thinking. I realized that I needed this function when our app started getting errors like this:

try=# select now() at time zone ‘Foo/Bar’;
ERROR:  time zone "Foo/Bar" not recognized

So the underlying C code throws an error when a time zone is invalid. What if I could just trap the error? Well, PL/pgSQL conveniently has exception handling, so I could do just that. But there was only one problem. PL/pgSQL’s exception handling syntax requires that you specify an error condition. Here’s what the documentation has:

EXCEPTION
    WHEN condition [ OR condition ... ] THEN
        handler_statements
    [ WHEN condition [ OR condition ... ] THEN
          handler_statements
      ... ]
END;

Conditions are error codes. But which one corresponds to the invalid time zone error? I tried a few, but couldn’t figure out which one. (Anyone know now to map errors you see in psql to the error codes listed in Appendex A? Let me know!) But really, my function just needed to do one thing. Couldn’t I just trap any old error?

A careful re-read of the PL/pgSQL documentation reveals that, yes, you can. Use the condition OTHERS, and you can catch almost anything. With this information in hand, I quickly wrote:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
  date TIMESTAMPTZ;
BEGIN
  date := now() AT TIME ZONE tz;
  RETURN TRUE;
EXCEPTION WHEN OTHERS THEN
  RETURN FALSE;
END;
$$ language plpgsql STABLE;

And how well does this one work?

sandy_development=# select is_timezone(’America/Los_Angeles’);
 is_timezone 
-------------
 t
(1 row)

Time: 3.009 ms
sandy_development=# select is_timezone(’Foo/Bar’);
 is_timezone 
-------------
 f
(1 row)

Time: 1.224 ms

Yes, I’ll take 1-3 ms over 400-500 ms any day! I might even create a domain for this and be done with it:

CREATE DOMAIN timezone AS TEXT
CHECK ( is_timezone( value ) );

Enjoy!

Powered by KinoSearch