Just a Theory

Trans rights are human rights

Posts about Time Zones

Always Use TIMESTAMP WITH TIME ZONE

My recommendations for sane time zone management in PostgreSQL:

  • Set timezone = 'UTC' in postgresql.conf. This makes UTC the default time zone for all connections.
  • Use timestamp with time zone (aka timestamptz) and time with time zone (aka timetz). They store values as UTC, but convert them on selection to whatever your time zone setting is.
  • Avoid timestamp without time zone (aka timestamp) and time without time zone (aka time). These columns do not know the time zone of a value, so different apps can insert values in different zones and no one would ever know.
  • Always specify a time zone when inserting into a timestamptz or timetz column. Unless the zone is UTC. But even then, append a “Z” to your value: it’s more explicit, and will keep you sane.
  • If you need to get timestamptz or timetz values in a zone other than UTC, use the AT TIME ZONE expression in your query. But be aware that the returned value will be a timestamp or time value, with no more time zone. Good for reporting and queries, bad for storage.
  • If your app always needs data in some other time zone, have it SET timezone to that zone on connection. All values then retrieved from the database will be in the configured time zone. The app should still include the time zone in values sent to the database.
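
As a rough illustration of the points above, here is a minimal psql session sketch. The table name tz_demo and the sample values are made up for the example; only the timestamptz type and the AT TIME ZONE expression come from the recommendations themselves.

```sql
-- Hypothetical scratch table; the name and values are illustrative only.
SET timezone = 'UTC';
CREATE TEMPORARY TABLE tz_demo (starts_at timestamptz NOT NULL);

-- Always include the zone on input, even when it is UTC ("Z").
INSERT INTO tz_demo VALUES
    ('2009-07-01 19:00:00Z'),
    ('2009-07-01 12:00:00 America/Los_Angeles');

-- Both rows hold the same instant and come back in the session zone:
-- 2009-07-01 19:00:00+00
SELECT starts_at FROM tz_demo;

-- AT TIME ZONE converts for display but returns a plain timestamp,
-- so the zone is gone: 2009-07-01 12:00:00
SELECT starts_at AT TIME ZONE 'America/Los_Angeles' AS local_time,
       pg_typeof(starts_at AT TIME ZONE 'America/Los_Angeles') AS result_type
  FROM tz_demo;
```

The result_type column should read timestamp without time zone, which is why the converted value suits reports but not storage.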

The one exception to the rule preferring timestamptz and timetz is a special case: partitioning. When partitioning data on timestamps, you must not use timestamptz. Why? Because almost no expression involving timestamptz comparison is immutable. Use one in a WHERE clause, and constraint exclusion may well be ignored and all partitions scanned. This is usually something you want to avoid.

So in this one case and only in this one case, use a timestamp without time zone column, but always insert data in UTC. This will keep things consistent with the timestamptz columns you have everywhere else in your database. Unless your app changes the value of the timezone GUC when it connects, it can just assume that everything is always UTC, and should always send updates as UTC.
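
A minimal sketch of that exception, assuming the inheritance-style partitioning that constraint exclusion applies to. The table names log_demo and log_demo_2009_11 are hypothetical, as are the sample values.

```sql
-- Hypothetical parent and child tables; names are illustrative only.
CREATE TABLE log_demo (
    logged_at timestamp NOT NULL,  -- without time zone, UTC by convention
    message   text
);

CREATE TABLE log_demo_2009_11 (
    CHECK (logged_at >= '2009-11-01' AND logged_at < '2009-12-01')
) INHERITS (log_demo);

-- Insert UTC values. For the current time, now() AT TIME ZONE 'UTC'
-- yields a timestamp without time zone in UTC.
INSERT INTO log_demo_2009_11 VALUES ('2009-11-05 12:00:00', 'hello');

-- Comparing against plain timestamp literals keeps the WHERE clause
-- immutable, so the planner can check it against each child's CHECK.
SELECT * FROM log_demo
 WHERE logged_at >= '2009-11-05' AND logged_at < '2009-11-06';
```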

Validating Time Zones in PostgreSQL

I recently needed to validate that a value stored in a TEXT column was a valid time zone identifier. Why? Because I was using its value inside the database to convert timestamp columns from UTC to a valid zone. So I set about writing a function I could use in a constraint.

It turns out that PostgreSQL has a pretty nice view that lists all of the time zones that it recognizes. It’s called pg_timezone_names:

try=# select * from pg_timezone_names limit 5;
        name        | abbrev | utc_offset | is_dst 
--------------------+--------+------------+--------
 Africa/Abidjan     | GMT    | 00:00:00   | f
 Africa/Accra       | GMT    | 00:00:00   | f
 Africa/Addis_Ababa | EAT    | 03:00:00   | f
 Africa/Algiers     | CET    | 01:00:00   | f
 Africa/Asmara      | EAT    | 03:00:00   | f
(5 rows)

Cool. So all I had to do was to look up the value in this view. My first stab at creating a time zone validation function therefore looked like this:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
    bool BOOLEAN;
BEGIN
    SELECT TRUE INTO bool
    FROM pg_timezone_names
    WHERE LOWER(name) = LOWER(tz)
        OR LOWER(abbrev) = LOWER(tz);
    RETURN FOUND;
END;
$$ language plpgsql STABLE;

This should pretty well cover anything that PostgreSQL considers valid. So does it work? You bet:

sandy_development=# \timing
Timing is on.
sandy_development=# select is_timezone('America/Los_Angeles');
 is_timezone 
-------------
 t
(1 row)

Time: 457.096 ms
sandy_development=# select is_timezone('Foo/Bar');
 is_timezone 
-------------
 f
(1 row)

Time: 472.752 ms

Perfect! Well, except for just one thing: performance is abysmal. A half second per shot? Not very useful for constraints. And since pg_timezone_names is a view (and, under that, a function), I can’t create indexes.

But then I did something dangerous: I started thinking. I realized that I needed this function when our app started getting errors like this:

try=# select now() at time zone 'Foo/Bar';
ERROR:  time zone "Foo/Bar" not recognized

So the underlying C code throws an error when a time zone is invalid. What if I could just trap the error? Well, PL/pgSQL conveniently has exception handling, so I could do just that. But there was only one problem. PL/pgSQL’s exception handling syntax requires that you specify an error condition. Here’s what the documentation has:

EXCEPTION
    WHEN condition [ OR condition ... ] THEN
        handler_statements
    [ WHEN condition [ OR condition ... ] THEN
            handler_statements
        ... ]
END;

Conditions are error codes. But which one corresponds to the invalid time zone error? I tried a few, but couldn’t figure out which one. (Anyone know how to map errors you see in psql to the error codes listed in Appendix A? Let me know!) But really, my function just needed to do one thing. Couldn’t I just trap any old error?

A careful re-read of the PL/pgSQL documentation reveals that, yes, you can. Use the condition “OTHERS,” and you can catch almost anything. With this information in hand, I quickly wrote:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
    date TIMESTAMPTZ;
BEGIN
    date := now() AT TIME ZONE tz;
    RETURN TRUE;
EXCEPTION WHEN OTHERS THEN
    RETURN FALSE;
END;
$$ language plpgsql STABLE;

And how well does this one work?

sandy_development=# select is_timezone('America/Los_Angeles');
 is_timezone 
-------------
 t
(1 row)

Time: 3.009 ms
sandy_development=# select is_timezone('Foo/Bar');
 is_timezone 
-------------
 f
(1 row)

Time: 1.224 ms

Yes, I’ll take 1-3 ms over 400-500 ms any day! I might even create a domain for this and be done with it:

CREATE DOMAIN timezone AS TEXT
CHECK ( is_timezone( value ) );
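
Here is a short sketch of how such a domain might be used, assuming the is_timezone() function and the timezone domain above are already in place. The table prefs_demo and its rows are hypothetical.

```sql
-- Hypothetical table using the timezone domain defined above.
CREATE TEMPORARY TABLE prefs_demo (
    user_name text     NOT NULL,
    tz        timezone NOT NULL
);

-- Accepted: the zone is recognized.
INSERT INTO prefs_demo VALUES ('alice', 'America/Los_Angeles');

-- Rejected: is_timezone() returns false, so the domain's CHECK
-- constraint raises an error and no row is stored.
INSERT INTO prefs_demo VALUES ('bob', 'Foo/Bar');
```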

Enjoy!

Update: From a comment left by Tom Lane, use invalid_parameter_value rather than OTHERS:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
    date TIMESTAMPTZ;
BEGIN
    date := now() AT TIME ZONE tz;
    RETURN TRUE;
EXCEPTION WHEN invalid_parameter_value THEN
    RETURN FALSE;
END;
$$ language plpgsql STABLE;
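
As for the earlier question of how to map an error seen in psql to a condition name, one approach is to trap the error with OTHERS once and print the SQLSTATE variable that PL/pgSQL sets inside an exception handler. This is only a sketch, and it assumes PostgreSQL 9.0 or later for the DO block.

```sql
DO $$
BEGIN
    -- Provoke the error on purpose with a bogus zone name.
    PERFORM now() AT TIME ZONE 'Foo/Bar';
EXCEPTION WHEN OTHERS THEN
    -- SQLSTATE and SQLERRM are available inside an exception handler.
    RAISE NOTICE 'SQLSTATE = %, message = %', SQLSTATE, SQLERRM;
END;
$$;
```

The notice should report 22023, which Appendix A lists as invalid_parameter_value. In psql, \set VERBOSITY verbose also prints the SQLSTATE alongside each error message.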

Need Help Reducing View Calculations

I could use some advice and suggestions for how to solve a performance problem due to the highly redundant calculation of values in a view. Sorry for the longish explanation. I wanted to make sure that I omitted no details in describing the problem.

In order to support recurring events in an application I’m working on, we have a lookup table that maps dates to their daily, weekly, monthly, and yearly recurrences. It looks something like this:

try=# \d recurrence_dates
   Table "public.recurrence_dates"
   Column   |    Type    | Modifiers 
------------+------------+-----------
 date       | date       | not null
 recurrence | recurrence | not null
 next_date  | date       | not null
Indexes:
    "recurrence_dates_pkey" PRIMARY KEY, btree (date, recurrence, next_date)
    "index_recurrence_dates_on_date_and_recurrence" btree (date, recurrence)

try=# select * from recurrence_dates
try-# where date = '2007-11-04'
try-# order by recurrence, next_date;
    date    | recurrence | next_date  
------------+------------+------------
 2007-11-04 | daily      | 2007-11-04
 2007-11-04 | daily      | 2007-11-05
 2007-11-04 | daily      | 2007-11-06
 2007-11-04 | weekly     | 2007-11-04
 2007-11-04 | weekly     | 2007-11-11
 2007-11-04 | weekly     | 2007-11-18
 2007-11-04 | monthly    | 2007-11-04
 2007-11-04 | monthly    | 2007-12-04
 2007-11-04 | monthly    | 2008-01-04
 2007-11-04 | annually   | 2007-11-04
 2007-11-04 | annually   | 2008-11-04
 2007-11-04 | annually   | 2009-11-04
 2007-11-04 | none       | 2007-11-04

To get all of the permutations of recurring events, we simply select from a view rather than from the events table that contains the actual event data. The view joins the events table to the recurrence_dates table like so:

CREATE OR REPLACE VIEW recurring_events AS
SELECT id, name, user_id, duration,
       (rd.next_date || ' ' ||
       (starts_at::timestamptz at time zone start_tz)::time)::timestamp
       at time zone start_tz AS starts_at,
       start_tz
  FROM events LEFT JOIN recurrence_dates rd
    ON (events.starts_at::timestamptz at time zone events.start_tz)::date = rd.date
   AND events.recurrence = rd.recurrence;

Then, to get all of the recurrences of events for a user within a week, we do something like this in the client code:

SELECT *
  FROM recurring_events
 WHERE user_id = 2
   AND starts_at BETWEEN '2007-11-04 07:00:00' AND '2007-11-10 07:59:59';

This works perfectly, as all of our dates and times are stored in UTC in timestamp columns. We pass UTC times for the appropriate offset to the query (Pacific Time in this example) and, because the view does the right thing in mapping the starts_at time for each event to its proper time zone, we get all of the events within the date range, even if they are recurrences of an earlier event, and with their times properly set.

The trouble we’re having, however, is all of those conversions. Until last week, the view just kept everything in UTC and left it to the client to convert to the proper zone in the start_tz column. But that didn’t work so well when an event’s starts_at was during daylight saving time and recurrences were in standard time: the standard time recurrences were all an hour off! So I added the repeated instances of events.starts_at::timestamptz at time zone events.start_tz. But now the view is really slow.
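
To make the off-by-an-hour problem concrete, here is a small worked example. The event time (10:00 in Los Angeles on 2007-07-01) is made up for illustration, the session is assumed to run with timezone set to UTC, and date + time stands in for the view's string concatenation.

```sql
SET timezone = 'UTC';

-- The event is stored in UTC. Its local start falls in daylight
-- saving time (PDT, UTC-7): 2007-07-01 10:00:00
SELECT '2007-07-01 17:00:00'::timestamp::timestamptz
       AT TIME ZONE 'America/Los_Angeles';

-- Naive recurrence: keep 17:00 UTC on a December date, which is in
-- standard time (PST, UTC-8): 2007-12-01 09:00:00, an hour early
SELECT '2007-12-01 17:00:00'::timestamp::timestamptz
       AT TIME ZONE 'America/Los_Angeles';

-- The view's approach: attach the local wall-clock time to the
-- recurrence date, then convert back: 2007-12-01 18:00:00+00
SELECT ('2007-12-01'::date + '10:00:00'::time)
       AT TIME ZONE 'America/Los_Angeles';
```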

Since the only thing that has changed is the addition of the time zone conversions, I believe that the performance penalty is because of them. The calculation executes multiple times per row: once for the join and once again for the starts_at column. We can have an awful lot of events for a given user, and an awful lot of recurrences of a given event. If, for example, an event recurs daily for 2 years, there will be around 730 rows for that one event. And the calculation has to be executed for every one of them before the WHERE clause can be properly evaluated. Ouch! Worse still, we actually have three columns that do this in our application, not just one as in the example here.

So what I need is a way to execute that calculation just once for each row in the events table, rather than once for each row in the recurring_events view. I figure 1 calculation will be a heck of a lot faster than 730! So the question is, how do I do this? How do I get the view to execute the conversion of the starts_at to the start_tz time zone only once for each row in events, regardless of how many rows it ends up generating in the recurring_events view?

Suggestions warmly welcomed. This is a bit of a tickler for me, and since the query performance on these views is killing us, I need to get this adjusted post haste!

Meanwhile, tomorrow I’ll post a cool hack I came up with for validating time zones in the database. Something to look forward to as you ponder my little puzzle, eh?

Update 2008-01-30: Thanks to help from depesz, I figured out what the underlying problem was and solved it much more elegantly using PL/pgSQL. I’ve now written up the basic recipe. Enjoy!

Ruby Time Object Time Zone Bug

This is disappointing.

To summarize, Ruby’s Time class has a bug in its zone method. The example is simple:

tz = ENV['TZ']
ENV['TZ'] = 'Africa/Luanda'
t = Time.now
puts t.zone
ENV['TZ'] = 'Australia/Lord_Howe'
puts t.zone

This outputs:

WAT
WAT

So far so good. But look what happens when I add a single line to the program, foo = t.to_s:

tz = ENV['TZ']
ENV['TZ'] = 'Africa/Luanda'
t = Time.now
puts t.zone
ENV['TZ'] = 'Australia/Lord_Howe'
foo = t.to_s
puts t.zone

The result changes:

WAT
LHST

This is clearly wrong. Changing the $TZ environment variable and stringifying the object should not change the underlying value of any of the object’s attributes. The Time object should remember the value of the time zone when it is initialized, and should never change.

Unfortunately, the Ruby core developers (or at least the owner of the bug report) feel that, since Time relies on the system C API, and since time zones are a PITA, it’s not worth it to change this behavior. The downside, however, is that you cannot rely on Time zones to ever be correct unless you’re very, very careful.

Personally, in my subclass of Time, I took care of stashing the time zone at object instantiation as a workaround for this bug. It seemed reasonable to me, and I was just surprised that the idea was rejected by the Ruby developers.

What do you think?

Intelligent MySQL Configuration

James Duncan Davidson’s Configuring MySQL on MacOS X post earlier today reminded me that I wanted to blog about the configuration I came up with while installing MySQL 5 on my box. Nothing has irritated me more than when MySQL’s syntax has violated the ANSI SQL standards in the most blatant ways, or when transactions have appeared to work, but mysteriously not worked. Yes, I use Duncan’s settings to make sure that the MySQL box on my PowerBook only listens on local sockets, but I additionally add this configuration to /etc/my.cnf:

[mysqld]
sql-mode=ansi,strict_trans_tables,no_auto_value_on_zero,no_zero_date,no_zero_in_date,only_full_group_by
character-set-server=utf8
default-storage-engine=InnoDB
default-time-zone=utc

That last configuration can actually only be added after running this command:

/usr/local/mysql/bin/mysql_tzinfo_to_sql /usr/share/zoneinfo | \
mysql -u root mysql

But then the upshot is that I have everything configured to be as compliant as possible (although the time zone stuff is just my personal preference):

mysql> SELECT @@global.sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| @@global.sql_mode                                                                                                                                     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE,ONLY_FULL_GROUP_BY,ANSI,NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show variables like '%character_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+
7 rows in set (0.01 sec)

mysql> show variables like '%table_ty%';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| table_type    | InnoDB |
+---------------+--------+
1 row in set (0.00 sec)

mysql> show variables like 'time_zone%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| time_zone     | utc   |
+---------------+-------+
1 row in set (0.00 sec)

Now that’s the way things should be! Or at least as close as I’m going to get to it in MySQL 5.
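
As a quick check that the time zone tables were actually populated, something like the following should do. It is only a sketch: the sample timestamp is arbitrary, and it relies on CONVERT_TZ() returning NULL when a named zone is unknown.

```sql
-- Returns NULL if the named zones have not been loaded into the
-- mysql time zone tables; otherwise 2009-11-05 04:00:00.
SELECT CONVERT_TZ('2009-11-05 12:00:00', 'UTC', 'America/Los_Angeles');

-- The global and session zones as the server sees them.
SELECT @@global.time_zone, @@session.time_zone;
```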

Update 2006-11-04: Ask Bjørn Hansen turned me on to the “strict_trans_tables” mode, which prevents MySQL from trying to guess what you mean when you leave out a value for a required column. So I’ve now updated my configuration with sql-mode=ansi,strict_trans_tables.

Update 2009-11-05: I found myself configuring MySQL again today, and there were some other settings I found it useful to add:

  • no_auto_value_on_zero forces AUTO_INCREMENT columns to increment only when inserting a NULL, rather than when inserting a NULL or a zero(!).
  • no_zero_date disallows the zero date '0000-00-00', and no_zero_in_date disallows dates where the month or day is set to 0.
  • only_full_group_by requires that non-aggregated columns in a select list be included in a GROUP BY clause, as is mandated by the SQL standard. This only applies if an aggregate function is used in a query.

I’ve added all of these to the example above.
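
To see these modes in action, a scratch session along these lines may help. It is a sketch that assumes the sql-mode setting shown earlier is in effect; the table mode_demo and its values are hypothetical.

```sql
-- Hypothetical scratch table; name and values are illustrative only.
CREATE TEMPORARY TABLE mode_demo (
    id INT  NOT NULL AUTO_INCREMENT PRIMARY KEY,
    d  DATE NOT NULL
);

-- no_auto_value_on_zero: an explicit 0 is stored as 0, while NULL
-- still asks for the next generated value.
INSERT INTO mode_demo (id, d) VALUES (0, '2009-11-05');
INSERT INTO mode_demo (id, d) VALUES (NULL, '2009-11-05');

-- strict_trans_tables: omitting the required column d is an error
-- rather than a silent default.
INSERT INTO mode_demo (id) VALUES (NULL);

-- no_zero_date and no_zero_in_date: both inserts are rejected.
INSERT INTO mode_demo (d) VALUES ('0000-00-00');
INSERT INTO mode_demo (d) VALUES ('2009-00-05');

-- only_full_group_by: rejected because id is neither aggregated
-- nor listed in GROUP BY.
SELECT id, d, COUNT(*) FROM mode_demo GROUP BY d;
```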
