<?xml version="1.0" encoding="utf-8"?>

<rdf:RDF 
  xmlns="http://purl.org/rss/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:cc="http://web.resource.org/cc/"
  xmlns:admin="http://webns.net/mvcb/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/"
> 

  <channel rdf:about="http://justatheory.com">
    <title>Just a Theory</title>
    <link>http://justatheory.com</link>
    <description>Theory waxes practical. By David Wheeler.</description>
    <language>en-us</language>
    <dc:creator>David Wheeler (david@justatheory.com)</dc:creator>
    <dc:rights>Copyright David Wheeler</dc:rights>
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <admin:generatorAgent rdf:resource="http://www.raelity.org/apps/blosxom/?v=2.0" />
    <admin:errorReportsTo rdf:resource="mailto:david@justatheory.com"/>

    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/citext-patch-submitted.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/pgtap-on-pgfoundry_at_yapc.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/introducing_pgtap.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/recurring_events.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/timezone_validation.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/reducing_view_calculations.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/ruby_warm_standby.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/rails_and_slony.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/gtin-0.01.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/pgsql_batch_updates.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/plpgsql_talk_slides.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/plpgsql_talk.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/many_to_many_plpgsql.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/benchmarking_functions.html" />
        <rdf:li rdf:resource="http://justatheory.com/computers/databases/postgresql/ean_validation.html" />

      </rdf:Seq>
    </items>


    <image rdf:resource="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg" />

  </channel>

  <image rdf:about="http://justatheory.com/logo.gif">
    <title>Just a Theory</title>
    <url>http://justatheory.com/logo.gif</url>
    <link>http://justatheory.com</link>
  </image>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/citext-patch-submitted.html">
    <title>CITEXT Patch Submitted to PostgreSQL Contrib</title>
    <link>http://justatheory.com/computers/databases/postgresql/citext-patch-submitted.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2008-06-30T12:49-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>I started out by trying to use the <a href="http://pgfoundry.org/projects/citext/" title="">citext project on pgFoundry</a>, but immediately identified two issues with it:</p>

<ol>
  <li>It does not install properly on PostgreSQL 8.3 (it uses a lot of casts that were removed in 8.3); and </li>
  <li>It only case-insensitively compared ASCII characters. So accented multibyte characters work just as they do in the text type.</li>
</ol>

<p>So I set about trying to create my own, new type, originally called “lctext”, since what it does is not true case-insensitive comparisons, but lowercases text and then compares, just as millions of us developers already do by using <code>LOWER()</code> on both sides of a query:</p>

<pre>
SELECT *
  FROM tab
 WHERE lower(col) = LOWER(?);
</pre>

<p>I just finally got fed up with this. The last straw for me was wanting to create a primary key that would be stored case-insensitively, which would have required that I create two indexes for it: One created for the primary key by default, the other a functional <code>UNIQUE INDEX</code> on <code>LOWER(col)</code>, which would just be stupid.</p>

<p>So <a href="http://archives.postgresql.org/message-id/4013F1AE-FE1B-427B-8C23-1A5681DA297E@kineticode.com" title="PATCH: CITEXT 2.0">this patch</a> is the culmination of my work to make a locale-aware case-insensitive text type. It&#x2019;s locale-aware in that it uses the same locale-aware string comparison code as that used for the text type, and it uses the same C function as <code>LOWER()</code> uses. The nice thing is that it works just as if you had used <code>LOWER()</code> in all your SQL, but now you don&#x2019;t have to.</p>

<p>So while this is not a <em>true</em> case-insensitive text type, in the sense that it doesn&#x2019;t do a case-insensitive comparison, but changes the cases and <em>then</em> compares, it is likely more efficient than the <code>LOWER()</code> workaround that we&#x2019;ve all been using for years, and it neater, too. Using this type, it will now be much easier to create, e.g, an <a href="http://www.varlena.com/GeneralBits/128.php" title="PostgrSQL General Bits: “Base Type using Domains”">email domain</a>, like so:</p>

<pre>
CREATE OR REPLACE FUNCTION is_email(text)
RETURNS BOOLEAN
AS $$
    use Email::Valid;
    return TRUE if Email::Valid->address( $_[0] );
    return FALSE;
$$ LANGUAGE &#x2018;plperlu&#x2019; STRICT IMMUTABLE;

CREATE DOMAIN email AS CITEXT CHECK ( is_email( value ) );
</pre>

<p>No more nasty workarounds to account for the lack of case-insensitive comparisons for text types. It works great for time zones and other data types that are defined to compare case-insensitively:</p>

<pre>
CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
BEGIN
  PERFORM now() AT TIME ZONE tz;
  RETURN TRUE;
EXCEPTION WHEN invalid_parameter_value THEN
  RETURN FALSE;
END;
$$ language plpgsql STABLE;

CREATE DOMAIN timezone AS CITEXT
CHECK ( is_timezone( value ) );
</pre>

<p>And that should just work!</p>

<p>I&#x2019;m hoping that this is accepted during the <a href="http://wiki.postgresql.org/index.php?title=CommitFest:2008-07" title="PostgreSQL CommitFest:2008-07">July CommitFest</a>. Of course I will welcome suggestions for how to improve it. Since I sent the patch, for example, I&#x2019;ve been thinking that I should suggest in the documentation that it is best used for short text entries (say, up to 256 characters), rather than longer entries (like email bodies or journal articles), and that for longer entries, one should really make use of <a href="http://www.postgresql.org/docs/current/static/textsearch.html" title="PostgreSQL Documentation: Chapter 12. Full Text Search">tsearch2</a>, instead. There are other notes and caveats in the <a href="http://archives.postgresql.org/message-id/4013F1AE-FE1B-427B-8C23-1A5681DA297E@kineticode.com" title="PATCH: CITEXT 2.0">patch submission</a>. Please do let me know what you think.</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/pgtap-on-pgfoundry_at_yapc.html">
    <title>pgTAP pgFoundry Project and Lightning Talk</title>
    <link>http://justatheory.com/computers/databases/postgresql/pgtap-on-pgfoundry_at_yapc.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2008-06-18T20:43-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>A couple of quick announcements:</p>

<dl>
  <dt>pgTAP on pgFoundry</dt>
  <dd><p>First, the PostgreSQL community approved my project, so now there is a <a href="http://pgfoundry.org/projects/pgtap/" title="pgTAP on pgFoundry">pgTAP project page</a>, including a couple of <a href="http://pgfoundry.org/mail/?group_id=1000389" title="pgTAP Mail Lists">mail lists</a>, a <a href="http://pgfoundry.org/tracker/?group_id=1000389" title="pgTAP Tracker">bug tracker</a>, and <a href="http://pgfoundry.org/frs/?group_id=1000389" title="downloads">downloads</a>. I uploaded a new version shortly after the project was approved, and 0.03 should be there soon, as well.</p></dd>
  <dt>pgTAP YAPC::NA Lightning Talk</dt>
  <dd><p>I gave a <a href="http://conferences.mongueurs.net/yn2008/" title="YAPC::NA 2008 pgTAP Lightning Talk">Lightning talk</a> at YAPC::NA 2008 in Chicago this afternoon. I&#x2019;ve now posted <a href="/computers/databases/postgresql/pgtap-yapc.pdf" title="Slides for “Test Your Database with pgTAP”">the slides</a> for your enjoyment.</p></dd>
</dl>

<p>Care to help me with development? Want to add your own test functions or make it all integrate better with standard PostgreSQL regression tests? Want to help me get Module::Build or Module::Install to run Perl and PostgreSQL and whatever tests side-by-side, all at once? Join the <a href="http://pgfoundry.org/mail/?group_id=1000389" title="Subscribe to pgtap-users">pgtap-users</a> mail list and join the fun!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/introducing_pgtap.html">
    <title>Introducing pgTAP</title>
    <link>http://justatheory.com/computers/databases/postgresql/introducing_pgtap.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2008-06-07T05:24-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>So I started working on a new PostgreSQL data type this week. More on that
soon; in the meantime, I wanted to create a test suite for it, and wasn&#x2019;t sure
where to go. The only PostgreSQL tests I&#x2019;ve seen are those distributed with
Elein
Mustain&#x2019;s <a href="http://www.varlena.com/varlena/GeneralBits/Tidbits/email_test.sql"
title="Testing the email data type">tests</a> for the email data type she
created in a <a href="http://www.varlena.com/GeneralBits/128.php" title="Base
Type using Domains">PostgreSQL General Bits</a> posting from a couple of years
ago. I used the same approach myself for
my <a href="http://pgfoundry.org/projects/gtin/" title="GTIN data type project
info">GTIN data type</a>, but it was rather hard to use: I had to pay very
close attention to what was output in order to tell the description output
from the test output. It was quite a PITA, actually.</p>

<p>This time, I started down the same path, then then started thinking about
Perl testing, where each unit test, or <q>assertion,</q> in the xUnit
parlance, triggers output of a single line of information indicating whether
or not a test succeeded. It occurred to me that I could just run a bunch of
queries that returned booleans to do my testing. So my first stab looked
something like this:</p>

<pre>
\set ON_ERROR_STOP 1
\set AUTOCOMMIT off
\pset format unaligned
\pset tuples_only
\pset pager
\pset null &#x0027;[NULL]&#x0027;

SELECT foo() = &#x0027;bar&#x0027;;
SELECT foo(1) = &#x0027;baz&#x0027;;
SELECT foo(2) = &#x0027;foo&#x0027;;
</pre>

<p>The output looked like this:</p>

<pre>
% psql try -f ~/Desktop/try.sql
t
t
t
</pre>

<p>Once I started down that path, and had written ten or so tests, It suddenly
dawned on me that the Perl <a href="http://search.cpan.org/perldoc?Test::More"
title="Test::More on CPAN">Test::More</a> module and its
core <code>ok()</code> subroutine worked just like that. It essentially just
depends on a boolean value and outputs text based on that value. A couple
minutes of hacking and I had this:</p>

<pre>
CREATE TEMP SEQUENCE __tc__;
CREATE OR REPLACE FUNCTION ok ( boolean, text ) RETURNS TEXT AS $$
    SELECT (CASE $1 WHEN TRUE THEN &#x0027;&#x0027; ELSE &#x0027;not &#x0027; END) || &#x0027;ok&#x0027;
        || &#x0027; &#x0027; || NEXTVAL(&#x0027;__tc__&#x0027;)
        || CASE $2 WHEN &#x0027;&#x0027; THEN &#x0027;&#x0027; ELSE COALESCE( &#x0027; - &#x0027; || $2, &#x0027;&#x0027; ) END;
$$ LANGUAGE SQL;
</pre>

<p>I then rewrote my test queries like so:</p>

<pre>
\echo 1..3
SELECT ok( foo() = &#x0027;bar&#x0027;   &#x0027;foo() should return &quot;bar&quot;&#x0027; );
SELECT ok( foo(1) = &#x0027;baz&#x0027;, &#x0027;foo(1) should return &quot;baz&quot;&#x0027; );
SELECT ok( foo(2) = &#x0027;foo&#x0027;, &#x0027;foo(2) should return &quot;foo&quot;&#x0027; );
</pre>

<p>Running these tests, I now got:</p>

<pre>
% psql try -f ~/Desktop/try.sql
1..3
ok 1 - foo() should return &quot;bar&quot;
ok 2 - foo(1) should return &quot;baz&quot;
ok 3 - foo(2) should return &quot;foo&quot;
</pre>

<p>And, <strong><em>BAM!</em></strong> I had the beginning of a test framework
that emits pure <a href="http://testanything.org/" title="Test Anything
Protocol">TAP</a> output.</p>

<p>Well, I was so excited about this that I put aside my data type for a few
hours and banged out the rest of the framework. Why was this exciting to me?
Because now I can use a standard test harness to run the tests, even mix them
in with other TAP tests on any project I might work on. Just now, I quickly
hacked together a quick script to run the tests:</p>

<pre>
use TAP::Harness;

my $harness = TAP::Harness->new({
    timer   => $opts->{timer},
    exec    => [qw( psql try -f )],
});

$harness->runtests( @ARGV );
</pre>

<p>Now I&#x2019;m able to run the tests like so:</p>

<pre>
% try ~/Desktop/try.sql        
/Users/david/Desktop/try........ok   
All tests successful.
Files=1, Tests=3,  0 wallclock secs ( 0.00 usr  0.00 sys +  0.01 cusr  0.00 csys =  0.01 CPU)
Result: PASS
</pre>

<p>Pretty damn cool! And lest you wonder whether such a suite of TAP-emitting
test functions is suitable for testing SQL, here are a few exmples of tests
I&#x2019;ve written:</p>

<pre>
&#x002d;&#x002d; Plan the tests.
SELECT plan(4);

&#x002d;&#x002d; Emit a diagnostic message for users of different locales.
SELECT diag(
    E&#x0027;These tests expect LC_COLLATE to be en_US.UTF-8,\n&#x0027;
  || &#x0027;but yours is set to &#x0027; || setting || E&#x0027;.\n&#x0027;
  || &#x0027;As a result, some tests may fail. YMMV.&#x0027;
)
  FROM pg_settings
 WHERE name = &#x0027;lc_collate&#x0027;
   AND setting &lt;&gt; &#x0027;en_US.UTF-8&#x0027;;

SELECT is( &#x0027;a&#x0027;, &#x0027;a&#x0027;, &#x0027;&quot;a&quot; should = &quot;a&quot;&#x0027; );
SELECT is( &#x0027;B&#x0027;, &#x0027;B&#x0027;, &#x0027;&quot;B&quot; should = &quot;B&quot;&#x0027; );

CREATE TEMP TABLE try (
    name lctext PRIMARY KEY
);

INSERT INTO try (name)
VALUES (&#x0027;a&#x0027;), (&#x0027;ab&#x0027;), (&#x0027;â&#x0027;), (&#x0027;aba&#x0027;), (&#x0027;b&#x0027;), (&#x0027;ba&#x0027;), (&#x0027;bab&#x0027;), (&#x0027;AZ&#x0027;);

SELECT ok( &#x0027;a&#x0027; = name, &#x0027;We should be able to select the value&#x0027; )
  FROM try
 WHERE name = &#x0027;a&#x0027;;

SELECT throws_ok(
    &#x0027;INSERT INTO try (name) VALUES (&#x0027;&#x0027;a&#x0027;&#x0027;)&#x0027;,
    &#x0027;23505&#x0027;,
    &#x0027;We should get an error inserting a lowercase letter&#x0027;
);

&#x002d;&#x002d; Finish the tests and clean up.
SELECT * FROM finish();
</pre>

<p>As you can see, it&#x2019;s just SQL. And yes, I have ported most of the test
functions from <a href="http://search.cpan.org/perldoc?Test::More"
title="Test::More on CPAN">Test::More</a>, as well as a couple
from <a href="http://search.cpan.org/perldoc?Test::Exception"
title="Test::Exception on CPAN">Test::Exception</a>.</p>

<p>So, without further ado, I&#x2019;d like to introduce
<a href="https://svn.kineticode.com/pgtap/trunk" title="pgTAP Subversion
repository">pgTAP</a>, a lightweight test framework for PostgreSQL implemented
in PL/pgSQL and PL/SQL. I&#x2019;ll be hacking on it more in the coming days, mostly
to get a proper client for running tests hacked together. Then I think I&#x2019;ll
see if pgFoundry is interested in it.</p>

<p>Whaddya think? Is this something you could use? I can see many uses,
myself, not only for testing a custom data type as I develop it, but also
custom functions in PL/pgSQL or PL/Perl, and, heck, just regular schema stuff.
I&#x2019;ve had to write a lot of Perl tests to test my database schema (triggers,
rules, functions, etc.), all using the DBI and being very verbose. Being able
to do it all in a single psql script seems so much cleaner. And if I cand end
up mixing the output of those scripts in with the rest of my unit tests,
so much the better!</p>

<p>Anyway, feedback welcome. Leave your comments, suggestions, complaints,
patches, etc., below. Thanks!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/recurring_events.html">
    <title>How to Generate Recurring Events in the Database</title>
    <link>http://justatheory.com/computers/databases/postgresql/recurring_events.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2008-01-30T20:04-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>This is a followup to
my <a href="/computers/databases/postgresql/reducing_view_calculations.html"
title="Need Help Reducing View Calculations">request for help</a> fixing the
performance of a database view that generated recurrences for events. This was
an essential feature of <a href="http://iwantsandy.com" title="Sandy — your
free personal assistant">Sandy</a>, and thus important to get right. The idea
when I started was simple:</p>

<ul>
  <li>Add a <code>recurrence</code> domain to the database that supports a
  number of different values, including <q>daily</q>, <q>weekly</q>,
  and <q>monthly</q>.</li>
  <li>Add a <code>recurrence</code> column to the <code>events</code> table
  that identify how an event recurs.</li>
  <li>Add a <code>recurrence_dates</code> table that contains a pre-generated
  list of recurrences for a given date. I&#x2019;d populated this table with five
  years of dates, each one mapped to five years worth of recurrence dates (see
  the original<a
  href="/computers/databases/postgresql/reducing_view_calculations.html"
  title="Need Help Reducing View Calculations">blog entry</a> for more on the
  format of this table.</li>
  <li>Create a view that maps each <code>events</code> row to its date and
  recurrence in the <code>recurrence_dates</code> table.</li>
  <li>Profit.</li>
</ul>

<p>It was this last bullet point that didn&#x2019;t quite work out: although the data
was perfectly accurate, queries for a lot of events in the view were
<em>very</em> expensive. I mean, the query could run for 3-4 minutes. It was
just crazy! I couldn&#x2019;t figure out the problem, so I posted my <a
href="/computers/databases/postgresql/reducing_view_calculations.html"
title="Need Help Reducing View Calculations">request for help</a>. It was
through discussions that followed with <a href="http://www.depesz.com/"
title="&lt;/depesz&gt; blog">depesz</a> that I finally figured out what the
problem was: Although I was usually selecting only a week&#x2019;s or months worth of
events, the view was calculating rows for all five years worth of data for all
of the events for a given user. Um, <em>not efficient.</em></p>

<p>So here I finally document how, with a lot of help and example code from
depesz, I solved the problem. The trick was to use a function instead of a
view to generate the recurring event rows, and to limit it only to the dates
we&#x2019;re interested in. For convenience sake, I broke this down into two PL/pgSQL
functions: one to generate recurring dates and one to return the recurring
event rows. But first, here&#x2019;s the <code>recurrence</code> domain and the
<code>events</code> table, both of which are unchanged from the original
approach:</p>

<pre>
CREATE DOMAIN recurrence AS TEXT
CHECK ( VALUE IN ( &#x0027;none&#x0027;, &#x0027;daily&#x0027;, &#x0027;weekly&#x0027;, &#x0027;monthly&#x0027; ) );

CREATE TABLE events (
    id         SERIAL     PRIMARY KEY,
    user_id    INTEGER    NOT NULL,
    starts_at  TIMESTAMP  NOT NULL,
    start_tz   TEXT       NOT NULL,
    ends_at    TIMESTAMP,
    end_tz     TEXT       NOT NULL,
    recurrence RECURRENCE NOT NULL DEFAULT &#x0027;none&#x0027;
);
</pre>

<p>Just assume the <code>user_id</code> is a foreign key. Now let&#x2019;s populate
this table with some data. For the purposes of this demonstration, I&#x2019;m going
to create one event per day for 1000 days, evenly divided between daily,
weekly, monthly, and no recurrences, as well as five different times of day
and six different durations:</p>

<pre>
INSERT INTO events (user_id, starts_at, start_tz, ends_at, end_tz, recurrence)
SELECT 1,
       ts::timestamp,
       &#x0027;PST8PDT&#x0027;,
       ts::timestamp + dur::interval,
       &#x0027;PST8PDT&#x0027;,
       recur
  FROM (
    SELECT &#x0027;2007-12-19&#x0027;::date + i || &#x0027; &#x0027; || CASE i % 5
               WHEN 0 THEN &#x0027;06:00&#x0027;
               WHEN 1 THEN &#x0027;10:00&#x0027;
               WHEN 2 THEN &#x0027;14:00&#x0027;
               WHEN 3 THEN &#x0027;18:00&#x0027;
               ELSE        &#x0027;22:30&#x0027;
               END,
           CASE i % 6
               WHEN 0 THEN &#x0027;2 hours&#x0027;
               WHEN 1 THEN &#x0027;1 hour&#x0027;
               WHEN 2 THEN &#x0027;45 minutes&#x0027;
               WHEN 3 THEN &#x0027;3.5 hours&#x0027;
               WHEN 4 THEN &#x0027;15 minutes&#x0027;
               ELSE        &#x0027;30 minutes&#x0027;
               END,
           CASE i % 4
               WHEN 0 THEN &#x0027;daily&#x0027;
               WHEN 1 THEN &#x0027;weekly&#x0027;
               WHEN 2 THEN &#x0027;monthly&#x0027;
               ELSE        &#x0027;none&#x0027;
               END
    FROM generate_series(1, 1000) as gen(i)
  ) AS ser( ts, dur, recur);
</pre>

<p>This gives us some nicely distributed data:</p>

<pre>
try=# select * from events limit 10;
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 11:00:00 | PST8PDT | weekly
    2 |       1 | 2007-12-21 14:00:00 | PST8PDT  | 2007-12-21 14:45:00 | PST8PDT | monthly
    3 |       1 | 2007-12-22 18:00:00 | PST8PDT  | 2007-12-22 21:30:00 | PST8PDT | none
    4 |       1 | 2007-12-23 22:30:00 | PST8PDT  | 2007-12-23 22:45:00 | PST8PDT | daily
    5 |       1 | 2007-12-24 06:00:00 | PST8PDT  | 2007-12-24 06:30:00 | PST8PDT | weekly
    6 |       1 | 2007-12-25 10:00:00 | PST8PDT  | 2007-12-25 12:00:00 | PST8PDT | monthly
    7 |       1 | 2007-12-26 14:00:00 | PST8PDT  | 2007-12-26 15:00:00 | PST8PDT | none
    8 |       1 | 2007-12-27 18:00:00 | PST8PDT  | 2007-12-27 18:45:00 | PST8PDT | daily
    9 |       1 | 2007-12-28 22:30:00 | PST8PDT  | 2007-12-29 02:00:00 | PST8PDT | weekly
   10 |       1 | 2007-12-29 06:00:00 | PST8PDT  | 2007-12-29 06:15:00 | PST8PDT | monthly
(10 rows)
</pre>

<p>Now let&#x2019;s get to the recurring date function:</p>

<pre>
CREATE OR REPLACE FUNCTION  generate_recurrences(
    recurs RECURRENCE, 
    start_date DATE,
    end_date DATE
)
    RETURNS setof DATE
    LANGUAGE plpgsql IMMUTABLE
    AS $BODY$
DECLARE
    next_date DATE := start_date;
    duration  INTERVAL;
    day       INTERVAL;
    check     TEXT;
BEGIN
    IF recurs = &#x0027;none&#x0027; THEN
        &#x002d;&#x002d; Only one date ever.
        RETURN next next_date;
    ELSIF recurs = &#x0027;weekly&#x0027; THEN
        duration := &#x0027;7 days&#x0027;::interval;
        WHILE next_date &lt;= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
        END LOOP;
    ELSIF recurs = &#x0027;daily&#x0027; THEN
        duration := &#x0027;1 day&#x0027;::interval;
        WHILE next_date &lt;= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
        END LOOP;
    ELSIF recurs = &#x0027;monthly&#x0027; THEN
        duration := &#x0027;27 days&#x0027;::interval;
        day      := &#x0027;1 day&#x0027;::interval;
        check    := to_char(start_date, &#x0027;DD&#x0027;);
        WHILE next_date &lt;= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
            WHILE to_char(next_date, &#x0027;DD&#x0027;) &lt;&gt; check LOOP
                next_date := next_date + day;
            END LOOP;
        END LOOP;
    ELSE
        &#x002d;&#x002d; Someone needs to update this function, methinks.
        RAISE EXCEPTION &#x0027;Recurrence % not supported by generate_recurrences()&#x0027;, recurs;
    END IF;
END;
$BODY$;
</pre>

<p>The idea here is pretty simple: pass in a recurrence (<q>daily</q>,
<q>weekly</q>, or <q>monthly</q>), a start date, and an end date, and get
back a set of all the recurrence dates between the start and end dates:</p>

<pre>
try=# \timing
Timing is on.
try=# select * from  generate_recurrences(&#x0027;daily&#x0027;, &#x0027;2008&#x002d;01&#x002d;29&#x0027;, &#x0027;2008&#x002d;02&#x002d;05&#x0027;);
 generate_recurrences 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 2008&#x002d;01&#x002d;29
 2008&#x002d;01&#x002d;30
 2008&#x002d;01&#x002d;31
 2008&#x002d;02&#x002d;01
 2008&#x002d;02&#x002d;02
 2008&#x002d;02&#x002d;03
 2008&#x002d;02&#x002d;04
 2008&#x002d;02&#x002d;05
(8 rows)

Time: 0.548 ms
try=# select * from  generate_recurrences(&#x0027;weekly&#x0027;, &#x0027;2008&#x002d;01&#x002d;29&#x0027;, &#x0027;2008&#x002d;03&#x002d;05&#x0027;);
 generate_recurrences 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 2008&#x002d;01&#x002d;29
 2008&#x002d;02&#x002d;05
 2008&#x002d;02&#x002d;12
 2008&#x002d;02&#x002d;19
 2008&#x002d;02&#x002d;26
 2008&#x002d;03&#x002d;04
(6 rows)

Time: 0.670 ms
try=# select * from  generate_recurrences(&#x0027;monthly&#x0027;, &#x0027;2008&#x002d;01&#x002d;29&#x0027;, &#x0027;2008&#x002d;05&#x002d;05&#x0027;);
 generate_recurrences 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 2008&#x002d;01&#x002d;29
 2008&#x002d;02&#x002d;29
 2008&#x002d;03&#x002d;29
 2008&#x002d;04&#x002d;29
(4 rows)

Time: 0.644 ms
</pre>

<p>Not bad, eh? And PostgreSQL&#x2019;s date and interval calculation operators are
<a
href="http://www.depesz.com/index.php/2007/12/27/how-many-1sts-of-any-month-were-sundays-since-1901-01-01/"
title="&lt;depesz/&gt;: how many 1sts of any month were sundays - since
1901-01-01?"><em>wicked fast</em></a>. Check out how long it dates to generate
two years worth of daily recurrence dates:</p>

<pre>
try=# select * from  generate_recurrences(&#x0027;daily&#x0027;, &#x0027;2008-01-29&#x0027;, &#x0027;2010-02-05&#x0027;);
 generate_recurrences 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 2008-01-29
 2008-01-30
 2008-01-31
...
 2010-02-03
 2010-02-04
 2010-02-05
(739 rows)

Time: 4.982 ms
</pre>

<p>Awesome. And the great thing about this function is that any time I need to
add new recurrences (yearly, biweekly, quarterly, weekends, weekdays, etc.), I
just modify the domain and this function and we&#x2019;re ready to go.</p>

<p>And now, part two: the recurring event function:</p>

<pre>
CREATE OR REPLACE FUNCTION recurring_events_for(
   for_user_id INTEGER,
   range_start TIMESTAMP,
   range_end   TIMESTAMP
)
   RETURNS SETOF events
   LANGUAGE plpgsql STABLE
   AS $BODY$
DECLARE
   event events;
   start_date TIMESTAMPTZ;
   start_time TEXT;
   ends_at    TIMESTAMPTZ;
   next_date  DATE;
   recurs_at  TIMESTAMPTZ;
BEGIN
   FOR event IN 
       SELECT *
         FROM events
        WHERE user_id = for_user_id
          AND (
                  recurrence &lt;&gt; &#x0027;none&#x0027;
              OR  (
                     recurrence = &#x0027;none&#x0027;
                 AND starts_at BETWEEN range_start AND range_end
              )
          )
    LOOP
       IF event.recurrence = &#x0027;none&#x0027; THEN
         RETURN NEXT event;
         CONTINUE;
       END IF;

       start_date := event.starts_at::timestamptz AT TIME ZONE event.start_tz;
       start_time := start_date::time::text;
       ends_at    := event.ends_at::timestamptz AT TIME ZONE event.end_tz;

       FOR next_date IN
           SELECT *
             FROM generate_recurrences(
                      event.recurrence,
                      start_date::date,
                      (range_end AT TIME ZONE event.start_tz)::date
             )
       LOOP
           recurs_at := (next_date || &#x0027; &#x0027; || start_time)::timestamp
               AT TIME ZONE event.start_tz;
           EXIT WHEN recurs_at &gt; range_end;
           CONTINUE WHEN recurs_at &lt; range_start AND ends_at &lt; range_start;
           event.starts_at := recurs_at;
           event.ends_at   := ends_at;
           RETURN NEXT event;
       END LOOP;
   END LOOP;
   RETURN;
END;
$BODY$;
</pre>

<p>The idea here is to select the appropriate events for a given user between
two dates, and for each event iterate over all of the recurrences between the
two dates and return a row for each one. So the lines starting with <code>FOR
event IN</code> and ending with <code>LOOP</code> select the original events,
looking for either recurring events or non-recurring events that are between
the two dates. Note that if you needed to, you could easily refine this query
for your particular application, or even use
PL/pgSQL&#x2019;s <a href="http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING"
title="PL/pgSQL: Looping Through Query Results"><code>EXECUTE</code></a>
operator to dynamically generate queries to suit particular application
needs.</p>

<p>Next, the block starting with <code>IF event.recurrence =
&#x0027;none&#x0027; THEN</code> simply returns any non-recurring events.
Although the next block already handles this case, adding this optimization
eliminates a fair bit of calculation for the common case of non-recurring
events.</p>

<p>Then the lines starting with <code>FOR next_date IN</code> and ending with
<code>LOOP</code> select all of the dates for the recurrence in question,
using the <code>generate_recurrences()</code> function created earlier. From
<code>LOOP</code> to <code>END LOOP;</code>, the function generates the start
and end timestamps, exiting the loop when the start date falls after the range
or when it falls before the range and the end date falls after the range.
There are many other tweaks one could make here to modify which recurrences
are included and which are excluded. For example, if you had a column in the
<code>events</code> table such as <code>exclude_dates TIMESTAMP[] NOT NULL
DEFAULT &#x2018;{}&#x2019;</code> that stored an array of dates to ignore when generating
recurrences, you could just add this line to go ahead and exclude them from
the results returned by the function:</p>

<pre>
           CONTINUE WHEN recurs_at = ANY( exclude_dates );
</pre>

<p>But enough of the details: let&#x2019;s see how it works! Here&#x2019;s a query for a
week&#x2019;s worth of data:</p>

<pre>
try=# select * from recurring_events_for(1, &#x0027;2007-12-19&#x0027;, &#x0027;2007-12-26&#x0027;);
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    2 |       1 | 2007-12-21 14:00:00 | PST8PDT  | 2007-12-21 06:45:00 | PST8PDT | monthly
    3 |       1 | 2007-12-22 18:00:00 | PST8PDT  | 2007-12-22 21:30:00 | PST8PDT | none
    4 |       1 | 2007-12-23 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    4 |       1 | 2007-12-24 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    4 |       1 | 2007-12-25 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    5 |       1 | 2007-12-24 06:00:00 | PST8PDT  | 2007-12-23 22:30:00 | PST8PDT | weekly
    6 |       1 | 2007-12-25 10:00:00 | PST8PDT  | 2007-12-25 04:00:00 | PST8PDT | monthly
(8 rows)

Time: 51.890 ms
</pre>

<p>Note the time it took to execute this query. 52 ms is a <em>hell</em> of a
lot faster than the several minutes it took to run a similar query using the
old view. Plus, I&#x2019;m not limited to just the recurrence dates I&#x2019;ve
pre-calculated in the old <code>recurrence_dates</code> table. Now we can use
whatever dates are supported by PostgreSQL. It&#x2019;s even fast when we look at a
year&#x2019;s worth of data:</p>

<pre>
try=# select * from recurring_events_for(1, &#x0027;2007-12-19&#x0027;, &#x0027;2008-12-19&#x0027;);
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2007-12-27 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-03 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-10 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-17 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-24 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-31 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
...
  364 |       1 | 2008-12-17 22:30:00 | PST8PDT  | 2008-12-17 14:45:00 | PST8PDT | daily
  364 |       1 | 2008-12-18 22:30:00 | PST8PDT  | 2008-12-17 14:45:00 | PST8PDT | daily
  365 |       1 | 2008-12-18 06:00:00 | PST8PDT  | 2008-12-17 22:30:00 | PST8PDT | weekly
(19691 rows)

Time: 837.759 ms
</pre>

<p>Not stellar, but still respectable. Given that for a typical application, a
user will be looking at only a day&#x2019;s or a week&#x2019;s or a month&#x2019;s events at a time,
this seems to be an acceptable trade-off. I mean, how often will your users
need to see a list of 20,000 events? And even if a user <em>was</em>looking at
a year&#x2019;s worth of data, it&#x2019;s unlikely that 75% of them would be recurring as
in the example data here.</p>

<p>I was fucking <em>pumped</em> with this solution, and Sandy has hummed
along nicely since we put it into production. If you&#x2019;re interested in trying
it for yourself, I&#x2019;ve you can get all the SQL from this blog entry <a
href="/code/recurring_events.sql" title="Download the code for this blog
entry">here</a>.</p>

<p>The only thing I would like to have been able to do differently was to
encapsulate the <code>recurring_events_for()</code> function in a view. Such
would have made it much easier to actually <em>use</em> this solution in
Rails. If you know how to do that, please do leave a comment. As for how I
hacked Rails to use the function, well, that&#x2019;s a blog post for another
day.</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/timezone_validation.html">
    <title>Validating Time Zones in PostgreSQL</title>
    <link>http://justatheory.com/computers/databases/postgresql/timezone_validation.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2007-11-07T21:03-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>I recently needed to validate that a value stored in
a <code>TEXT</code>column was a valid time zone identifier. Why? Because I was
using its value inside the database to
<a href="/computers/databases/postgresql/reducing_view_calculations.html"
title="Need Help Reducing View Calculations">convert timestamp columns from
UTC to a valid zone</a>. So I set about writing a function I could use in a
constraint.</p>

<p>It turns out that PostgreSQL has a pretty nice view that lists all of the
time zones that it recognizes. It&#x2019;s called <code>pg_timezone_names</code>:</p>

<pre>
try=# select * from pg_timezone_names limit 5;
        name        | abbrev | utc_offset | is_dst 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 Africa/Abidjan     | GMT    | 00:00:00   | f
 Africa/Accra       | GMT    | 00:00:00   | f
 Africa/Addis_Ababa | EAT    | 03:00:00   | f
 Africa/Algiers     | CET    | 01:00:00   | f
 Africa/Asmara      | EAT    | 03:00:00   | f
(5 rows)
</pre>

<p>Cool. So all I had to do was to look up the value in this view. My first
stab at creating a time zone validation function therefore looked like
this:</p>

<pre>
CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
  bool BOOLEAN;
BEGIN
  SELECT TRUE INTO bool
    FROM pg_timezone_names
   WHERE LOWER(name) = LOWER(tz)
      OR LOWER(abbrev) = LOWER(tz);
  RETURN FOUND;
END;
$$ language plpgsql STABLE;
</pre>

<p>This should pretty well cover anything that PostgreSQL considers valid. So
does it work? You bet:</p>

<pre>
sandy_development=# \timing
Timing is on.
sandy_development=# select is_timezone(&#x2019;America/Los_Angeles&#x2019;);
 is_timezone 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 t
(1 row)

Time: 457.096 ms
sandy_development=# select is_timezone(&#x2019;Foo/Bar&#x2019;);
 is_timezone 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 f
(1 row)

Time: 472.752 ms
</pre>

<p>Perfect! Well, except for just one thing: performance is abysmal. A half
second per shot? Not very useful for constraints. And since
<code>pg_timezone_names</code> is a view (and, under that, a function), I
can&#x2019;t create indexes.</p>

<p>But then I did something dangerous: I started thinking. I realized that I
needed this function when our app started getting errors like this:</p>

<pre>
try=# select now() at time zone &#x2018;Foo/Bar&#x2019;;
ERROR:  time zone "Foo/Bar" not recognized
</pre>

<p>So the underlying C code throws an error when a time zone is invalid. What
if I could just trap the error? Well, PL/pgSQL conveniently has exception
handling, so I could do just that. But there was only one problem. PL/pgSQL&#x2019;s
exception handling syntax requires that you specify an error condition. Here&#x2019;s
what the documentation has:</p>

<pre>
EXCEPTION
    WHEN condition [ OR condition ... ] THEN
        handler_statements
    [ WHEN condition [ OR condition ... ] THEN
          handler_statements
      ... ]
END;
</pre>

<p>Conditions are
<a href="http://www.postgresql.org/docs/current/static/errcodes-appendix.html"
title="PostgreSQL Documentation: Appendix A. PostgreSQL Error Codes">error
codes</a>. But which one corresponds to the invalid time zone error? I tried a
few, but couldn&#x2019;t figure out which one. (Anyone know now to map errors you see
in <code>psql</code> to the error codes listed in Appendex A? Let me know!)
But really, my function just needed to do one thing. Couldn&#x2019;t I just trap any
old error?</p>

<p>A careful re-read of the PL/pgSQL documentation reveals that, yes, you can.
Use the condition <q>OTHERS,</q> and you can catch almost anything. With this
information in hand, I quickly wrote:</p>

<pre>
CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
  date TIMESTAMPTZ;
BEGIN
  date := now() AT TIME ZONE tz;
  RETURN TRUE;
EXCEPTION WHEN OTHERS THEN
  RETURN FALSE;
END;
$$ language plpgsql STABLE;
</pre>

<p>And how well does this one work?</p>

<pre>
sandy_development=# select is_timezone(&#x2019;America/Los_Angeles&#x2019;);
 is_timezone 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 t
(1 row)

Time: 3.009 ms
sandy_development=# select is_timezone(&#x2019;Foo/Bar&#x2019;);
 is_timezone 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 f
(1 row)

Time: 1.224 ms
</pre>

<p>Yes, I&#x2019;ll take 1-3 ms over 400-500 ms any day! I might even
<a href="http://www.postgresql.org/docs/current/static/sql-createdomain.html"
title="PostgreSQL Documentation: CREATE DOMAIN">create a domain</a> for this
and be done with it:</p>

<pre>
CREATE DOMAIN timezone AS TEXT
CHECK ( is_timezone( value ) );
</pre>

<p>Enjoy!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/reducing_view_calculations.html">
    <title>Need Help Reducing View Calculations</title>
    <link>http://justatheory.com/computers/databases/postgresql/reducing_view_calculations.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2007-11-07T05:22-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>I could use some advice and suggestions for how to solve a performance
problem due to the highly redundant calculation of values in a view. Sorry for
the longish explanation. I wanted to make sure that I omitted no details in
desribing the problem.</p>

<p>In order to support recurring events in an application I&#x2019;m working on, we
have a lookup table that maps dates to their daily, weekly, monthly, and
yearly recurrences. It looks something like this:</p>

<pre>
try=# \d recurrence_dates
   Table &quot;public.recurrence_dates&quot;
   Column   |    Type    | Modifiers 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 date       | date       | not null
 recurrence | recurrence | not null
 next_date  | date       | not null
Indexes:
    &quot;recurrence_dates_pkey&quot; PRIMARY KEY, btree (date, recurrence, next_date)
    &quot;index_recurrence_dates_on_date_and_recurrence&quot; btree (date, recurrence)

try=# select * from recurrence_dates
try-# where date = &#x0027;2007-11-04&#x0027;
try-# order by recurrence, next_date;
    date    | recurrence | next_date  
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 2007-11-04 | daily      | 2007-11-04
 2007-11-04 | daily      | 2007-11-05
 2007-11-04 | daily      | 2007-11-06
 2007-11-04 | weekly     | 2007-11-04
 2007-11-04 | weekly     | 2007-11-11
 2007-11-04 | weekly     | 2007-11-18
 2007-11-04 | monthly    | 2007-11-04
 2007-11-04 | monthly    | 2007-12-04
 2007-11-04 | monthly    | 2008-01-04
 2007-11-04 | annually   | 2007-11-04
 2007-11-04 | annually   | 2008-11-04
 2007-11-04 | annually   | 2009-11-04
 2007-11-04 | none       | 2007-11-04
</pre>

<p>To get all of the permutations of recurring events, we simply select from a
view rather than from the <code>events</code> table that contains the actual
event data. The view joins <code>events</code>
to <code>recurrence_dates</code> table like so:</p>

<pre>
CREATE OR REPLACE VIEW recurring_events AS
SELECT id, name, user_id, duration,
       (rd.next_date || &#x0027; &#x0027; ||
       (starts_at::timestamptz at time zone start_tz)::time)::timestamp
       at time zone start_tz AS starts_at,
       start_tz
  FROM events LEFT JOIN recurrence_dates rd
    ON (events.starts_at::timestamptz at time zone events.start_tz)::date = rd.date
   AND events.recurrence = rd.recurrence;
</pre>

<p>Then, to get all of the recurrences of events for a user within a week, we
do something like this in the client code:</p>

<pre>
SELECT *
  FROM recurring_events
 WHERE user_id = 2
   AND starts_at BETWEEN &#x0027;2007-11-04 07:00:00&#x0027; AND &#x0027;2007-11-10 07:59:59&#x0027;;
</pre>

<p>This works perfectly, as all of our dates and times are stored in UTC
in <code>timestamp</code> columns. We pass UTC times for the appropriate
offset to the query (Pacific Time in this example) and, because the view does
the right thing in mapping the <code>starts_at</code> time for each event to
its proper time zone, we get all of the events within the date range, even if
they are recurrences of an earlier event, and with their times properly
set.</p>

<p>The trouble we&#x2019;re having, however, is all of those conversions. Until last
week, the view just kept everything in UTC and left it to the client to
convert to the proper zone in the <code>start_tz</code> column. But that
didn&#x2019;t work so well when an event&#x2019;s <code>starts_at</code> was during daylight
savings time and recurrences were in standard time: the standard time
recurrences were all an hour off! So I added the repeated instances
of <code>events.starts_at::timestamptz at time zone events.start_tz</code>.
But now the view is <em>really</em> slow.</p>

<p>Since the only thing that has changed is the addition of the time zone
conversions, I believe that the performance penalty is because of them. The
calculation executes multiple times per row: once for the join and once again
for the <code>starts_at</code> column. We can have an awful lot of events for
a given user, and an awful lot of recurrences of a given event. If, for
example, an event recurs daily for 2 years, there will be around 730 rows for
that one event. And the calculation has to be executed for every one of them
before the <code>WHERE</code> clause can be properly evaluated. Ouch! Worse
still, we actually have <em>three</em> columns that do this in our
application, not just one as in the example here.</p>

<p>So what I need is a way to execute that calculation just once for each
row in the <code>events</code> table, rather than once for each row in the
<code>recurring_events</code> view. I figure 1 calculation will be a heck of a
lot faster than 730! So the question is, how do I do this? How do I get the
view to execute the conversion of the <code>starts_at</code> to
the <code>start_tz</code> time zone only once for each row
in <code>events</code>, regardless of how many rows it ends up generating in
the <code>recurring_events</code> view?</p>

<p>Suggestions warmly welcomed. This is a bit of a tickler for me, and
since the query performance on these views is killing us, I need to get
this adjusted post haste!</p>

<p>Meanwhile, tomorrow I&#x2019;ll post a cool hack I came up with for validating
time zones in the database. Something to look forward to as you ponder my
little puzzle, eh?</p>

<p><strong>Update 2008-01-30:</strong> <em>Thanks to help from depesz, I came
figured out what the underling problem was and solved it much more elegantly
using PL/pgSQL. I&#x2019;ve now <a
href="/computers/databases/postgresql/recurring_events.html" title="How to
Generate Recurring Events in the Database">written up the basic recipe.
Enjoy!</em></p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/ruby_warm_standby.html">
    <title>PostgreSQL Warm Standby Using Ruby</title>
    <link>http://justatheory.com/computers/databases/postgresql/ruby_warm_standby.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2007-03-29T00:44-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>The new PostgreSQL <a href="http://www.postgresql.org/docs/8.2/static/warm-standby.html">Warm Standby</a> support is pretty nice. Since my app doesn&#x2019;t currently require  read access to a slave database, I&#x2019;ve dumped Slony-I (and all of the <a href="/computers/databases/postgresql/rails_and_slony.html">headache</a> that went with it), and now have a warm failover server being updated a least once per minute. W00t!</p>

<p>I used <a href="http://archives.postgresql.org/sydpug/2006-10/msg00001.php">Charles Duffy&#x2019;s</a> example, as well as the documentation, to build my warm standby configuration, but unfortunately, our server OS does not have the <code>usleep</code> utility, so rather than have 1 second sleeps, I ported Charles&#x2019;s shell script to Ruby. Here it is for your enjoyment:</p>

<pre>
#!/usr/bin/env ruby

DELAY         = 0.01
FAILOVER_FILE = &quot;/path/to/failover&quot;

@@triggered = false

require &#x2018;ftools&#x2019;

def move (from, to)
  # Do not overwrite! Throws an exception on failure, existing the script.
  File.copy( from, to ) unless @@triggered || File.exists?( to )
end

from, to = ARGV

# If PostgreSQL is asking for .history, just try to move it and exit.
if from =~ /\.history$/
  move from, to
  exit
end

# Sleep while waiting for the file.
while !File.exists?(from) &amp;&amp; !@@triggered
  sleep DELAY
  @@triggered = true if File.exists?( FAILOVER_FILE )
end

# Move the file.
move from, to
</pre>

<p>Just change the <code>DELAY</code> value to the number of seconds you want to sleep, and the <code>FAILOVER_FILE</code> value to the location of a file that will trigger a failover.</p>

<p>This is all well and good, but I ultimately ended up using the <code>pg_standby</code> utility that&#x2019;s a new contrib utility in PostgreSQL CVS (and will therefore ship with 8.3), as it has the nice feature of cleaning up old WAL log files. It also does not have subsecond precision, but hey, maybe we don&#x2019;t really need it.</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/rails_and_slony.html">
    <title>Rails Migrations with Slony?</title>
    <link>http://justatheory.com/computers/databases/postgresql/rails_and_slony.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2007-03-26T06:53-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>The new app I&#x2019;m developing is written in Ruby on Rails and runs on PostgreSQL. We&#x2019;re replicating our production database using <a href="http://slony.info/" title="Slony-I PostgreSQL Replication">Slony-I</a>, but we&#x2019;ve run into a bit of a snag: database schema updates must be run as plain SQL through a Slony script in order to ensure proper replication of the schema changes within a transaction, but Rails migrations run as Ruby code updating the database via the Rails database adapter.</p>

<p>So how do others handle Rails migrations with their Slony-I replication setups? How do you update the Slony-I configuration file for the changes? How do you synchronize changes to the master schema out to the slaves? Do you shut down your apps, shut down Slony-I, make the schema changes to both the master and the slaves, and then restart Slony-I and your apps?</p>

<p>For that matter, people running Slony for their Bricolage databases must have the same issue, because the Bricolage upgrade scripts are just Perl using the DBI, not SQL files. Can anyone shed a little light on this for me?</p>

<p>Oh, and one last question: Why is this such a PITA? Can&#x2019;t we have decent replication that replicates <em>everything</em>, including schema changes? <em>Please?</em></p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/gtin-0.01.html">
    <title>My First C: A GTIN Data Type for PostgreSQL</title>
    <link>http://justatheory.com/computers/databases/postgresql/gtin-0.01.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-09-22T18:53-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>After all of my recent experimentation creating <a href="/computers/databases/postgresql/plpgsql_upc_validation.html" title="Validating UPCs with PL/pgSQL">UPC</a>, <a href="/computers/databases/postgresql/ean_validation.html" title="Corrected PostgreSQL EAN Functions">EAN</a>, and <a href="/computers/programming/perl/stepped_series.html" title="Stepped Series of Numbers in Perl">GTIN</a> validation functions, I became interested in trying to create a GTIN PostgreSQL data type in C. The fact that I don&#x2019;t know C didn&#x2019;t stop me from learning enough to do some damage. And now I have a first implementation done. <a href="http://pgfoundry.org/frs/?group_id=1000229" title="Download the GTIN data type">Check it out!</a></p>

<p>So how did I do this? Well, chapter six of the <a href="http://www.amazon.com/exec/obidos/ASIN/0672327562/justatheory-20" title="&#x201c;PostgreSQL (2nd Edition)&#x201d; by Douglas and Douglas">Douglas Book</a> was a great help to get me started. I also learned what I could by reading the source code for the core and contributed PostgreSQL data types, as well as the EnumKit enumerated data type builder (<a href="http://developer.postgresql.org/~adunstan/" title="Andrew Dunstan at PostgreSQL.org">download from here</a>). And the denizens of the <code>#postgresql</code> channel on FreeNode were also extremely helpful. Thank you, guys!</p>

<p>I would be very grateful if the C hackers among you, and especially any PostgreSQL core hackers who happen to read my blog, would download the GTIN source code and have a look at it. This is the first C code I&#x2019;ve written, so it would not surprise me if there were some gotchas that I missed (memory leaks, anyone?). And yes, I know that the new ISN contributed data types in the forthcoming 8.2 is a far more featureful implementation of bar code data types; I learned about it after I had nearly finished this first release of GTIN. But I did want to learn some C and how to create PostgreSQL data types, and provide the code for others to learn from, as well. It may also end up as the basis for an article. Stay tuned</p>

<p>In the meantime, share and enjoy.</p>

<p><strong>Update:</strong> I forgot to mention that you can check out the source code from the <a href="https://svn.kineticode.com/gtin/trunk/" title="The GTIN Subversion trunk"> Kineticode Subversion</a> repository.</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/pgsql_batch_updates.html">
    <title>Batch Updates with PL/pgSQL</title>
    <link>http://justatheory.com/computers/databases/postgresql/pgsql_batch_updates.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-09-15T22:50-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>The third in my series of articles about PL/pgSQL, <q>Batch Updates with
PL/pgSQL</q> has been published on <a href="http://www.oreillynet.com/pub/a/databases/2006/09/07/plpgsql-batch-updates.html" title="Batch Updates with PL/pgSQL">The O&#x2019;Reilly Network</a>. Actually it was published last week, but I&#x2019;ve not been very attentive to my blog lately. Sorry about that. Anyway, it improves upon the code in the second article in the series, <q><a href="http://www.onlamp.com/pub/a/onlamp/2006/06/29/many-to-many-with-plpgsql.html" title="Managing Many-to-Many Relationships with PL/pgSQL">Managing Many-to-Many Relationships with PL/pgSQL</a>,</q> by modifying the updating functions to use PostgreSQL batch query syntax. This means that the number of database calls in a given call to a function are constant, no matter how many IDs are passed to it.</p>

<p>So <a href="http://www.oreillynet.com/pub/a/databases/2006/09/07/plpgsql-batch-updates.html" title="Batch Updates with PL/pgSQL">check it out</a>!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/plpgsql_talk_slides.html">
    <title>PL/pgSQL Talk Slides Posted</title>
    <link>http://justatheory.com/computers/databases/postgresql/plpgsql_talk_slides.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-07-20T21:47-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>The talk that I <a href="/computers/databases/postgresql/plpgsql_talk.html"
title="PL/pgSQL Talk for Portland PostgreSQL Users">announced</a> a couple of
days ago went off very well, I think. Rich Shepard, a local PostgreSQL who was
at the meeting, later said on the mail list:</p>

<blockquote>
  <p>On the happy side, my thanks to Selena for organizing the group and
rounding up a half-dozen cats for the first meeting, and to David for the best
presentation at a computer users group meeting I&#x2019;ve seen in 20 years. The
topic was very interesting and the presentation itself highly professional and
polished. It&#x2019;s a standard to be met by future presenters for the benefit of us
all.</p>

  <p>—Rich Shepard</p>
</blockquote>

<p>Wow, praise doesn&#x2019;t get much higher than that. I&#x2019;m glad the group got a lot
out of the presentation. For the benefit of those who couldn&#x2019;t make it, I&#x2019;ve
<a href="/computers/databases/postgresql/learning_plpgsql.pdf"
title="“Learning PL/pgSQL” slides">posted the slides</a> for your enjoyment.
They might be a bit hard to follow at times without my commentary, but you
should be able to fill in the gaps by reading
<a href="http://www.oreillynet.com/pub/a/onlamp/2006/05/11/postgresql-plpgsql.html"
title="Writing PostgreSQL Functions with
PL/pgSQL">my</a> <a
href="http://www.oreillynet.com/pub/a/onlamp/2006/06/29/many-to-many-with-plpgsql.html"
title="Managing Many-to-Many Relationships with PL/pgSQL">articles</a> for
O&#x2019;Reilly.</p>

<p>So start having some fun with PL/pgSQL!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/plpgsql_talk.html">
    <title>PL/pgSQL Talk for Portland PostgreSQL Users</title>
    <link>http://justatheory.com/computers/databases/postgresql/plpgsql_talk.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-07-18T20:39-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>Attention PostgreSQL users in the Portland metropolitan area and environs
(Salem, Eugene, Vancouver, Seattle)! I am honored to be giving the inaugural
talk to the newly-formed <a href="http://pugs.postgresql.org/pdx/">Portland
PostgreSQL Users Group</a> on Wednesday, 19 July 2006 at 19:00
at <a href="http://www.freegeek.org/">FreeGeek</a>. My talk will be an
introduction to PL/pgSQL. Come check it out and join the fun! Beer and
schmoozing to take place after the talk
at <a href="http://www.luckylab.com/">The Lucky Lab</a>.</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/many_to_many_plpgsql.html">
    <title>Managing Many-to-Many Relationships with PL/pgSQL</title>
    <link>http://justatheory.com/computers/databases/postgresql/many_to_many_plpgsql.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-06-30T04:02-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>The second in my series of articles on programming PostgreSQL PL/pgSQL,
<q>Managing Many-to-Many Relationships with PL/pgSQL</q>, has just been
published <a
href="http://www.onlamp.com/pub/a/onlamp/2006/06/29/many-to-many-with-plpgsql.html"
title="Read &#x201c;Managing Many-to-Many Relationships with
PL/pgSQL&#x201d;">onLamp.com</a>. The idea is to abstract out of the
application layer the management of the many-to-many relationships, moving it
into the database layer where execution is both safer and faster. And you can
learn more about PL/pgSQL along the way.</p>

<p>So what are you waiting
for? <a
href="http://www.onlamp.com/pub/a/onlamp/2006/06/29/many-to-many-with-plpgsql.html"
title="Ready &#x201c;Managing Many-to-Many Relationships with
PL/pgSQL&#x201d;">Check it out</a>!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/benchmarking_functions.html">
    <title>Benchmarking PostgreSQL Functions</title>
    <link>http://justatheory.com/computers/databases/postgresql/benchmarking_functions.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-05-20T04:57-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p><strong>Update 2006-05-19:</strong> <em>I realized that there was a nasty error in my
algorithm for determining the runtime of a function: It was only fetching the
milliseconds part of the runtime, without adding in seconds and minutes! This
led to getting negative runtimes then the milliseconds part of the end time was
less than the millseconds part of the start time. Ugh. But with the help of
<code>yain</code> on IRC, I&#x2019;ve switched to calculating the number of seconds by
converting the start and end times to epoch seconds (which have subsecond
precision in PostgreSQL, and now things are just dandy. While I was at it, I
reorganized the function so that it was a bit easier to read, by constructing
the created function in the order it would be executed, and fixed the caching
problem, as suggested by Aidan in a comment below.</em></p>

<p>Following <a
href="/computers/databases/postgresql/benchmarking_upc_validation.html"
title="Benchmarking UPC Validation">yesterday&#x2019;s post</a>, Klint Gore sent
me some PL/pgSQL code that might be useable as a benchmark function. Today
I took that code and ran with it.</p>

<p>The idea was to create a function like the
Perl <a href="http://search.cpan.org/dist/perl/lib/Benchmark.pm">Benchmark</a>
module&#x2019;s <code>timethese()</code> function. In the process, I found, with help
from <a href="http://blogs.ittoolbox.com/database/soup/"
title="&#x201c;Database Soup&#x201d; by Josh Berkus">Josh Berkus</a>, that
PL/pgSQL&#x2019;s <code>EXECUTE</code> statement has quite a lot of overhead, and the
amount of overhead per call is pretty random. The overhead resulted in pretty
inaccurate benchmark numbers, unfortunately.</p>

<p>At Josh&#x2019;s suggestion, I rewrote the function to just test each function
inline, rather than passing the function code as parameters. This time, the
results were dead on. So then I refactored the original benchmark function to
create its <em>own</em> benchmark function, inlining all of the code, and then
call that function. Almost higher order PL/pgSQL! Again the results were just
right, and so now I present it to you:</p>

<pre>
create type _benchmark as (
    code      text,
    runtime   real,
    corrected real
);

CREATE OR REPLACE FUNCTION benchmark(n INTEGER, funcs TEXT[])
RETURNS SETOF _benchmark AS $$
DECLARE
    code TEXT := &#x0027;&#x0027;;
    a    _benchmark;
BEGIN
    &#x002d;&#x002d; Start building the custom benchmarking function.
    code := $_$
        CREATE OR REPLACE FUNCTION _bench(n INTEGER)
        RETURNS SETOF _benchmark AS $__$
        DECLARE
            s TIMESTAMP;
            e TIMESTAMP;
            a RECORD;
            d numeric;
            res numeric;
            ret _benchmark;
        BEGIN
            &#x002d;&#x002d; Create control.
            s := timeofday();
            FOR a IN SELECT TRUE FROM generate_series( 1, $_$ || n || $_$ )
            LOOP
            END LOOP;
            e := timeofday();
            d := extract(epoch from e) - extract(epoch from s);
            ret := ROW( &#x0027;[Control]&#x0027;, d, 0 );
            RETURN NEXT ret;
 
$_$;
    &#x002d;&#x002d; Append the code to bench each function call.
    FOR i IN array_lower(funcs,1) .. array_upper(funcs, 1) LOOP
        code := code || &#x0027;
            s := timeofday();
            FOR a IN SELECT &#x0027; || funcs[i] || &#x0027; FROM generate_series( 1, &#x0027;
                || n || $__$ ) LOOP
            END LOOP;
            e := timeofday();
            res := extract(epoch from e) - extract(epoch from s);
            ret := ROW(
                $__$ || quote_literal(funcs[i]) || $__$,
                res, 
                res - d
            );
            RETURN NEXT ret;
$__$;
    END LOOP;

    &#x002d;&#x002d; Create the function.
    execute code || $_$
        END;
        $__$ language plpgsql;
$_$; 

    &#x002d;&#x002d; Now execute the function.
    FOR a IN EXECUTE &#x0027;SELECT * FROM _bench(&#x0027; || n || &#x0027;)&#x0027; LOOP
        RETURN NEXT a;
    END LOOP;

    &#x002d;&#x002d; Drop the function.
    DROP FUNCTION _bench(integer);
    RETURN;
END;
$$ language &#x0027;plpgsql&#x0027;;
</pre>

<p>You call the function like this:</p>

<pre>
try=# select * from benchmark(10000, ARRAY[
try(#     &#x0027;ean_substr(&#x0027;&#x0027;036000291452&#x0027;&#x0027;)&#x0027;,
try(#     &#x0027;ean_byte(&#x0027;&#x0027;036000291452&#x0027;&#x0027;)&#x0027;,
try(#     &#x0027;ean_c(&#x0027;&#x0027;036000291452&#x0027;&#x0027;)&#x0027;
try(# ]);
            code            | runtime   | corrected 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 [Control]                  | 0.0237451 |          0
 ean_substr(&#x0027;036000291452&#x0027;) |  0.497734 |   0.473989
 ean_byte(  &#x0027;036000291452&#x0027;) |  0.394456 |   0.370711
 ean_c(     &#x0027;036000291452&#x0027;) | 0.0277281 | 0.00398302
(4 rows)
</pre>

<p>Pretty slick, eh? The only downside was that, when the <code>DROP
FUNCTION</code> line was not commented out, the function would run
once, and then, the next time, I&#x2019;d get this error:</p>

<pre>
ERROR:  cache lookup failed for function 17323
CONTEXT:  PL/pgSQL function &quot;benchmark&quot; line 49 at for over select rows
</pre>

<p>I have no idea why. So I just leave the function and let the
<code>CREATE OR REPLACE</code> take care of it.</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://justatheory.com/computers/databases/postgresql/ean_validation.html">
    <title>Corrected PostgreSQL EAN Functions</title>
    <link>http://justatheory.com/computers/databases/postgresql/ean_validation.html</link>
    <description></description>
    <dc:subject>/computers/databases/postgresql</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-05-20T04:55-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p><strong>Update:</strong><em> I updated the benchmarks based on the fixed version
of my <a
href="http://www.justatheory.com/computers/databases/postgresql/benchmarking_functions.html"
title="Benchmarking PostgreSQL Functions">benchmarking function</a>.</em></p>

<p>In doing a bit more reading about EAN codes, I realized that my <a
href="http://www.justatheory.com/computers/databases/postgresql/plpgsql_upc_validation.html"
title="Validating UPCs with PL/pgSQL">previous</a> <a
href="http://www.justatheory.com/computers/databases/postgresql/benchmarking_upc_validation.html"
title="Benchmarking UPC Validation">attempts</a> to write a validating function
for UPC and EAN codes had a significant error: they would only properly validate
EAN codes if the first numeral was 0! So I went back and fixed them all, and
present them here for posterity.</p>

<ul>
  <li>
    <p>The substring solution:</p>
    <pre>
CREATE OR REPLACE FUNCTION ean_substr (
    TEXT
) RETURNS boolean AS $$
DECLARE
    offset integer := 0;
    &#x002d;&#x002d; Support UPCs.
    ean   TEXT    := CASE WHEN length($1) = 12 THEN &#x0027;0&#x0027; || $1 ELSE $1 END;
BEGIN
    &#x002d;&#x002d; Make sure we really have an EAN.
    IF ean !~ &#x0027;^\\d{13}$&#x0027; THEN RETURN FALSE; END IF;

    RETURN 10 - (
        (
          &#x002d;&#x002d; Sum even numerals.
            substring(ean,  2 + offset, 1)::integer
          + substring(ean,  4 + offset, 1)::integer
          + substring(ean,  6 + offset, 1)::integer
          + substring(ean,  8 + offset, 1)::integer
          + substring(ean, 10 + offset, 1)::integer
          + substring(ean, 12 + offset, 1)::integer
         ) * 3 &#x002d;&#x002d; Multiply total by 3.
         &#x002d;&#x002d; Add odd numerals except for checksum (13).
         + substring(ean,  1 + offset, 1)::integer
         + substring(ean,  3 + offset, 1)::integer
         + substring(ean,  5 + offset, 1)::integer
         + substring(ean,  7 + offset, 1)::integer
         + substring(ean,  9 + offset, 1)::integer
         + substring(ean, 11 + offset, 1)::integer
    &#x002d;&#x002d; Compare to the checksum.
    ) % 10 = substring(ean, 13 + offset, 1)::integer;
END;
$$ LANGUAGE &#x0027;plpgsql&#x0027; immutable;
    </pre>
  </li>

  <li>
    <p>The looping solution:</p>
    <pre>
CREATE OR REPLACE FUNCTION ean_loop(
    TEXT
) RETURNS boolean AS $$
DECLARE
    total INTEGER := 0;
    &#x002d;&#x002d; Support UPCs.
    ean   TEXT    := CASE WHEN length($1) = 12 THEN &#x0027;0&#x0027; || $1 ELSE $1 END;
BEGIN
    &#x002d;&#x002d; Make sure we really have an EAN.
    IF ean !~ &#x0027;^\\d{13}$&#x0027; THEN RETURN FALSE; END IF;

    &#x002d;&#x002d; Sum even numerals.
    FOR i IN 2..12 LOOP
        total := total + substring(ean, i, 1)::INTEGER;
        i := i + 1;
    END LOOP;

    &#x002d;&#x002d; Multiply total by 3.
    total := total * 3;

    &#x002d;&#x002d; Add odd numerals except for checksum (13).
    FOR i IN 1..11 LOOP
        total := total + substring(ean, i, 1)::INTEGER;
        i := i + 1;
    END LOOP;

    &#x002d;&#x002d; Compare to the checksum.
    RETURN 10 - total % 10 = substring(ean, 13, 1)::INTEGER;
END;
$$ LANGUAGE &#x0027;plpgsql&#x0027; immutable;
    </pre>
  </li>

  <li>
    <p>The <code>BYTEA</code> solution:</p>
    <pre>
CREATE OR REPLACE FUNCTION ean_byte (
   arg TEXT
) RETURNS boolean AS $$
DECLARE
    &#x002d;&#x002d; Convert to BYTEA; support UPCs.
    ean BYTEA := CASE WHEN length($1) = 12 THEN &#x0027;0&#x0027; || $1 ELSE $1 END;
BEGIN
    &#x002d;&#x002d; Make sure we really have an EAN.
    IF arg !~ &#x0027;^\\d{12,13}$&#x0027; THEN RETURN FALSE; END IF;

    RETURN 10 - (
        (
            &#x002d;&#x002d; Sum odd numerals.
            get_byte(ean,  1) - 48
          + get_byte(ean,  3) - 48
          + get_byte(ean,  5) - 48
          + get_byte(ean,  7) - 48
          + get_byte(ean,  9) - 48
          + get_byte(ean, 11) - 48
         ) * 3 &#x002d;&#x002d; Multiply total by 3.
         &#x002d;&#x002d; Add even numerals except for checksum (12).
         + get_byte(ean,  0) - 48
         + get_byte(ean,  2) - 48
         + get_byte(ean,  4) - 48
         + get_byte(ean,  6) - 48
         + get_byte(ean,  8) - 48
         + get_byte(ean, 10) - 48
    &#x002d;&#x002d; Compare to the checksum.
    ) % 10 = get_byte(ean, 12) - 48;
    
END;
$$ LANGUAGE plpgsql immutable;
    </pre>
  </li>

  <li>
    <p>The PL/Perl solution:</p>
    <pre>
CREATE OR REPLACE FUNCTION ean_perl (
    TEXT
) RETURNS boolean AS $_$
    my $ean = length $_[0] == 12 ? &quot;0$_[0]&quot; : $_[0];
    # Make sure we really have an EAN.
    return &#x0027;false&#x0027; unless $ean =~ /^\d{13}$/;
    my @nums = split &#x0027;&#x0027;, $ean;
    return 10 - (
        # Sum even numerals.
        (   (   $nums[1] + $nums[3] + $nums[5] + $nums[7] + $nums[9]
                    + $nums[11]
            ) * 3 # Multiply total by 3.
        # Add odd numerals except for checksum (12).
        ) + $nums[0] + $nums[2] + $nums[4] + $nums[6] + $nums[8] + $nums[10]
    # Compare to the checksum.
    ) % 10 == $nums[12] ? &#x0027;true&#x0027; : &#x0027;false&#x0027;;
$_$ LANGUAGE plperl immutable;
    </pre>
  </li>

  <li>
    <p>The C solution (thanks StuckMojo!):</p>
    <pre>
#include &lt;string.h&gt;
#include &quot;postgres.h&quot;
#include &quot;fmgr.h&quot;

Datum ean_c(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(ean_c);

Datum ean_c(PG_FUNCTION_ARGS) {

    char *ean;
    text *arg = PG_GETARG_TEXT_P(0);
    int  arglen = VARSIZE(arg) - VARHDRSZ;
    bool ret = false;

    /* Validate the easy stuff: 12 or 13 digits. */
    if ((arglen != 12 &amp;&amp; arglen != 13) || 
        strspn(VARDATA(arg), &quot;0123456789&quot;) != arglen) {
        PG_RETURN_BOOL(ret);
    }

    /* Support UPCs. */
    if (arglen == 12) {
        ean = (char *) palloc(13);
        ean[0] = &#x0027;0&#x0027;;
        memcpy(&amp;ean[1], VARDATA(arg), arglen);
    } else {
        ean = (char *) palloc(arglen);
        memcpy(ean, VARDATA(arg), arglen);
    }

    ret = 10 - (
            /* Sum even numerals and multiply total by 3. */
            (  ean[1] - &#x0027;0&#x0027; + ean[3] - &#x0027;0&#x0027; + ean[5]  - &#x0027;0&#x0027; 
             + ean[7] - &#x0027;0&#x0027; + ean[9] - &#x0027;0&#x0027; + ean[11] - &#x0027;0&#x0027;) * 3
            /* Add odd numerals except for checksum (12). */
            + ean[0] - &#x0027;0&#x0027; + ean[2] - &#x0027;0&#x0027; + ean[4]  - &#x0027;0&#x0027;
            + ean[6] - &#x0027;0&#x0027; + ean[8] - &#x0027;0&#x0027; + ean[10] - &#x0027;0&#x0027;
        /* Compare to the checksum. */
        ) % 10 == ean[12] - &#x0027;0&#x0027;;

   PG_RETURN_BOOL(ret);
}
    </pre>
  </li>
</ul>

<p>And here are the benchmarks for them (without <code>immutable</code>):</p>

<pre>
try=# select * from benchmark(100000, ARRAY[
try(#     &#x0027;ean_substr(&#x0027;&#x0027;4007630000116&#x0027;&#x0027;)&#x0027;,
try(#     &#x0027;ean_loop(  &#x0027;&#x0027;4007630000116&#x0027;&#x0027;)&#x0027;,
try(#     &#x0027;ean_byte(  &#x0027;&#x0027;4007630000116&#x0027;&#x0027;)&#x0027;,
try(#     &#x0027;ean_perl(  &#x0027;&#x0027;4007630000116&#x0027;&#x0027;)&#x0027;,
try(#     &#x0027;ean_c(     &#x0027;&#x0027;4007630000116&#x0027;&#x0027;)&#x0027;
try(# ]);
            code             | runtime  |    rate     | corrected | corrected_rate 
&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;+&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;&#x002d;
 [Control]                   | 0.257728 | 388006.17/s |  0.257728 | 388006.17/s
 ean_substr(&#x0027;4007630000116&#x0027;) |  5.07296 | 19712.37/s  |   4.81523 | 20767.44/s
 ean_loop(  &#x0027;4007630000116&#x0027;) |  9.18085 | 10892.24/s  |   8.92312 | 11206.84/s
 ean_byte(  &#x0027;4007630000116&#x0027;) |   3.9248 | 25479.02/s  |   3.66707 | 27269.73/s
 ean_perl(  &#x0027;4007630000116&#x0027;) |   5.5062 | 18161.33/s  |   5.24848 | 19053.15/s
 ean_c(     &#x0027;4007630000116&#x0027;) | 0.285376 | 350415.10/s |  0.027648 | 3616901.80/s
(6 rows)
</pre>

<p>Enjoy!</p>
]]></content:encoded>
  </item>

  <cc:License rdf:about="http://creativecommons.org/licenses/by-nc/2.0/">
    <cc:permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <cc:permits rdf:resource="http://web.resource.org/cc/Distribution" />
    <cc:requires rdf:resource="http://web.resource.org/cc/Notice" />
    <cc:requires rdf:resource="http://web.resource.org/cc/Attribution" />
    <cc:prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
    <cc:permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
  </cc:License>
</rdf:RDF>
