home :: computers

How to Globally Change a Subversion Username

I successfully migrated the Kineticode Subversion repository to a new server yesterday. Everything works great. But after my first commit, I realized that, while my username on the old server was theory, on the new server it’s david. Subversion works fine, of course, and I was able to start committing from old checkouts using the new username, but I realized that sites like Ohloh would pick up the two usernames as separate usernames. So I wanted to update all of the 3630 existing revisions that were mine to use the new username.

Unfortunately, I couldn’t find much on how to do this in a quick Googling. But I quickly figured out that what I need to do was to svnadmin dump my repository, modify the dump, and then load it again. The Subversion dump format has all these fields for tracking the content-lengths of various, so doing the update was a bit tricky. But I wrote the script here to track things, and it worked great for me. So here it is for others to reference and use.

#!/usr/bin/perl -w

use strict;
use warnings;

while (<>) {
    print;
    next unless /^Revision-number:\s+\d+$/;

    # Grab the content lengths. Examples:
    # Prop-content-length: 139
    # Content-length: 139
    my $plen_line = <>;
    my $clen_line = <>;

    unless ( $plen_line =~ /^Prop-content-length:\s+\d+$/ ) {
        # Nothing we want to change.
        print $plen_line, $clen_line;
        next;
    }

    my @lines;
    while ( <> ) {
        if ( /^PROPS-END$/ ) {
            # finish.
            print $plen_line, $clen_line, @lines, $_;
            last;
        }

        push @lines, $_;

        if ( /^svn:author$/ ) {
            # Grab the author content length. Example:
            # V 6
            my $alen_line = <>;

            # Grab the author name.
            my $auth = <>;

            if ( $auth =~ s/^theory$/david/ ) {
                # Adjust the content lengths.
                for my $line ( $plen_line, $clen_line, $alen_line ) {
                    $line =~ s/(\d+)$/$1 - 1/e;
                }
            }
            print $plen_line, $clen_line, @lines, $alen_line, $auth;
            last;
        }
    }
}

To use it, save it to a file, say svn_author, then change line 40 to your old and new usernames. Then, on line 43, change the $1 - 1 bit to be correct for the difference between the usernames you’re changing. For example, if you’re changing your username from, say, shane to chromatic, the new name is five characters longer, so you’d make it $1 + 5.

Now, run it like so:

svnadmin dump /path/to/svnroot > svndump.out
perl svn_author svndump.out > svndump.in
svnadmin create /path/to/new/svnroot
svnadmin load /path/to/new/svnroot < svndump.in

And that’s it! Feel free to take this code and do with it what you like, including fix any bugs, add command-line options, support changing multiple authors at once, or whatever. Share and enjoy.

SVN::Notify 2.70: Output Filtering and Character Encoding

I’m very pleased to announce the release of SVN::Notify 2.70. You can see an example of its colordiff output here. This is a major release that I’ve spent the last several weeks polishing and tweaking to get just right. There are quite a few changes, but the two most important are imporoved character encoding support and output filtering.

Improved Character Encoding Support

I’ve had a number of bug reports regarding issues with character encodings. Particularly for folks working in Europe and Asia, but really for anyone using multibyte characters in their source code and log messages (and we all do nowadays, don’t we?), it has been difficult to find the proper incantation to get SVN::Notify to convert data from and to their proper encodings. Using a patch from Toshikazu Kinkoh as a starting-point, and with a lot of reading and experimentation, as well as regular and patient tests on Toshikazu’s and Martin Lindhe’s production systems, I think I’ve finally got it nailed down.

Now you can use the --encoding (formerly --charset), --svn-encoding, and --diff-encoding options—as well as --language—to get SVN::Notify to do the right thing. As long as your Subversion server’s OS supports an appropriate locale, you should be golden (mine is old, with no UTF-8 locales :\). And if all else fails, you can still set the $LANG environment variable before executing svnnotify.

There is actually a fair bit to know about encodings to get it to work properly, but if you use UTF-8 throughout and your OS supports UTF-8 locales, you shouldn’t have to do anything. You might have to set --language in order to get it to use the proper locale. See the new documentation of the encoding support for all the details. And if you still have problems, please do let me know.

Output Filtering

Much sexier is the addition of output filtering in SVN::Notify 2.70. I got pretty tired of getting feature requests for what are essentially formatting modifications, such as this one requesting support for KDE-style keyword support. I myself was using Trac wiki syntax in commit messages on a recent project and wanted to see them converted to HTML for messages output by SVN::Notify::HTML::ColorDiff.

So I finally sat down and gave some though on how to implement a simple plugin architecture for SVN::Notify. When I realized that it was generally just formatting that people wanted, it became simpler: I just needed a way to allow folks to write simple output filters. The solution I came up with was to just use Perl. Output filters are simply subroutines named for the kind of output they filter. They live in perl packages. That’s it.

For example, say that your developers write their commit log messages in Textile, and rather than receive them stuck inside <pre> tags, you’d like them converted to HTML. It’s simple. Just put this code in a Perl module file:

package SVN::Notify::Filter::Textile;
use Text::Textile ();

sub log_message {
    my ($notifier, $lines) = @_;
    return $lines unless $notify->content_type eq 'text/html';
    return [ Text::Textile->new->process( join $/, @$lines ) ];
}

Put the file, SVN/Notify/Filter/Textile.pm somewhere in a Perl library directory. Then use the new --filter option to svnnotify to put it to work:

svnnotify -p "$1" -r "$2" --handler HTML::ColorDiff --filter Textile

Yep, that’s it! SVN::Notify will find the filter module, load it, register its filtering subroutine, and then call it at the appropriate time. Of course, there are a lot of things you can filter; consult the complete documentation for all of the details. But hopefully this gives you a flavor for how easy it is to write new filters for SVN::Notify. I’m hoping that all those folks who want featurs can now stop bugging me and writing their own filters to do the job, and uploading them to CPAN for all to share!

To get things started, I scratched my own itch, writing a Trac filter myself. The filter is almost as simple as the Textile example above, but I also spent quite a bit of time tweaking the CSS so that most of the Trac-generated HTML looks good. You can see an example right here. Thanks to a number of bug fixes in Text::Trac, as well as Trac-specific CSS added via a filter on CSS output, it works beautifully. If I’m feeling motivated in the next week or so, I’ll create a separate CPAN distribution with just a Markdown filter and upload it. That will create a nice distriution example for folks to copy to creat their own. Or maybe someone on the Lazy Web Will do it for me! Maybe you?

I wish I’d thought to do this from the beginning; it would have saved me from having to add so many features/cruft to SVN::Notify over the years. Here’s a quick list of the features that likely could have been implemented via filters instead of added to the core:

  • --user-domain: Combine the SVN username with a domain for the From header.
  • --add-header: Add a header to the message.
  • --reply-to: Add a specific header to the message.
  • SVN::Notify::HTML::ColorDiff: Frankly, looking back on it, I don’t know why I didn’t just put this support right into SVN::Notify::HTML. But even if I hadn’t, it could have been implemented via filters.
  • --subject-prefix:: Modify the message subject.
  • --subject-cx: Add the commit context to the subject.
  • --strip-cx-regex: More subject context modification.
  • --no-first-line: Another subject filter.
  • --max-sub-length: Yet another!
  • --max-diff-length: A filter could truncate the diff, although this might be tricky with the HTML formatting.
  • --author-url: Modify the metadata section to add a link to the author URL.
  • --revision-url: Ditto for the revision URL.
  • --ticket-map: Filter the log message for various ticketing system strings to convert to URLs. This also encompasses the old --rt-url, --bugzilla-url, --gnats-url, and --jira-url options.
  • --header: Filter the beginning of the message.
  • --footer: Filter the end of the message.
  • --linkize: Filter the log message to convert URLs to links for HTML messages.
  • --css-url: Filter the CSS to modify it, or filter the start of the HTML to add a link to an external CSS URL.
  • --wrap-log: Reformat the log message for HTML.

Yes, really! That’s about half the functionality right there. I’m glad that I won’t have to add any more like that; filters are a much better way to go.

So download it, install it, write some filters, get your multibyte characters output properly, and enjoy! And as usual, send me your bug reports, but implement your own improvements using filters!

Mac OS X CD-ROM File Systems WTF?

Didn’t it used to be the case that when you used the Mac OS X Finder to burn a CD-ROM that you could then mount that CD-ROM on a Windows box? In the last few months, I’m suddenly finding that this is no longer the case. So now I have to use hdiutil to convert a .dmg file to the Joliet and ISO9660 file systems:

hdiutil makehybrid -o image.iso -joliet -iso image.dmg

And then I could burn a CD readable on Windows. What the fuck? I burned three CDs that were then useless to me before I finally dug up this hint. And I had this problem with CDs burned by Tiger, too, last summer, so it’s not just Leopard. It seems to me that Mac OS X should always default to building a hybrid CD that’s then readable by Windows, Linux, and everything else. Why doesn’t it?

Need Suggestions for IMAP Solution and Migration

For the last several years, I’ve run a Courier-IMAP mail server for all of the mail for this site, Kineticode, Strongrrl and other domains. We mainly used Mail.app on Mac OS X to communicate with the server, and it worked really well. Today, Julie has over 3 GB of mail data, and I have around 1.5 GB, all managed via IMAP.

Recently, I decided it was time to move the mail elsewhere. I’ve been meaning to do it for a while, primarily because the server I was using is now used for the Bricolage project, and because I never set up any spam filtering. Julie was suddenly getting 100s of spam messages in her inbox. (It really didn’t help that she was still using Panther.) So on the advice of a good friend who had been evaluating various mail services—and who for now shall go nameless and therefor blameless—I moved all of our mail to FuseMail.

At first this seamed like a pretty good solution. Our spam rates went way down, I could set up unlimited mail lists, aliases, and forwards, and there was a migration tool that automated moving all of our existing mail from the old IMAP server to the new one. There were some glitches with the migration tool, but in the end all of our mail was moved and in tact.

But that’s when I started to notice the issues. To summarize:

  • Mail put into the Sent Items folder by Mail.app was marked as unread. This didn’t happen on the old server, and apparently has something to so with how FuseMail names the sent folder: Sent Items rather than Sent Messages.
  • Mail.app is syncing constantly. Even once it had successfully synced the all of our email in all of our IMAP folders (which took days, it is syncing all the time, to the extent that I am sometimes waiting for up to a minute to read a mail when I double-click it, because there are all these other threads doing stuff and taking up all the resources. It can take several minutes for mail I’m sending to be sent (though that might be a delay in Mail.app copying the message to the Sent Items folder rather than the actual sending).
  • Deleting mail takes forever! This is probably the same issue as the syncing problem, but when I delete 1000s of messages from my Junk mail folder, it runs forever, and all other activities are delayed eve further. It turns out to be much more efficient to empty the Junk and Deleted Items folders using the webmail interface. And even then, Mail.app can take a while to delete locally-cached items from the folder when it syncs.
  • Suddenly, Julie is getting a lot less spam. She went from several hundred messages showing up in her Junk mailbox a few days ago to just five on Friday and two yesterday—one of which was a false positive). As she had been expecting a message from someone that she never got, this naturally made her very suspicious. Where is all the spam? Is she getting all of her mail?
  • Since FuseMail uses a mailbox named Sent Items instead of the traditional Sent Messages for all sent mail, I asked if they could move the 1.8 GB of messages from Julie’s Sent Messages to their Sent Items, since Mail.app would just choke on such a task. Though my request was escalated to the FuseMail developers, the answer came back no. Which I guess means that they’re not using Maildir, because in that case it would be a cinch, n’est pas?
  • Backups are not really feasible. Of course FuseMail has its own backup regimen, but if I ever want to move elsewhere or deal with some sort of catastrophic failure, I want my own backups. There is no rsync service available for this (remember: no maildir), so I have to use the IMAP interface. I’ve been trying for the past two weeks to get Offline IMAP to back up all of Julie’s and my mail, but it keeps choking. It gets a little further every time I run it; eventually it will get it all. But this only allows me to backup those accounts for which I happen to have a password. I have accounts set up for a few other users, but don’t have access to their passwords, so I can’t back them up. This does not make for very good support for corporate backup and retention policies.
  • Mail forwarded by FuseMail has its Return-Path header modified. This made RT break until I hacked it to ignore that header (which is its by-default preferred header for identifying senders.

So I’m pretty fed up. It took me a week to get all of our mail on FuseMail, and now I’m looking at moving it off again (once OfflineIMAP finishes a full sync). Grr. I’m considering finding a virtual host somewhere and setting up my own IMAP server again, but then I have the spam problem again. So then I could use a forwarding service like Pobox, or I can set up my own spam filtering (something I had hoped never to get into managing myself). My old IMAP server required very little maintenance, which was nice, but then the span filtering stuff always seemed daunting. Don’t you have to update things all the time?a

But before I go off and do something else, and unlike before I moved to FuseMail, I wanted to get an idea what other folks are doing? Do you use IMAP? Do you use it to manage a shitload (read: Gigabytes) of mail? Do you get very little spam and still get all of your valid mail? Are IMAP folder maintenance actions fast for you (in Mail.app in particular)? Are you paying a not-unreasonable amount of money for your setup? If you answered yes to all of these questions, please, for the love of all that is good in this world, tell me how you do it. I’m looking for something that I don’t have to work very hard to maintain (hence my original attempt to have some company that specializes in this stuff do it), but I’ll do what I have to to make this thing right. So how do you make it right? And if I have to run my own server, where should I host it that won’t cost me an arm and a leg?

Thanks for your help!

How to Generate Recurring Events in the Database

This is a followup to my request for help fixing the performance of a database view that generated recurrences for events. This was an essential feature of Sandy, and thus important to get right. The idea when I started was simple:

  • Add a recurrence domain to the database that supports a number of different values, including daily, weekly, and monthly.
  • Add a recurrence column to the events table that identify how an event recurs.
  • Add a recurrence_dates table that contains a pre-generated list of recurrences for a given date. I’d populated this table with five years of dates, each one mapped to five years worth of recurrence dates (see the originalblog entry for more on the format of this table.
  • Create a view that maps each events row to its date and recurrence in the recurrence_dates table.
  • Profit.

It was this last bullet point that didn’t quite work out: although the data was perfectly accurate, queries for a lot of events in the view were very expensive. I mean, the query could run for 3-4 minutes. It was just crazy! I couldn’t figure out the problem, so I posted my request for help. It was through discussions that followed with depesz that I finally figured out what the problem was: Although I was usually selecting only a week’s or months worth of events, the view was calculating rows for all five years worth of data for all of the events for a given user. Um, not efficient.

So here I finally document how, with a lot of help and example code from depesz, I solved the problem. The trick was to use a function instead of a view to generate the recurring event rows, and to limit it only to the dates we’re interested in. For convenience sake, I broke this down into two PL/pgSQL functions: one to generate recurring dates and one to return the recurring event rows. But first, here’s the recurrence domain and the events table, both of which are unchanged from the original approach:

CREATE DOMAIN recurrence AS TEXT
CHECK ( VALUE IN ( 'none', 'daily', 'weekly', 'monthly' ) );

CREATE TABLE events (
    id         SERIAL     PRIMARY KEY,
    user_id    INTEGER    NOT NULL,
    starts_at  TIMESTAMP  NOT NULL,
    start_tz   TEXT       NOT NULL,
    ends_at    TIMESTAMP,
    end_tz     TEXT       NOT NULL,
    recurrence RECURRENCE NOT NULL DEFAULT 'none'
);

Just assume the user_id is a foreign key. Now let’s populate this table with some data. For the purposes of this demonstration, I’m going to create one event per day for 1000 days, evenly divided between daily, weekly, monthly, and no recurrences, as well as five different times of day and six different durations:

INSERT INTO events (user_id, starts_at, start_tz, ends_at, end_tz, recurrence)
SELECT 1,
       ts::timestamp,
       'PST8PDT',
       ts::timestamp + dur::interval,
       'PST8PDT',
       recur
  FROM (
    SELECT '2007-12-19'::date + i || ' ' || CASE i % 5
               WHEN 0 THEN '06:00'
               WHEN 1 THEN '10:00'
               WHEN 2 THEN '14:00'
               WHEN 3 THEN '18:00'
               ELSE        '22:30'
               END,
           CASE i % 6
               WHEN 0 THEN '2 hours'
               WHEN 1 THEN '1 hour'
               WHEN 2 THEN '45 minutes'
               WHEN 3 THEN '3.5 hours'
               WHEN 4 THEN '15 minutes'
               ELSE        '30 minutes'
               END,
           CASE i % 4
               WHEN 0 THEN 'daily'
               WHEN 1 THEN 'weekly'
               WHEN 2 THEN 'monthly'
               ELSE        'none'
               END
    FROM generate_series(1, 1000) as gen(i)
  ) AS ser( ts, dur, recur);

This gives us some nicely distributed data:

try=# select * from events limit 10;
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
------+---------+---------------------+----------+---------------------+---------+------------
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 11:00:00 | PST8PDT | weekly
    2 |       1 | 2007-12-21 14:00:00 | PST8PDT  | 2007-12-21 14:45:00 | PST8PDT | monthly
    3 |       1 | 2007-12-22 18:00:00 | PST8PDT  | 2007-12-22 21:30:00 | PST8PDT | none
    4 |       1 | 2007-12-23 22:30:00 | PST8PDT  | 2007-12-23 22:45:00 | PST8PDT | daily
    5 |       1 | 2007-12-24 06:00:00 | PST8PDT  | 2007-12-24 06:30:00 | PST8PDT | weekly
    6 |       1 | 2007-12-25 10:00:00 | PST8PDT  | 2007-12-25 12:00:00 | PST8PDT | monthly
    7 |       1 | 2007-12-26 14:00:00 | PST8PDT  | 2007-12-26 15:00:00 | PST8PDT | none
    8 |       1 | 2007-12-27 18:00:00 | PST8PDT  | 2007-12-27 18:45:00 | PST8PDT | daily
    9 |       1 | 2007-12-28 22:30:00 | PST8PDT  | 2007-12-29 02:00:00 | PST8PDT | weekly
   10 |       1 | 2007-12-29 06:00:00 | PST8PDT  | 2007-12-29 06:15:00 | PST8PDT | monthly
(10 rows)

Now let’s get to the recurring date function:

CREATE OR REPLACE FUNCTION  generate_recurrences(
    recurs RECURRENCE, 
    start_date DATE,
    end_date DATE
)
    RETURNS setof DATE
    LANGUAGE plpgsql IMMUTABLE
    AS $BODY$
DECLARE
    next_date DATE := start_date;
    duration  INTERVAL;
    day       INTERVAL;
    check     TEXT;
BEGIN
    IF recurs = 'none' THEN
        -- Only one date ever.
        RETURN next next_date;
    ELSIF recurs = 'weekly' THEN
        duration := '7 days'::interval;
        WHILE next_date <= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
        END LOOP;
    ELSIF recurs = 'daily' THEN
        duration := '1 day'::interval;
        WHILE next_date <= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
        END LOOP;
    ELSIF recurs = 'monthly' THEN
        duration := '27 days'::interval;
        day      := '1 day'::interval;
        check    := to_char(start_date, 'DD');
        WHILE next_date <= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
            WHILE to_char(next_date, 'DD') <> check LOOP
                next_date := next_date + day;
            END LOOP;
        END LOOP;
    ELSE
        -- Someone needs to update this function, methinks.
        RAISE EXCEPTION 'Recurrence % not supported by generate_recurrences()', recurs;
    END IF;
END;
$BODY$;

The idea here is pretty simple: pass in a recurrence (daily, weekly, or monthly), a start date, and an end date, and get back a set of all the recurrence dates between the start and end dates:

try=# \timing
Timing is on.
try=# select * from  generate_recurrences('daily', '2008-01-29', '2008-02-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-01-30
 2008-01-31
 2008-02-01
 2008-02-02
 2008-02-03
 2008-02-04
 2008-02-05
(8 rows)

Time: 0.548 ms
try=# select * from  generate_recurrences('weekly', '2008-01-29', '2008-03-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-02-05
 2008-02-12
 2008-02-19
 2008-02-26
 2008-03-04
(6 rows)

Time: 0.670 ms
try=# select * from  generate_recurrences('monthly', '2008-01-29', '2008-05-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-02-29
 2008-03-29
 2008-04-29
(4 rows)

Time: 0.644 ms

Not bad, eh? And PostgreSQL’s date and interval calculation operators are wicked fast. Check out how long it dates to generate two years worth of daily recurrence dates:

try=# select * from  generate_recurrences('daily', '2008-01-29', '2010-02-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-01-30
 2008-01-31
...
 2010-02-03
 2010-02-04
 2010-02-05
(739 rows)

Time: 4.982 ms

Awesome. And the great thing about this function is that any time I need to add new recurrences (yearly, biweekly, quarterly, weekends, weekdays, etc.), I just modify the domain and this function and we’re ready to go.

And now, part two: the recurring event function:

CREATE OR REPLACE FUNCTION recurring_events_for(
   for_user_id INTEGER,
   range_start TIMESTAMP,
   range_end   TIMESTAMP
)
   RETURNS SETOF events
   LANGUAGE plpgsql STABLE
   AS $BODY$
DECLARE
   event events;
   start_date TIMESTAMPTZ;
   start_time TEXT;
   ends_at    TIMESTAMPTZ;
   next_date  DATE;
   recurs_at  TIMESTAMPTZ;
BEGIN
   FOR event IN 
       SELECT *
         FROM events
        WHERE user_id = for_user_id
          AND (
                  recurrence <> 'none'
              OR  (
                     recurrence = 'none'
                 AND starts_at BETWEEN range_start AND range_end
              )
          )
    LOOP
       IF event.recurrence = 'none' THEN
         RETURN NEXT event;
         CONTINUE;
       END IF;

       start_date := event.starts_at::timestamptz AT TIME ZONE event.start_tz;
       start_time := start_date::time::text;
       ends_at    := event.ends_at::timestamptz AT TIME ZONE event.end_tz;

       FOR next_date IN
           SELECT *
             FROM generate_recurrences(
                      event.recurrence,
                      start_date::date,
                      (range_end AT TIME ZONE event.start_tz)::date
             )
       LOOP
           recurs_at := (next_date || ' ' || start_time)::timestamp
               AT TIME ZONE event.start_tz;
           EXIT WHEN recurs_at > range_end;
           CONTINUE WHEN recurs_at < range_start AND ends_at < range_start;
           event.starts_at := recurs_at;
           event.ends_at   := ends_at;
           RETURN NEXT event;
       END LOOP;
   END LOOP;
   RETURN;
END;
$BODY$;

The idea here is to select the appropriate events for a given user between two dates, and for each event iterate over all of the recurrences between the two dates and return a row for each one. So the lines starting with FOR event IN and ending with LOOP select the original events, looking for either recurring events or non-recurring events that are between the two dates. Note that if you needed to, you could easily refine this query for your particular application, or even use PL/pgSQL’s EXECUTE operator to dynamically generate queries to suit particular application needs.

Next, the block starting with IF event.recurrence = 'none' THEN simply returns any non-recurring events. Although the next block already handles this case, adding this optimization eliminates a fair bit of calculation for the common case of non-recurring events.

Then the lines starting with FOR next_date IN and ending with LOOP select all of the dates for the recurrence in question, using the generate_recurrences() function created earlier. From LOOP to END LOOP;, the function generates the start and end timestamps, exiting the loop when the start date falls after the range or when it falls before the range and the end date falls after the range. There are many other tweaks one could make here to modify which recurrences are included and which are excluded. For example, if you had a column in the events table such as exclude_dates TIMESTAMP[] NOT NULL DEFAULT ‘{}’ that stored an array of dates to ignore when generating recurrences, you could just add this line to go ahead and exclude them from the results returned by the function:

           CONTINUE WHEN recurs_at = ANY( exclude_dates );

But enough of the details: let’s see how it works! Here’s a query for a week’s worth of data:

try=# select * from recurring_events_for(1, '2007-12-19', '2007-12-26');
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
------+---------+---------------------+----------+---------------------+---------+------------
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    2 |       1 | 2007-12-21 14:00:00 | PST8PDT  | 2007-12-21 06:45:00 | PST8PDT | monthly
    3 |       1 | 2007-12-22 18:00:00 | PST8PDT  | 2007-12-22 21:30:00 | PST8PDT | none
    4 |       1 | 2007-12-23 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    4 |       1 | 2007-12-24 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    4 |       1 | 2007-12-25 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    5 |       1 | 2007-12-24 06:00:00 | PST8PDT  | 2007-12-23 22:30:00 | PST8PDT | weekly
    6 |       1 | 2007-12-25 10:00:00 | PST8PDT  | 2007-12-25 04:00:00 | PST8PDT | monthly
(8 rows)

Time: 51.890 ms

Note the time it took to execute this query. 52 ms is a hell of a lot faster than the several minutes it took to run a similar query using the old view. Plus, I’m not limited to just the recurrence dates I’ve pre-calculated in the old recurrence_dates table. Now we can use whatever dates are supported by PostgreSQL. It’s even fast when we look at a year’s worth of data:

try=# select * from recurring_events_for(1, '2007-12-19', '2008-12-19');
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
------+---------+---------------------+----------+---------------------+---------+------------
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2007-12-27 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-03 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-10 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-17 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-24 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-31 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
...
  364 |       1 | 2008-12-17 22:30:00 | PST8PDT  | 2008-12-17 14:45:00 | PST8PDT | daily
  364 |       1 | 2008-12-18 22:30:00 | PST8PDT  | 2008-12-17 14:45:00 | PST8PDT | daily
  365 |       1 | 2008-12-18 06:00:00 | PST8PDT  | 2008-12-17 22:30:00 | PST8PDT | weekly
(19691 rows)

Time: 837.759 ms

Not stellar, but still respectable. Given that for a typical application, a user will be looking at only a day’s or a week’s or a month’s events at a time, this seems to be an acceptable trade-off. I mean, how often will your users need to see a list of 20,000 events? And even if a user waslooking at a year’s worth of data, it’s unlikely that 75% of them would be recurring as in the example data here.

I was fucking pumped with this solution, and Sandy has hummed along nicely since we put it into production. If you’re interested in trying it for yourself, I’ve you can get all the SQL from this blog entry here.

The only thing I would like to have been able to do differently was to encapsulate the recurring_events_for() function in a view. Such would have made it much easier to actually use this solution in Rails. If you know how to do that, please do leave a comment. As for how I hacked Rails to use the function, well, that’s a blog post for another day.

Powered by KinoSearch