home :: computers :: databases

How to Generate Recurring Events in the Database

This is a followup to my request for help fixing the performance of a database view that generated recurrences for events. This was an essential feature of Sandy, and thus important to get right. The idea when I started was simple:

  • Add a recurrence domain to the database that supports a number of different values, including daily, weekly, and monthly.
  • Add a recurrence column to the events table that identify how an event recurs.
  • Add a recurrence_dates table that contains a pre-generated list of recurrences for a given date. I’d populated this table with five years of dates, each one mapped to five years worth of recurrence dates (see the originalblog entry for more on the format of this table.
  • Create a view that maps each events row to its date and recurrence in the recurrence_dates table.
  • Profit.

It was this last bullet point that didn’t quite work out: although the data was perfectly accurate, queries for a lot of events in the view were very expensive. I mean, the query could run for 3-4 minutes. It was just crazy! I couldn’t figure out the problem, so I posted my request for help. It was through discussions that followed with depesz that I finally figured out what the problem was: Although I was usually selecting only a week’s or months worth of events, the view was calculating rows for all five years worth of data for all of the events for a given user. Um, not efficient.

So here I finally document how, with a lot of help and example code from depesz, I solved the problem. The trick was to use a function instead of a view to generate the recurring event rows, and to limit it only to the dates we’re interested in. For convenience sake, I broke this down into two PL/pgSQL functions: one to generate recurring dates and one to return the recurring event rows. But first, here’s the recurrence domain and the events table, both of which are unchanged from the original approach:

CREATE DOMAIN recurrence AS TEXT
CHECK ( VALUE IN ( 'none', 'daily', 'weekly', 'monthly' ) );

CREATE TABLE events (
    id         SERIAL     PRIMARY KEY,
    user_id    INTEGER    NOT NULL,
    starts_at  TIMESTAMP  NOT NULL,
    start_tz   TEXT       NOT NULL,
    ends_at    TIMESTAMP,
    end_tz     TEXT       NOT NULL,
    recurrence RECURRENCE NOT NULL DEFAULT 'none'
);

Just assume the user_id is a foreign key. Now let’s populate this table with some data. For the purposes of this demonstration, I’m going to create one event per day for 1000 days, evenly divided between daily, weekly, monthly, and no recurrences, as well as five different times of day and six different durations:

INSERT INTO events (user_id, starts_at, start_tz, ends_at, end_tz, recurrence)
SELECT 1,
       ts::timestamp,
       'PST8PDT',
       ts::timestamp + dur::interval,
       'PST8PDT',
       recur
  FROM (
    SELECT '2007-12-19'::date + i || ' ' || CASE i % 5
               WHEN 0 THEN '06:00'
               WHEN 1 THEN '10:00'
               WHEN 2 THEN '14:00'
               WHEN 3 THEN '18:00'
               ELSE        '22:30'
               END,
           CASE i % 6
               WHEN 0 THEN '2 hours'
               WHEN 1 THEN '1 hour'
               WHEN 2 THEN '45 minutes'
               WHEN 3 THEN '3.5 hours'
               WHEN 4 THEN '15 minutes'
               ELSE        '30 minutes'
               END,
           CASE i % 4
               WHEN 0 THEN 'daily'
               WHEN 1 THEN 'weekly'
               WHEN 2 THEN 'monthly'
               ELSE        'none'
               END
    FROM generate_series(1, 1000) as gen(i)
  ) AS ser( ts, dur, recur);

This gives us some nicely distributed data:

try=# select * from events limit 10;
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
------+---------+---------------------+----------+---------------------+---------+------------
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 11:00:00 | PST8PDT | weekly
    2 |       1 | 2007-12-21 14:00:00 | PST8PDT  | 2007-12-21 14:45:00 | PST8PDT | monthly
    3 |       1 | 2007-12-22 18:00:00 | PST8PDT  | 2007-12-22 21:30:00 | PST8PDT | none
    4 |       1 | 2007-12-23 22:30:00 | PST8PDT  | 2007-12-23 22:45:00 | PST8PDT | daily
    5 |       1 | 2007-12-24 06:00:00 | PST8PDT  | 2007-12-24 06:30:00 | PST8PDT | weekly
    6 |       1 | 2007-12-25 10:00:00 | PST8PDT  | 2007-12-25 12:00:00 | PST8PDT | monthly
    7 |       1 | 2007-12-26 14:00:00 | PST8PDT  | 2007-12-26 15:00:00 | PST8PDT | none
    8 |       1 | 2007-12-27 18:00:00 | PST8PDT  | 2007-12-27 18:45:00 | PST8PDT | daily
    9 |       1 | 2007-12-28 22:30:00 | PST8PDT  | 2007-12-29 02:00:00 | PST8PDT | weekly
   10 |       1 | 2007-12-29 06:00:00 | PST8PDT  | 2007-12-29 06:15:00 | PST8PDT | monthly
(10 rows)

Now let’s get to the recurring date function:

CREATE OR REPLACE FUNCTION  generate_recurrences(
    recurs RECURRENCE, 
    start_date DATE,
    end_date DATE
)
    RETURNS setof DATE
    LANGUAGE plpgsql IMMUTABLE
    AS $BODY$
DECLARE
    next_date DATE := start_date;
    duration  INTERVAL;
    day       INTERVAL;
    check     TEXT;
BEGIN
    IF recurs = 'none' THEN
        -- Only one date ever.
        RETURN next next_date;
    ELSIF recurs = 'weekly' THEN
        duration := '7 days'::interval;
        WHILE next_date <= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
        END LOOP;
    ELSIF recurs = 'daily' THEN
        duration := '1 day'::interval;
        WHILE next_date <= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
        END LOOP;
    ELSIF recurs = 'monthly' THEN
        duration := '27 days'::interval;
        day      := '1 day'::interval;
        check    := to_char(start_date, 'DD');
        WHILE next_date <= end_date LOOP
            RETURN NEXT next_date;
            next_date := next_date + duration;
            WHILE to_char(next_date, 'DD') <> check LOOP
                next_date := next_date + day;
            END LOOP;
        END LOOP;
    ELSE
        -- Someone needs to update this function, methinks.
        RAISE EXCEPTION 'Recurrence % not supported by generate_recurrences()', recurs;
    END IF;
END;
$BODY$;

The idea here is pretty simple: pass in a recurrence (daily, weekly, or monthly), a start date, and an end date, and get back a set of all the recurrence dates between the start and end dates:

try=# \timing
Timing is on.
try=# select * from  generate_recurrences('daily', '2008-01-29', '2008-02-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-01-30
 2008-01-31
 2008-02-01
 2008-02-02
 2008-02-03
 2008-02-04
 2008-02-05
(8 rows)

Time: 0.548 ms
try=# select * from  generate_recurrences('weekly', '2008-01-29', '2008-03-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-02-05
 2008-02-12
 2008-02-19
 2008-02-26
 2008-03-04
(6 rows)

Time: 0.670 ms
try=# select * from  generate_recurrences('monthly', '2008-01-29', '2008-05-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-02-29
 2008-03-29
 2008-04-29
(4 rows)

Time: 0.644 ms

Not bad, eh? And PostgreSQL’s date and interval calculation operators are wicked fast. Check out how long it dates to generate two years worth of daily recurrence dates:

try=# select * from  generate_recurrences('daily', '2008-01-29', '2010-02-05');
 generate_recurrences 
----------------------
 2008-01-29
 2008-01-30
 2008-01-31
...
 2010-02-03
 2010-02-04
 2010-02-05
(739 rows)

Time: 4.982 ms

Awesome. And the great thing about this function is that any time I need to add new recurrences (yearly, biweekly, quarterly, weekends, weekdays, etc.), I just modify the domain and this function and we’re ready to go.

And now, part two: the recurring event function:

CREATE OR REPLACE FUNCTION recurring_events_for(
   for_user_id INTEGER,
   range_start TIMESTAMP,
   range_end   TIMESTAMP
)
   RETURNS SETOF events
   LANGUAGE plpgsql STABLE
   AS $BODY$
DECLARE
   event events;
   start_date TIMESTAMPTZ;
   start_time TEXT;
   ends_at    TIMESTAMPTZ;
   next_date  DATE;
   recurs_at  TIMESTAMPTZ;
BEGIN
   FOR event IN 
       SELECT *
         FROM events
        WHERE user_id = for_user_id
          AND (
                  recurrence <> 'none'
              OR  (
                     recurrence = 'none'
                 AND starts_at BETWEEN range_start AND range_end
              )
          )
    LOOP
       IF event.recurrence = 'none' THEN
         RETURN NEXT event;
         CONTINUE;
       END IF;

       start_date := event.starts_at::timestamptz AT TIME ZONE event.start_tz;
       start_time := start_date::time::text;
       ends_at    := event.ends_at::timestamptz AT TIME ZONE event.end_tz;

       FOR next_date IN
           SELECT *
             FROM generate_recurrences(
                      event.recurrence,
                      start_date::date,
                      (range_end AT TIME ZONE event.start_tz)::date
             )
       LOOP
           recurs_at := (next_date || ' ' || start_time)::timestamp
               AT TIME ZONE event.start_tz;
           EXIT WHEN recurs_at > range_end;
           CONTINUE WHEN recurs_at < range_start AND ends_at < range_start;
           event.starts_at := recurs_at;
           event.ends_at   := ends_at;
           RETURN NEXT event;
       END LOOP;
   END LOOP;
   RETURN;
END;
$BODY$;

The idea here is to select the appropriate events for a given user between two dates, and for each event iterate over all of the recurrences between the two dates and return a row for each one. So the lines starting with FOR event IN and ending with LOOP select the original events, looking for either recurring events or non-recurring events that are between the two dates. Note that if you needed to, you could easily refine this query for your particular application, or even use PL/pgSQL’s EXECUTE operator to dynamically generate queries to suit particular application needs.

Next, the block starting with IF event.recurrence = 'none' THEN simply returns any non-recurring events. Although the next block already handles this case, adding this optimization eliminates a fair bit of calculation for the common case of non-recurring events.

Then the lines starting with FOR next_date IN and ending with LOOP select all of the dates for the recurrence in question, using the generate_recurrences() function created earlier. From LOOP to END LOOP;, the function generates the start and end timestamps, exiting the loop when the start date falls after the range or when it falls before the range and the end date falls after the range. There are many other tweaks one could make here to modify which recurrences are included and which are excluded. For example, if you had a column in the events table such as exclude_dates TIMESTAMP[] NOT NULL DEFAULT ‘{}’ that stored an array of dates to ignore when generating recurrences, you could just add this line to go ahead and exclude them from the results returned by the function:

           CONTINUE WHEN recurs_at = ANY( exclude_dates );

But enough of the details: let’s see how it works! Here’s a query for a week’s worth of data:

try=# select * from recurring_events_for(1, '2007-12-19', '2007-12-26');
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
------+---------+---------------------+----------+---------------------+---------+------------
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    2 |       1 | 2007-12-21 14:00:00 | PST8PDT  | 2007-12-21 06:45:00 | PST8PDT | monthly
    3 |       1 | 2007-12-22 18:00:00 | PST8PDT  | 2007-12-22 21:30:00 | PST8PDT | none
    4 |       1 | 2007-12-23 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    4 |       1 | 2007-12-24 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    4 |       1 | 2007-12-25 22:30:00 | PST8PDT  | 2007-12-23 14:45:00 | PST8PDT | daily
    5 |       1 | 2007-12-24 06:00:00 | PST8PDT  | 2007-12-23 22:30:00 | PST8PDT | weekly
    6 |       1 | 2007-12-25 10:00:00 | PST8PDT  | 2007-12-25 04:00:00 | PST8PDT | monthly
(8 rows)

Time: 51.890 ms

Note the time it took to execute this query. 52 ms is a hell of a lot faster than the several minutes it took to run a similar query using the old view. Plus, I’m not limited to just the recurrence dates I’ve pre-calculated in the old recurrence_dates table. Now we can use whatever dates are supported by PostgreSQL. It’s even fast when we look at a year’s worth of data:

try=# select * from recurring_events_for(1, '2007-12-19', '2008-12-19');
  id  | user_id |      starts_at      | start_tz |       ends_at       | end_tz  | recurrence 
------+---------+---------------------+----------+---------------------+---------+------------
    1 |       1 | 2007-12-20 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2007-12-27 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-03 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-10 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-17 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-24 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
    1 |       1 | 2008-01-31 10:00:00 | PST8PDT  | 2007-12-20 03:00:00 | PST8PDT | weekly
...
  364 |       1 | 2008-12-17 22:30:00 | PST8PDT  | 2008-12-17 14:45:00 | PST8PDT | daily
  364 |       1 | 2008-12-18 22:30:00 | PST8PDT  | 2008-12-17 14:45:00 | PST8PDT | daily
  365 |       1 | 2008-12-18 06:00:00 | PST8PDT  | 2008-12-17 22:30:00 | PST8PDT | weekly
(19691 rows)

Time: 837.759 ms

Not stellar, but still respectable. Given that for a typical application, a user will be looking at only a day’s or a week’s or a month’s events at a time, this seems to be an acceptable trade-off. I mean, how often will your users need to see a list of 20,000 events? And even if a user waslooking at a year’s worth of data, it’s unlikely that 75% of them would be recurring as in the example data here.

I was fucking pumped with this solution, and Sandy has hummed along nicely since we put it into production. If you’re interested in trying it for yourself, I’ve you can get all the SQL from this blog entry here.

The only thing I would like to have been able to do differently was to encapsulate the recurring_events_for() function in a view. Such would have made it much easier to actually use this solution in Rails. If you know how to do that, please do leave a comment. As for how I hacked Rails to use the function, well, that’s a blog post for another day.

Validating Time Zones in PostgreSQL

I recently needed to validate that a value stored in a TEXTcolumn was a valid time zone identifier. Why? Because I was using its value inside the database to convert timestamp columns from UTC to a valid zone. So I set about writing a function I could use in a constraint.

It turns out that PostgreSQL has a pretty nice view that lists all of the time zones that it recognizes. It’s called pg_timezone_names:

try=# select * from pg_timezone_names limit 5;
        name        | abbrev | utc_offset | is_dst 
--------------------+--------+------------+--------
 Africa/Abidjan     | GMT    | 00:00:00   | f
 Africa/Accra       | GMT    | 00:00:00   | f
 Africa/Addis_Ababa | EAT    | 03:00:00   | f
 Africa/Algiers     | CET    | 01:00:00   | f
 Africa/Asmara      | EAT    | 03:00:00   | f
(5 rows)

Cool. So all I had to do was to look up the value in this view. My first stab at creating a time zone validation function therefore looked like this:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
  bool BOOLEAN;
BEGIN
  SELECT TRUE INTO bool
    FROM pg_timezone_names
   WHERE LOWER(name) = LOWER(tz)
      OR LOWER(abbrev) = LOWER(tz);
  RETURN FOUND;
END;
$$ language plpgsql STABLE;

This should pretty well cover anything that PostgreSQL considers valid. So does it work? You bet:

sandy_development=# \timing
Timing is on.
sandy_development=# select is_timezone(’America/Los_Angeles’);
 is_timezone 
-------------
 t
(1 row)

Time: 457.096 ms
sandy_development=# select is_timezone(’Foo/Bar’);
 is_timezone 
-------------
 f
(1 row)

Time: 472.752 ms

Perfect! Well, except for just one thing: performance is abysmal. A half second per shot? Not very useful for constraints. And since pg_timezone_names is a view (and, under that, a function), I can’t create indexes.

But then I did something dangerous: I started thinking. I realized that I needed this function when our app started getting errors like this:

try=# select now() at time zone ‘Foo/Bar’;
ERROR:  time zone "Foo/Bar" not recognized

So the underlying C code throws an error when a time zone is invalid. What if I could just trap the error? Well, PL/pgSQL conveniently has exception handling, so I could do just that. But there was only one problem. PL/pgSQL’s exception handling syntax requires that you specify an error condition. Here’s what the documentation has:

EXCEPTION
    WHEN condition [ OR condition ... ] THEN
        handler_statements
    [ WHEN condition [ OR condition ... ] THEN
          handler_statements
      ... ]
END;

Conditions are error codes. But which one corresponds to the invalid time zone error? I tried a few, but couldn’t figure out which one. (Anyone know now to map errors you see in psql to the error codes listed in Appendex A? Let me know!) But really, my function just needed to do one thing. Couldn’t I just trap any old error?

A careful re-read of the PL/pgSQL documentation reveals that, yes, you can. Use the condition OTHERS, and you can catch almost anything. With this information in hand, I quickly wrote:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
DECLARE
  date TIMESTAMPTZ;
BEGIN
  date := now() AT TIME ZONE tz;
  RETURN TRUE;
EXCEPTION WHEN OTHERS THEN
  RETURN FALSE;
END;
$$ language plpgsql STABLE;

And how well does this one work?

sandy_development=# select is_timezone(’America/Los_Angeles’);
 is_timezone 
-------------
 t
(1 row)

Time: 3.009 ms
sandy_development=# select is_timezone(’Foo/Bar’);
 is_timezone 
-------------
 f
(1 row)

Time: 1.224 ms

Yes, I’ll take 1-3 ms over 400-500 ms any day! I might even create a domain for this and be done with it:

CREATE DOMAIN timezone AS TEXT
CHECK ( is_timezone( value ) );

Enjoy!

Need Help Reducing View Calculations

I could use some advice and suggestions for how to solve a performance problem due to the highly redundant calculation of values in a view. Sorry for the longish explanation. I wanted to make sure that I omitted no details in desribing the problem.

In order to support recurring events in an application I’m working on, we have a lookup table that maps dates to their daily, weekly, monthly, and yearly recurrences. It looks something like this:

try=# \d recurrence_dates
   Table "public.recurrence_dates"
   Column   |    Type    | Modifiers 
------------+------------+-----------
 date       | date       | not null
 recurrence | recurrence | not null
 next_date  | date       | not null
Indexes:
    "recurrence_dates_pkey" PRIMARY KEY, btree (date, recurrence, next_date)
    "index_recurrence_dates_on_date_and_recurrence" btree (date, recurrence)

try=# select * from recurrence_dates
try-# where date = '2007-11-04'
try-# order by recurrence, next_date;
    date    | recurrence | next_date  
------------+------------+------------
 2007-11-04 | daily      | 2007-11-04
 2007-11-04 | daily      | 2007-11-05
 2007-11-04 | daily      | 2007-11-06
 2007-11-04 | weekly     | 2007-11-04
 2007-11-04 | weekly     | 2007-11-11
 2007-11-04 | weekly     | 2007-11-18
 2007-11-04 | monthly    | 2007-11-04
 2007-11-04 | monthly    | 2007-12-04
 2007-11-04 | monthly    | 2008-01-04
 2007-11-04 | annually   | 2007-11-04
 2007-11-04 | annually   | 2008-11-04
 2007-11-04 | annually   | 2009-11-04
 2007-11-04 | none       | 2007-11-04

To get all of the permutations of recurring events, we simply select from a view rather than from the events table that contains the actual event data. The view joins events to recurrence_dates table like so:

CREATE OR REPLACE VIEW recurring_events AS
SELECT id, name, user_id, duration,
       (rd.next_date || ' ' ||
       (starts_at::timestamptz at time zone start_tz)::time)::timestamp
       at time zone start_tz AS starts_at,
       start_tz
  FROM events LEFT JOIN recurrence_dates rd
    ON (events.starts_at::timestamptz at time zone events.start_tz)::date = rd.date
   AND events.recurrence = rd.recurrence;

Then, to get all of the recurrences of events for a user within a week, we do something like this in the client code:

SELECT *
  FROM recurring_events
 WHERE user_id = 2
   AND starts_at BETWEEN '2007-11-04 07:00:00' AND '2007-11-10 07:59:59';

This works perfectly, as all of our dates and times are stored in UTC in timestamp columns. We pass UTC times for the appropriate offset to the query (Pacific Time in this example) and, because the view does the right thing in mapping the starts_at time for each event to its proper time zone, we get all of the events within the date range, even if they are recurrences of an earlier event, and with their times properly set.

The trouble we’re having, however, is all of those conversions. Until last week, the view just kept everything in UTC and left it to the client to convert to the proper zone in the start_tz column. But that didn’t work so well when an event’s starts_at was during daylight savings time and recurrences were in standard time: the standard time recurrences were all an hour off! So I added the repeated instances of events.starts_at::timestamptz at time zone events.start_tz. But now the view is really slow.

Since the only thing that has changed is the addition of the time zone conversions, I believe that the performance penalty is because of them. The calculation executes multiple times per row: once for the join and once again for the starts_at column. We can have an awful lot of events for a given user, and an awful lot of recurrences of a given event. If, for example, an event recurs daily for 2 years, there will be around 730 rows for that one event. And the calculation has to be executed for every one of them before the WHERE clause can be properly evaluated. Ouch! Worse still, we actually have three columns that do this in our application, not just one as in the example here.

So what I need is a way to execute that calculation just once for each row in the events table, rather than once for each row in the recurring_events view. I figure 1 calculation will be a heck of a lot faster than 730! So the question is, how do I do this? How do I get the view to execute the conversion of the starts_at to the start_tz time zone only once for each row in events, regardless of how many rows it ends up generating in the recurring_events view?

Suggestions warmly welcomed. This is a bit of a tickler for me, and since the query performance on these views is killing us, I need to get this adjusted post haste!

Meanwhile, tomorrow I’ll post a cool hack I came up with for validating time zones in the database. Something to look forward to as you ponder my little puzzle, eh?

Update 2008-01-30: Thanks to help from depesz, I came figured out what the underling problem was and solved it much more elegantly using PL/pgSQL. I’ve now written up the basic recipe. Enjoy!

PostgreSQL Warm Standby Using Ruby

The new PostgreSQL Warm Standby support is pretty nice. Since my app doesn’t currently require read access to a slave database, I’ve dumped Slony-I (and all of the headache that went with it), and now have a warm failover server being updated a least once per minute. W00t!

I used Charles Duffy’s example, as well as the documentation, to build my warm standby configuration, but unfortunately, our server OS does not have the usleep utility, so rather than have 1 second sleeps, I ported Charles’s shell script to Ruby. Here it is for your enjoyment:

#!/usr/bin/env ruby

DELAY         = 0.01
FAILOVER_FILE = "/path/to/failover"

@@triggered = false

require ‘ftools’

def move (from, to)
  # Do not overwrite! Throws an exception on failure, existing the script.
  File.copy( from, to ) unless @@triggered || File.exists?( to )
end

from, to = ARGV

# If PostgreSQL is asking for .history, just try to move it and exit.
if from =~ /\.history$/
  move from, to
  exit
end

# Sleep while waiting for the file.
while !File.exists?(from) && !@@triggered
  sleep DELAY
  @@triggered = true if File.exists?( FAILOVER_FILE )
end

# Move the file.
move from, to

Just change the DELAY value to the number of seconds you want to sleep, and the FAILOVER_FILE value to the location of a file that will trigger a failover.

This is all well and good, but I ultimately ended up using the pg_standby utility that’s a new contrib utility in PostgreSQL CVS (and will therefore ship with 8.3), as it has the nice feature of cleaning up old WAL log files. It also does not have subsecond precision, but hey, maybe we don’t really need it.

Rails Migrations with Slony?

The new app I’m developing is written in Ruby on Rails and runs on PostgreSQL. We’re replicating our production database using Slony-I, but we’ve run into a bit of a snag: database schema updates must be run as plain SQL through a Slony script in order to ensure proper replication of the schema changes within a transaction, but Rails migrations run as Ruby code updating the database via the Rails database adapter.

So how do others handle Rails migrations with their Slony-I replication setups? How do you update the Slony-I configuration file for the changes? How do you synchronize changes to the master schema out to the slaves? Do you shut down your apps, shut down Slony-I, make the schema changes to both the master and the slaves, and then restart Slony-I and your apps?

For that matter, people running Slony for their Bricolage databases must have the same issue, because the Bricolage upgrade scripts are just Perl using the DBI, not SQL files. Can anyone shed a little light on this for me?

Oh, and one last question: Why is this such a PITA? Can’t we have decent replication that replicates everything, including schema changes? Please?

Powered by KinoSearch