Some Thoughts on Database Change Management

I've had reason to think about database change management lately. While I was working on Rails apps, I of course made use of migrations. These were an improvement over what I'd been doing with Bricolage upgrade scripts, in the sense that the system kept track of what version you were on (where “version” means “numeric sequence of migration scripts”) and applied whatever scripts had a higher number than the last one applied. But Rails migrations leave a number of things to be desired, not least of which is that they support only a very small subset of SQL and require that you write the migrations in Ruby (sure, I could embed SQL, and I did, but it was still Ruby and thus not able to integrate with, say, Slony-I).

Inspired by Rails migrations, last summer I wrote some code for a customer that did essentially the same thing, except that the migration scripts were pure SQL scripts rather than Ruby (or Perl or whatever). I did this by subclassing Module::Build to add a “db” action. So for a deployment, you could do this:

perl Build.PL
./Build
./Build db
sudo ./Build install
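
For the curious, the subclass itself doesn't have to be much. Here's a minimal sketch of the idea, assuming a hypothetical My::Builder package and a migrations/ directory of numbered SQL scripts; the _last_migration() and _record_migration() helpers, which would read and update the metadata record, are likewise made up for illustration:

package My::Builder;
use strict;
use warnings;
use base 'Module::Build';

# Add a "db" action: run, in order, every numbered SQL script that
# hasn't been applied yet, recording each one as it succeeds.
sub ACTION_db {
    my $self = shift;
    my $last = $self->_last_migration;   # hypothetical: reads the metadata record
    for my $script (sort glob 'migrations/*.sql') {
        my ($num) = $script =~ m{/(\d+)[^/]*\.sql$};
        next unless defined $num && $num > $last;
        system(qw(psql -f), $script) == 0
            or die "Migration $script failed: $?";
        $self->_record_migration($num);  # hypothetical: updates the metadata record
    }
}

1;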

And it would do the right thing, tracking which migration scripts had been run by their numeric prefixes and updating a metadata record for each one. And that worked great…except when it came to maintaining database functions and procedures.

The problem in a nutshell is this: Database functions are amazingly useful; they greatly simplify client interfaces and improve performance. But managing them in migration scripts is a recipe for a ton of code duplication. Why? Because--unlike your Perl or Ruby code or whatever, which you just modify in a library file and commit to your version control system--every time you have to modify a function, you have to paste the entire fucking thing into a new migration script and make your changes there.

Try to imagine, for a moment, what that means for a function such as the recurring_events_for() function in my piece on generating recurring events in the database. Say there was a bug in the function, where it was failing to return some instances of events that fall between the recurrence dates. A simple patch might look something like this:

@@ -22,7 +22,10 @@
                   recurrence <> 'none'
               OR  (
                      recurrence = 'none'
-                 AND starts_at BETWEEN range_start AND range_end
+                 AND (
+                         starts_at BETWEEN range_start AND range_end
+                      OR ends_at   BETWEEN range_start AND range_end
+                 )
               )
           )
     LOOP

Pretty straightforward, right? But not for migration scripts! To make this three-line change with migrations, I'd actually have to paste the entire 58-line function into a new migration script, make the changes, and then commit. There are at least two problems with this approach: 1) A huge amount of code duplication for no good reason; and 2) No real change management! Someone looking at SVN::Notify notifications would not easily be able to see what changed, because the diff does not show what lines changed in the function, only that a whole new file with a whole new function was added. Yikes!

And it can get worse than this. For a client I'm currently working with, the state of the production database is tracked in Subversion by running a nightly cron job that dumps the database schema to a file and checks it into Subversion. Migration scripts are written in pure SQL and named with bug numbers, but once they are pushed to production, the bloody migration scripts are deleted, because the nightly schema dump reflects the changes! So function updates get committed to Subversion en masse twice!

So I'd like to propose a different approach to database change management. Or, rather, a more diversified approach.

First, you can have numbered migration scripts. This works very well for simple schema changes, like adding a table, column, foreign key, or index, because the SQL statements that get committed in the scripts describe exactly what they're doing. ALTER statements are uncomplicated, and show only what they're altering. You can also easily write downgrade migration scripts to use in case something goes wrong, just by using the same script number in a different directory. So you'd have something like this:

sql/
sql/migrations/
sql/migrations/up/
sql/migrations/up/001-create_user_table.sql
sql/migrations/up/002-add_fk_for_user.sql
sql/migrations/down/
sql/migrations/down/001-drop_user_table.sql
sql/migrations/down/002-drop_fk_for_user.sql
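
The paired scripts themselves are plain SQL. As a purely hypothetical illustration, 001-create_user_table.sql and its downgrade counterpart might contain nothing more than:

-- up/001-create_user_table.sql
CREATE TABLE users (
    id         SERIAL PRIMARY KEY,
    nickname   TEXT NOT NULL UNIQUE,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

-- down/001-drop_user_table.sql
DROP TABLE users;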

That's the first part, not unlike how Rails migrations work or the stuff I wrote for a client last year. But then there's a second set of migration scripts. These are managed like Perl or Ruby libraries or what-have-you, such that you can just add and/or change files as necessary. It might look like this, with one database function or procedure per file:

sql/migrations/lib/
sql/migrations/lib/generate_recurrences.sql
sql/migrations/lib/recurring_events_for.sql

Each of these files has one function defined using CREATE OR REPLACE FUNCTION. This means that they can be run over and over again without problem. If I run into a bug in my recurring_events_for() function, I just change it in recurring_events_for.sql, commit it, and I'm done.
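
A lib migration file is therefore nothing more than the canonical definition of its function. As a trimmed-down, hypothetical example (the real recurring_events_for() lives in the recurring events piece), such a file might look like this:

-- sql/migrations/lib/event_count_for.sql
-- Hypothetical example: one function per file, defined with
-- CREATE OR REPLACE FUNCTION so the script is safe to run repeatedly.
CREATE OR REPLACE FUNCTION event_count_for(
    range_start TIMESTAMP,
    range_end   TIMESTAMP
) RETURNS BIGINT AS $$
BEGIN
    RETURN (
        SELECT COUNT(*)
          FROM events
         WHERE starts_at BETWEEN range_start AND range_end
    );
END;
$$ LANGUAGE plpgsql STABLE;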

The code that handles the database build can then track both the migration number and the timestamp for the last commit migrated in the sql/migrations/lib directory. Once the numbered migrations are run, it can then decide what lib migrations to run by looking for those that have been modified since the last time migrations were run. In Perl, using Git, that'd be something like this:

# Run a shell command and return its output lines with leading and
# trailing whitespace stripped.
sub run {
    my $cmd = shift;
    map { s/^\s+//; s/\s+$//; $_ } `$cmd`;
}

my $lib  = 'lib';
my $date = '2009-04-01 00:01:00';

# The oldest commit since $date that touched $lib.
my ($rev) = run "git rev-list --reverse --since='$date' master -- '$lib'";

# Apply every script in $lib changed since just before that commit,
# ordered by the date of each file's most recent commit.
for my $script (
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_ => run "git log -1 --pretty=format:%ci '$_'" ] }
    run "git diff --name-only $rev^ -- '$lib'"
) {
    system qw(psql -f), $script;
}

First, we get the oldest revision SHA1 ID since the specified date and store it in $rev. The magic is in the for loop which, due to the antiquity of Perl list functions, you must read bottom-to-top (aside: this can be rectified by autoboxing). We use git diff --name-only to get a list of all the files changed in the directory since just before that revision. For each file, we get the date of its most recent commit, sort on the date, and then apply the migration by passing it to psql -f.

In reality, the code should also update the metadata table in the database with the date of each script as it's applied -- but only if it succeeds. If it fails, we just die. If you needed to migrate down, the code could just check out the files as they were at $rev^ and apply them. Yeah, use the change management interface to go back in time: who'da thought it?
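
The metadata itself needn't be anything fancy; a single-row table tracking the numbered-migration version and the commit date of the last lib migration applied would do. Something like this hypothetical sketch:

-- Hypothetical metadata table: one row tracking the current state.
CREATE TABLE migration_metadata (
    version      INTEGER                  NOT NULL DEFAULT 0,
    lib_migrated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT '-infinity'
);
INSERT INTO migration_metadata DEFAULT VALUES;

-- After each lib script applies successfully, record the commit date
-- of that script; the next run reads it back as $date.
UPDATE migration_metadata
   SET lib_migrated = '2009-04-01 00:01:00+00';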

Anyway, that's the idea. I think I'll implement this for an internal app I've been hacking on and off, just to see what kinds of issues might come up. Anyone else thought of something like this? Maybe there's something already out there? I'm not interested in automatic migrations like DBIx::Class Schema migrations, as I don't rely on ORMs anymore (probably should blog that one of these days, too). I'm just curious what other approaches folks have taken to database change management. Let me know.

Backtalk

Chris Winters wrote:

I like the addition of the file timestamp a lot, will have to look into integrating that into our internal tool. Also appreciate blogging about this stuff -- IME too many people think writing about maintenance is boring.

BTW, the way we organize updates to the schema or data (or configuration) is by feature. One of the things I think is missing from a lot of migration schemes is how they work in teams. Multiple people working on multiple features can develop each migration independently. This seems really useful to me; not sure if I'm missing something in existing tools that provides this.

I really need to write up this tool :-)

hdp wrote:

This doesn't directly address the function problem, but I agree about writing migrations in SQL -- I think the reason Rails doesn't is because ActiveRecord really wants to hide the database from you.

I've been looking at http://search.cpan.org/perldoc?DBIx::Migration::Directories for my own use, but haven't taken the plunge yet.

Theory wrote:

@Chris—

Thanks! Not sure what you mean by organizing stuff by feature or working in teams, though. When I wrote Rails migrations, I would put the SQL for a new feature in a single migration; someone working on another feature would put database changes into a separate migration. But I expect you're talking about something a bit different, in that your migration tool somehow supports this more directly. I look forward to hearing more about it.

@hdp—

Rails puts stuff in Ruby so that you don't have to think about the database, yes, but this is because they hate databases (and they hate them because they don't (yet) understand them). As for DBIx::Migration::Directories, maybe I'm missing something, but it seems, at its core, to still be about numbered migrations. Am I missing something?

—Theory

elein wrote:

The problem with database upgrade scripts is that they are dependent and you cannot back out arbitrary changes. You can't back out a create table xxx without removing any other later changes referencing xxx. Things get complicated with function updates depending on views that have changed. You've got a good start, but the schema upgrade solution is not complete. This is a hard problem but solvable. Fallible human eyeballing is sometimes the best solution for resolving dependencies. Grouping changes can be interesting, but there is complexity when groups depend on groups. The best solution is to never backout changes and only write new updates which address dependencies, moving always forward.

John wrote:

I always cringe when I see migration-style "up/down" database scripts because they imply a symmetry that may not actually exist. Most data modifications, for example, cannot be rolled back (UPDATE, DELETE, etc.). Now maybe you'd say those are not "schema migrations," but in my experience, data modifications are extremely common--much more common than schema changes, especially for long-lived products--and I usually see up/down-style systems used to carry them out.

Anyway, at work we have a similar hybrid system with all stored procedures in their own files in version control, providing a comprehensible change history. But we also have separate patch files that contain the superset of all changes, plus the usual "full schema" files and an automated way to verify that the application of any series of patches results in a database that is exactly equivalent to the full schema file of the same version.

The patch files are all pure SQL with a few specially-formatted comments that our patch automation system understands. We do nothing based on dates (in version control or otherwise). Even with NTP, and with servers across many time zones, it's just too much of a hassle. All patch automation uses (db-resident) version numbering, plus a few simple patch file naming conventions to know what needs to be applied to bring a database up to any arbitrary version. We also have automated db creation from scratch, which includes the full schema and SPLs plus the "boilerplate" real production data, plus some optional generated "fake" data.

In the end, the realities of dealing with "big" databases in a 24/7 environment mean that our patch automation is used only in development and QA (and also to verify the integrity of the patches sent to production). But when it comes time to apply them to production, this is all done by dedicated DBAs and release engineers, and the steps are often very different, involving manual work to shuffle dbspaces and bring services down and then back up in the proper sequence to cause minimal downtime depending on which parts of the db are being modified, etc. etc. (Not having reliable transactional schema alterations really makes this much more painful. Oh PostgreSQL, how I miss you! :)

Josh Jore wrote:

We had to solve many of these problems at work recently. No blog post about it yet. Roughly we implemented much of what you outlined. Our additional complications are many concurrent teams doing branched development on the databases and slony.

Did it all in Ruby with something roughly built on top of activerecord.

Piet Hadermann wrote:

I once wrote a migration tool like this for MS SQL Server (sorry for swearing), which was used to update the db at several hundred customers when they installed a new version of our software.

A big help was a commercial tool called SQLCompare to generate a base migration script between two schema versions. In the beginning we didn't store changes in a versioning system, and even after we started doing so, SQLCompare was still very handy for generating migration scripts at the click of a button.

These scripts were stored, together with a version number, in a table within the database itself. Why? Because besides the 'main' database there were also archive databases that could be attached dynamically from, e.g., a network share or CD-ROM (after copying to a local HD). After bringing such an archive online, it also had to be updated to the latest version by cycling through all the necessary migration scripts.

After each successful execution of a migration script, a version number in the database itself was updated to reflect this. Much like Rails migrations. (I LOVE Rails btw, and not only for the migrations.)

Reversing/undoing a migration wasn't something I ever considered. When something went wrong, it was fixed instead of reverting to a previous version (which would have also meant installing all previous binaries on both server and clients - too much hassle). By the time the majority of the customers installed an update, though (usually from CD-ROM), we had tested the update at several installations and hopefully (not always) weeded most of the bugs out.

All of this worked using mainly sqlserver stored procedures. And when it didn't, it caused me and several others major headaches.

Theory wrote:

@elein—

I have to admit that I always found the downward migrations in Rails pretty useless. As long as a migration was well tested before it was deployed, we had no problems. With over 100 migrations for iwantsandy, we never migrated down. Ever.

What we did do, though, was discover new problems (mostly related to performance, as with recurring events), at which point we spent some time figuring out the problem and then wrote a new migration.

So I agree that you can't back them out, or at least that backing them out is pretty useless. So maybe I won't bother with downward migrations the next time I write something like this. It'll save me a lot of work, frankly.

I'm not sure what you mean by “the schema upgrade solution is not complete,” though. I'm sure that I'm missing something, but adding date-based migrations (basically checking mtime on files) for functions and views and such goes a long way towards solving my biggest complaints. And as for dependencies, well, no migration should ever be deployed to production without adequate testing.

@John—

Yeah, the “down” migrations are right out. I wasn't thinking specifically of data migrations, though I have written downward data migrations in the past. More often than not, though, for data migrations, there was no downward migration, because it just wasn't possible. And yeah, data migrations are really common.

Keeping files for each procedure in the VCS is exactly the aim of my recommendations here. I'm envious that you have them already! I would use UTC to compare all times, so no comparison problems there. I agree about NTP, though one must sync one's computer clocks. But by the time I generally see a migration go to production, it has usually been quite a while since it was written (because significant testing has to verify the migration, first).

I'm curious about your patch file naming conventions, though, if they're not version numbers.

@Josh—

I look forward to hearing more about your solution. Will there be a blog post?

@Piet—

A tool like SQLCompare is nice if you can get it, and if you know that it has actually generated an appropriate migration (lots of testing and QA required as usual), but I'd rather do without it.

Thanks for all the great comments, folks!

—Theory

John wrote:

I'm curious about your patch file naming conventions, though, if they're not version numbers.

They are. That's all I meant by, "All patch automation uses (db-resident) version numbering, plus a few simple patch file naming conventions." The patch files have version numbers (plus a "part" number for multi-part patches). To upgrade from db version N to version N + M, just sort the patch files by version (and, secondarily, by part number) where version > N and version <= N + M. (The db-resident schema version is automatically updated after the successful application of each file (or files) for each new version, and patches will refuse to apply to anything except a database whose version matches the expected earlier version.)