Just a Theory

Posts about SQL

The Ever Evolving Sqitch Plan

I’ve been working on the parser for the proposed new deployment plan format, and spent a day thinking it wasn’t going to work at all. But by the end of the day yesterday, I was back on board with it. I think I’ve also figured out how to eliminate the VCS dependency (and thus a whole level of complication). So first, the plan format:

  • Names of things (steps and tags) cannot start or end in punctuation characters
  • @ indicates a tag
  • + indicates a step to be deployed
  • - indicates a step to be reverted
  • # starts a comment
  • : indicates a required step
  • ! indicates a conflicting step
  • % is for Sqitch directives

So, here’s an example derived from a previous example:

%plan-syntax-v1
+roles
+users_table
+dr_evil
@alpha

# Some procedures.
+add_user :roles :users_table
+del_user :@alpha
-dr_evil
+upd_user :add_user !dr_evil
@beta     # woo!

So we start with a directive for the version of the plan file (thanks for the suggestion, Tom Davis!). Then we have deployments of the roles, users_table, and dr_evil steps. After that, it’s tagged as alpha.

Next, we have a comment, then the deployment of the add_user step. It requires that the roles and users_table steps be deployed. Then we deploy del_user. It requires all steps as of the @alpha tag. Next we revert the dr_evil step. Why? Because the next line deploys upd_user, which conflicts with dr_evil (and requires add_user). And finally, we tag it as beta.

There are a number of things I like about these changes:

  • Dependencies are spelled out in the plan file, rather than the deploy scripts. This allows the deploy scripts to have nothing special about them at all.

  • We now have a way to explicitly revert a step as part of the overall plan. This is useful for ensuring that conflicts can be dealt with.

  • We can deploy to any point in the plan by specifying a step:

    sqitch deploy add_user
    

    Or a tag:

    sqitch deploy @alpha
    

    For steps that are duplicated, we can disambiguate by specifying a tag:

    sqitch deploy dr_evil --as-of @alpha
    

    Naturally, this requires that a step not be repeated within the scope of a single tag.

Now, as for the VCS dependency, my impetus for this was to allow Sqitch to get earlier versions of a particular deploy script, so that it could be modified in place and redeployed to make changes inline, as described in an earlier post. However I’ve been troubled as to how to indicate in the plan where to look in the VCS history for a particular copy of a file. Yesterday, I had an insight: why do I need the earlier version of a particular deploy script at all? There are two situations where it would be used, assuming a plan that mentions the same step at two different points:

  1. To run it as it existed at the first point, and then to run it a second time as it exists at the second point.
  2. To run it in order to revert from the second point to the first.

As to the first, I could not think of a reason why that would be necessary. If I’m bootstrapping a new database, and the changes in that file are idempotent, is it really necessary to run the earlier version of the file at all? Maybe it is, but I could not think of a case where it would be.

The second item is the bit I wanted, and I realized (thanks in part to a prompt from Peter van Hardenberg while at PGCon) that I don’t need a VCS to get the script as it was at the time it was deployed. Instead, all I have to do is store the script in the database as it was at the time it was run. Boom, reversion time travel without a VCS.
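
To make that concrete, here’s a minimal sketch of the idea in Perl, using an in-memory SQLite database and a made-up deployed_steps table; Sqitch’s real metadata schema would surely look different:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });

# Hypothetical metadata table: one row per deployment of a step, storing the
# script text exactly as it was run.
$dbh->do(q{
    CREATE TABLE deployed_steps (
        step        TEXT      NOT NULL,
        deployed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        script      TEXT      NOT NULL
    )
});

# At deploy time, record the script along with the step name.
sub record_deploy {
    my ($step, $script_sql) = @_;
    $dbh->do(
        'INSERT INTO deployed_steps (step, script) VALUES (?, ?)',
        undef, $step, $script_sql,
    );
}

# At revert time, fetch the most recently deployed version and run it again.
sub previous_script {
    my $step = shift;
    my ($sql) = $dbh->selectrow_array(
        'SELECT script FROM deployed_steps WHERE step = ?
          ORDER BY deployed_at DESC LIMIT 1',
        undef, $step,
    );
    return $sql;
}

record_deploy(add_user => 'CREATE OR REPLACE FUNCTION add_user() ...');
print previous_script('add_user'), "\n";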

As an example, take the plan above. Say we have a database that has been deployed all the way to @beta. Let’s add the add_user step again:

%plan-syntax-v1
+roles
+users_table
+dr_evil
@alpha

# Some procedures.
+add_user :roles :users_table
+del_user :@alpha
-dr_evil
+upd_user :add_user !dr_evil
@beta     # woo!

+crypto
+add_user :roles :users_table :crypto
@gamma

The last three lines are the new ones. At this point, the sql/deploy/add_user.sql script has been modified to fix a bug that now requires the crypto step. If we deploy to a new database, Sqitch will notice that the same step is listed twice and apply it only once. This works because, even though add_user is listed before crypto, it is actually applied as described in its second declaration. So the run order would be:

  • crypto
  • add_user
  • del_user
  • upd_user

Note that this works because crypto declares no dependencies itself. If it did, it would be shuffled as appropriate. It would not work if it required, say, upd_user, as that would create a circular dependency (add_user → crypto → upd_user → add_user).

Now say we want to deploy the changes to the production database, which is currently on @beta. That simply runs:

  • crypto
  • add_user

If something went wrong, and we needed to revert, all Sqitch has to do is to read add_user from the database, as it was deployed previously, and run that. This will return the add_user function to its previous state. So, no duplication and no need for a VCS.

The one thing that scares me a bit is being able to properly detect circular dependencies in the plan parser. I think it will be pretty straight-forward for steps that require other steps. Less so for steps that require tags. Perhaps it will just have to convert a tag into an explicit dependence on all steps prior to that tag.
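
For what it’s worth, a plain depth-first search over the requirements ought to do it. Here’s a sketch, assuming a tag requirement has already been expanded into the steps that precede the tag; the data structure and names are mine, not Sqitch’s:

#!/usr/bin/perl
use strict;
use warnings;

# Each step maps to the steps it requires, taken from the plan above.
my %requires = (
    roles       => [],
    users_table => [],
    crypto      => [],                        # add 'upd_user' here to see the error
    add_user    => [qw(roles users_table crypto)],
    del_user    => [qw(roles users_table)],   # ":@alpha", already expanded
    upd_user    => [qw(add_user)],
);

# Depth-first search: any step found twice on the current path is a cycle.
sub find_cycle {
    my ($step, $on_path, $path) = @_;
    return [ @$path, $step ] if $on_path->{$step};
    local $on_path->{$step} = 1;
    for my $req (@{ $requires{$step} || [] }) {
        my $cycle = find_cycle($req, $on_path, [ @$path, $step ]);
        return $cycle if $cycle;
    }
    return;
}

for my $step (sort keys %requires) {
    if (my $cycle = find_cycle($step, {}, [])) {
        die 'Circular dependency: ' . join(' -> ', @$cycle) . "\n";
    }
}
print "No circular dependencies found\n";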

So, I think this will work. But I’m sure I must have missed something. If you notice it please enlighten me in the comments. And thanks for reading this far!

Sqitch: Where to Define Dependencies?

I’ve been hard at work the last couple of days working on the new plan file format. It looks like this:

roles
users_table
@alpha

# Some procedures.
add_user
del_user
upd_user
@beta     # woo!

The new code parses this, finding all steps and tags, and can rewrite the file exactly how it read it, including blank lines and comments. All of this is toward requiring a plan file and depending less on the VCS. I’ve also just added methods for adding new steps and tags to the plan. In doing so, I made sure that all dependencies properly resolve, throwing an error if they don’t. Dependencies will then be written at the top of the deployment file like so:

-- :requires: roles
-- :requires: users_table

The plan parser is smart enough to parse these out of the files when parsing the plan, so it’s easy for the user to add dependencies just by editing the deploy file.
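
For example, the extraction could be as simple as this sketch, which reads the leading comments of a deploy script and collects the :requires: declarations; the regular expression and the sample script are illustrative, not Sqitch’s actual code:

#!/usr/bin/perl
use strict;
use warnings;

# Collect ":requires:" declarations from the comment header of a deploy script.
sub requires_for {
    my $fh = shift;
    my @requires;
    while (my $line = <$fh>) {
        last if $line =~ /\S/ && $line !~ /^--/;   # stop at the first SQL statement
        push @requires, $1 if $line =~ /^--\s*:requires:\s*(\S+)/;
    }
    return @requires;
}

# A stand-in for sql/deploy/add_user.sql.
open my $fh, '<', \<<'SQL' or die $!;
-- :requires: roles
-- :requires: users_table

CREATE OR REPLACE FUNCTION add_user() RETURNS VOID LANGUAGE SQL AS '';
SQL

my @deps = requires_for($fh);
print "add_user requires: @deps\n";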

As I was working on this, I realized that that may not be necessary. Since the plan file will now be required, we could instead specify dependencies in the plan file. Maybe something like this:

roles
users_table
@alpha

# Some procedures.
add_user +roles +users_table
del_user +@alpha
upd_user +add_user -dr_evil
@beta     # woo!

The idea is that required steps and tags could be specified on the same line as the named step with preceding plus signs. Conflicting steps and tags could be specified with a preceding minus sign. Here, the add_user step requires the roles and users_table steps. The del_user step requires the @alpha tag. And the upd_user step requires the add_user step but conflicts with the dr_evil step.
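
Parsing such a line is trivial; here’s a quick sketch (the function name and return structure are mine):

#!/usr/bin/perl
use strict;
use warnings;

# Split a plan line into the step name plus its requires and conflicts lists.
sub parse_step_line {
    my ($step, @rest) = split ' ', shift;
    my (@requires, @conflicts);
    for my $token (@rest) {
        if    ($token =~ /^\+(.+)/) { push @requires,  $1 }
        elsif ($token =~ /^-(.+)/)  { push @conflicts, $1 }
    }
    return { step => $step, requires => \@requires, conflicts => \@conflicts };
}

my $parsed = parse_step_line('upd_user +add_user -dr_evil');
printf "%s requires [%s] conflicts [%s]\n", $parsed->{step},
    join(', ', @{ $parsed->{requires} }),
    join(', ', @{ $parsed->{conflicts} });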

There are a couple of upsides to this:

  • Dependencies are specified all in one place.
  • Plan parsing is much faster, because it no longer has to also parse every deploy script.
  • There is no need for any special syntax in the deploy scripts, which could theoretically conflict with some database-specific script formatting (a stretch, I realize).

But there are also downsides:

  • Changing dependencies would require editing the plan file directly.
  • The appearance of the plan file is somewhat more obscure.
  • It’s more of a PITA to edit the plan file.
  • Adding commands to change dependencies in the plan file might be tricky.

But I am thinking that the advantages might outweigh the disadvantages. Thoughts?

Sqitch Update: The Plan

I gave my first presentation on Sqitch at PGCon last week. The slides are on Slideshare and the PGCon site. It came together at the last minute, naturally. I was not able to pay as close attention to PGCon sessions as I would have liked, as I was doing last minute hacking to get the deploy command working on PostgreSQL, and then writing the slides (which are based on the tutorial). I was pleased with the response, given that this is very much a project that is still under heavy development and available only as a very very early alpha. There was great discussion and feedback afterward, which I appreciate.

A number of folks offered to help, too. For that I am grateful. I’ve started a list of to-dos to give folks a starting point. Please fork and hack! Find me on #sqitch on Freenode for questions/comments/discussion.

But back to the guts. As a result of the work on the deploy command, as well as thinking about how I and my co-workers do database development with Git, I am starting to revise how I think about the deployment plan. You see, I personally often make a lot of changes to a deployment script as I develop a database, generally over many commits and even many days or weeks. If I were to then rely on the Git history to do deployments, it would probably work, but there might be ten times as many deployments as I actually need, just to get it from zero to release state. I had originally thought that using sqitch bundle --tags-only to create a bundle with a written plan would get around this, as it would write a plan file with only VCS tags for Sqitch tags, rather than every commit. That might be okay for releases, but still not great for the developers, such as myself, who will be using Sqitch as part of the development process all day long.

So now I’m thinking more that Sqitch should rely on an explicit plan file (which was to be the preferred method, if it existed, all along) rather than VCS history. That is, the plan file would be required, and a new command, sqitch plan, will allow one to interactively add steps and tags to it. It would also make it easier for the developer to hand-edit, as appropriate, so as not to rely on a funky Git history.

So I’m toying with changing the plan format, which up until now looked like this:

[alpha]
foo
bar
init

[beta]
users
insert_user
delete_user
update_user

[gamma]
widgets
insert_widget

Each item in brackets is a tag, and each item below is a deployment step (which corresponds to a script) that is part of that tag. So if you deployed to the beta tag, it would deploy all the way up to the update_user step. You could only specify tags for deployment, and either all the steps for a given tag succeeded or they failed. When you added a step, it was added to the most recent tag.

I came up with this approach by playing with git log. But now I’m starting to think that it should feel a bit more gradual, where steps are added and a tag is applied to a certain step. Perhaps a format like this:

foo
bar
init
@alpha

users
insert_user
delete_user
update_user
@beta

widgets
insert_widget

With this approach, one could deploy or revert to any step or tag. And a tag is just added to a particular step. So if you deployed to @beta, it would run all the steps through update_user, as before. But you could also deploy them all, through insert_widget, and then the currently deployed point in the database would not have a tag (perhaps it could use a symbolic tag, HEAD?).

I like this because it feels a bit more VCS-y. It also makes it easier to add steps to the plan without worrying about tagging before one was ready. And adding steps and tags can be automated by a sqitch plan command pretty easily.

So the plan file becomes the canonical source for deployment planning, and is required. What we’ve lost, however, is the ability to use the same step name at different points in the plan, and to get the proper revision of the step by traveling back in VCS history for it. (Examples of what I mean are covered in a previous post, as well as the aforementioned presentation.) However, I think that we can still do that by complementing the plan with VCS history.

For example, take this plan:

foo
bar
init
@alpha

users
insert_user
delete_user
update_user
@beta

insert_user
update_user
@gamma

Note how insert_user and update_user repeat. Normally, this would not be allowed. But if the plan is in a VCS, and if that VCS has tags corresponding to the tags in the plan, then we might allow it: when deploying, each step would be deployed at the point in time of the tag that follows it. In other words:

  • foo, bar, and init would be deployed as of the alpha tag.
  • users, insert_user, delete_user, and update_user would be deployed as they were as of the beta tag.
  • insert_user and update_user would again be deployed, this time as of the gamma tag.

This is similar to what I’ve described before, in terms of where in VCS history steps are read from. But whereas before I was using the VCS history to derive the plan, I am here reversing things, requiring an explicit plan and using its hints (tags) to pull stuff from the VCS history as necessary.
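
Here’s a little sketch of that reversal: each step is paired with the first tag that follows it in the plan, and that tag is what would be handed to git show. The plan parsing is deliberately naive, and the file layout assumes the default sql/deploy directory:

#!/usr/bin/perl
use strict;
use warnings;

# Naive plan parsing: step names and @tags only, no comments or dependencies.
my @plan = grep { /\S/ } map { s/^\s+|\s+$//gr } <DATA>;

# Walk the plan backward so each step picks up the nearest tag after it.
my @deploy;
my $tag = 'HEAD';
for my $line (reverse @plan) {
    if ($line =~ /^\@(\S+)$/) { $tag = $1; next }
    unshift @deploy, [ $line, $tag ];
}

for my $pair (@deploy) {
    my ($step, $at) = @$pair;
    # A real deployment would pipe this to the database client.
    print "git show $at:sql/deploy/$step.sql\n";
}

__DATA__
users
insert_user
update_user
@beta

insert_user
update_user
@gamma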

I think this could work. I am not sure if I would require that all tags be present, or only those necessary to resolve duplications (both approaches feel a bit magical to me, though I haven’t tried it yet, either). The latter would probably be more forgiving for users. And overall, I think the whole approach is less rigid, and more likely to allow developers to work the way they are used to working.

But I could be off my rocker entirely. What do you think? I want to get this right, so please, if you have an opinion here, let me have it!

Sqitch Status: A Step at a Time

I’ve just released Sqitch v0.20-TRIAL, the third testing release of Sqitch. Since last week, I’ve implemented add-step. So let’s have a look-see at what all it can do. First, let’s initialize a Sqitch project.

> mkdir myproj 
> cd myproj 
myproj> git init
Initialized empty Git repository in myproj/.git/
myproj> sqitch --engine pg init
Created sql/deploy
Created sql/revert
Created sql/test
Created ./sqitch.conf

Doesn’t look like much, does it? Let’s set the database name and look at the configuration:

myproj> sqitch config core.pg.db_name flipr_test
myproj> less sqitch.conf
[core]
    engine = pg
    # plan_file = sqitch.plan
    # sql_dir = sql
    # deploy_dir = sql/deploy
    # revert_dir = sql/revert
    # test_dir = sql/test
    # extension = sql
# [core "pg"]
    # db_name = 
    # client = psql
    # sqitch_schema = sqitch
    # password = 
    # port = 
    # host = 
    # username = 
[core "pg"]
    db_name = flipr_test

I’ve made an effort to make the default configuration file as useful as possible by including all the core and engine settings. Defaults are present, too, but commented-out. Some you’d probably never want to change in the local file, but might in your user file or in the system configuration file. Peruse the sqitch-config man page for all the Git-like awesomeness.

So now we can add a step:

myproj> sqitch add-step user_roles
Created sql/deploy/user_roles.sql
Created sql/revert/user_roles.sql
Created sql/test/user_roles.sql

Wee! Again, doesn’t look like much, I know. But in fact the generated scripts are created from Template::Tiny templates, and again, they can be overridden on a user or system basis. Have a look at the add-step man page for the details. Or just start with what’s there: edit the generated scripts to deploy and revert your changes. Go crazy. The deploy script looks like this:

myproj> less sql/deploy/user_roles.sql 
-- Deploy user_roles

BEGIN;

-- XXX Add DDLs here.

COMMIT;

Next up, deployment. I think that will require that the plan interface be written, first. I’ll be getting on that tomorrow.

Sqitch Update

A quick update on Sqitch. I started implementation about a couple of weeks ago. It’s coming along a bit more slowly than I’d like, given that I need to give a presentation on it soon. But I did things a little differently than I usually do with projects like this: I wrote documentation first. In addition to the basic docs I posted a couple weeks back, I’ve written a tutorial. I put quite a lot of time into it, studying the Git interface as I did so, to try to develop useful workflows. The nice thing about this is that it will not only serve as the foundation for my presentation (PHEW! Half the work done already!), but it also serves as a design specification.

So I’ve been diligently plugging away on it, and have uploaded a couple of trial releases to CPAN. So far, we have decent support for:

  • sqitch help and sqitch help command. The latter only works for the implemented commands, of course.
  • sqitch config, which is a near perfect duplication of git-config, thanks to the very useful Config::GitLike. It supports a local, project-specific config file, a user config file, and a system config file.
  • sqitch init, which creates a new project by creating directories for the deploy, revert, and test scripts, and writes a project-specific config file. This file has options you specify in the call to sqitch (such as the database engine you plan to use), and all unmodified settings or settings set in user or system configuration are written out as comments.

So yeah, not a ton so far, but the foundations for how it all goes together are there, so it should take less time to develop other commands, all things being equal.

Next up:

  • sqitch add-step, which will create deploy and revert scripts for a new step, based on simple templates.
  • sqitch deploy, which is the big one. Initial support will be there for PostgreSQL and SQLite (and perhaps MySQL).

Interested in helping out?

  • I’m going to need a parser for the plan file pretty soon. The interface will need an iterator to move back and forth in the file, as well as a way to write to the file, add steps to it, etc. The grammar is pretty simple, so anyone familiar with parsers and iterators could probably knock something out pretty quickly.

  • The interface for testing needs some thinking through. I had been thinking that it could be something as simple as just diffing the output of a script file against an expected output file, at least to start. One could even use pgTAP or MyTAP in such scripts, although it might be a pain to get the output exactly right for varying environments. But maybe that doesn’t matter so much for deployment, since it tends to target a more controlled environment than your typical open-source library test suite.

Got something to add? Fork it!

Sqitch — VCS-powered SQL Change Management

Back in January, I wrote three posts outlining some ideas I had about a straight-forward, sane way of managing SQL change management. The idea revolved around specifying scripts to deploy and revert in a plan file, and generating that plan file from VCS history. I still feel pretty good about the ideas there, and work has agreed to let me write it and open-source it. Here is the first step toward making it happen. I call it “Sqitch.”

Why “Sqitch”? Think of it as SQL changes with Git stuck in the middle. Of course I expect to support VCSs other than Git (probably Subversion and Mercurial, though I am not sure yet), but since Git is what I now have the most familiarity with, I thought it kind of fun to reference a VCS in the name, if only obliquely.

This week, I started work on it. My first task is to outline a draft for the interface. Sqitch will be a command-line tool, primarily. The remainder of this post contains the documentation for the draft interface. Thoughts and feedback would be greatly appreciated, especially if you think I’ve overlooked anything! I do want to keep features pretty minimal for now, though, to build up a solid core to be built on later. But other than that, your criticism is greatly desired.

Next up, I will probably write a tutorial, just so I can make my way through some real-life(ish) examples and notice if I missed anything else. Besides, I’m going to need the tutorial myself! Watch for that next week.

Thanks!


Name

Sqitch - VCS-powered SQL change management

Synopsis

sqitch [<options>] <command> [<command-options>] [<args>]

Description

Sqitch is a VCS-aware SQL change management application. What makes it different from your typical migration-style approaches? A few things:

No opinions

Sqitch is not integrated with any framework, ORM, or platform. Rather, it is a standalone change management system with no opinions on your database or development choices.

Native scripting

Changes are implemented as scripts native to your selected database engine. Writing a PostgreSQL application? Write SQL scripts for psql. Writing a MySQL-backed app? Write SQL scripts for mysql.

VCS integration

Sqitch likes to use your VCS history to determine in what order to execute changes. No need to keep track of execution order; your VCS already tracks information sufficient for Sqitch to figure it out for you.

Dependency resolution

Deployment steps can declare dependencies on other deployment steps. This ensures proper order of execution, even when you’ve committed changes to your VCS out-of-order.

No numbering

Change deployment is managed either by maintaining a plan file or, more usefully, your VCS history. As such, there is no need to number your changes, although you can if you want. Sqitch does not care what you name your changes.

Packaging

Using your VCS history for deployment but need to ship a tarball or RPM? Easy, just have Sqitch read your VCS history and write out a plan file with your change scripts. Once deployed, Sqitch can use the plan file to deploy the changes in the proper order.

Reduced Duplication

If you’re using a VCS to track your changes, you don’t have to duplicate entire change scripts for simple changes. As long as the changes are idempotent, you can change your code directly, and Sqitch will know it needs to be updated.

Terminology

step

A named unit of change. A step name must be used in the file names of its corresponding deployment and reversion scripts. It may also be used in a test script file name.

tag

A known deployment state with a list of one or more steps that define the tag. A tag also implies that steps from previous tags in the plan have been applied. Think of it as a version number or VCS revision. A given point in the plan may have one or more tags.

state

The current state of the database. This is represented by the most recent tag or tags deployed. If the state of the database is the same as the most recent tag, then it is considered “up-to-date”.

plan

A list of one or more tags and associated steps that define the order of deployment execution. Sqitch reads the plan to determine what steps to execute to change the database from one state to another. The plan may be represented by a “Plan File” or by VCS history.

deploy

The act of deploying database changes to reach a tagged deployment point. Sqitch reads the plan, checks the current state of the database, and applies all the steps necessary to change the state to the specified tag.

revert

The act of reverting database changes to reach an earlier tagged deployment point. Sqitch checks the current state of the database, reads the plan, and applies reversion scripts for all steps to return the state to an earlier tag.

Options

-p --plan-file  FILE    Path to a deployment plan file.
-e --engine     ENGINE  Database engine.
-c --client     PATH    Path to the engine command-line client.
-d --db-name    NAME    Database name.
-u --username   USER    Database user name.
-h --host       HOST    Database server host name.
-n --port       PORT    Database server port number.
   --sql-dir    DIR     Path to directory with deploy and revert scripts.
   --deploy-dir DIR     Path to directory with SQL deployment scripts.
   --revert-dir DIR     Path to directory with SQL reversion scripts.
   --test-dir   DIR     Path to directory with SQL test scripts.
   --extension  EXT     SQL script file name extension.
   --dry-run            Execute command without making any changes.
-v --verbose            Increment verbosity.
-V --version            Print the version number and exit.
-H --help               Print a usage statement and exit.
-M --man                Print the complete documentation and exit.

Options Details

-p
--plan-file

sqitch --plan-file plan.conf
sqitch -p sql/deploy.conf

Path to the deployment plan file. Defaults to ./sqitch.plan. If this file is not present, Sqitch will attempt to read from VCS files. If no supported VCS system is in place, an exception will be thrown. See “Plan File” for a description of its structure.

-e
--engine

sqitch --engine pg
sqitch -e sqlite

The database engine to use. Supported engines include pg (PostgreSQL), sqlite (SQLite), and mysql (MySQL).

-c
--client

sqitch --client /usr/local/pgsql/bin/psql
sqitch -c /usr/bin/sqlite3

Path to the command-line client for the database engine. Defaults to a client in the current path named appropriately for the specified engine.

-d
--db-name

Name of the database. For some engines, such as PostgreSQL and MySQL, the database must already exist. For others, such as SQLite, the database will be automatically created on first connect.

-u
--user
--username

User name to use when connecting to the database. Does not apply to all engines.

-h
--host

Host name to use when connecting to the database. Does not apply to all engines.

-n
--port

Port number to connect to. Does not apply to all engines.

--sql-dir

sqitch --sql-dir migrations/

Path to directory containing deployment, reversion, and test SQL scripts. It should contain subdirectories named deploy, revert, and (optionally) test. These may be overridden by --deploy-dir, --revert-dir, and --test-dir. Defaults to ./sql.

--deploy-dir

sqitch --deploy-dir db/up

Path to a directory containing SQL deployment scripts. Overrides the value implied by --sql-dir.

--revert-dir

sqitch --revert-dir db/down

Path to a directory containing SQL reversion scripts. Overrides the value implied by --sql-dir.

--test-dir

sqitch --test-dir db/t

Path to a directory containing SQL test scripts. Overrides the value implied by --sql-dir.

--extension

sqitch --extension ddl

The file name extension on deployment, reversion, and test SQL scripts. Defaults to sql.

--dry-run

sqitch --dry-run

Execute the Sqitch command without making any actual changes. This allows you to see what Sqitch would actually do, without doing it. Implies a verbosity level of 1; add extra --verboses for greater verbosity.

-v
--verbose

sqitch --verbose
sqitch -v

A value between 0 and 3 specifying how verbose Sqitch should be. The default is 0, meaning that Sqitch will be silent. A value of 1 causes Sqitch to output some information about what it’s doing, while 2 and 3 each cause greater verbosity.

-H
--help

sqitch --help
sqitch -H

Outputs a brief description of the options supported by sqitch and exits.

-M
--man

sqitch --man
sqitch -M

Outputs this documentation and exits.

-V
--version

sqitch --version
sqitch -V

Outputs the program name and version and exits.

Sqitch Commands

init

Initialize the database and create deployment script directories if they do not already exist.

status

Output information about the current status of the deployment, including a list of tags, deployments, and dates in chronological order. If any deploy scripts are not currently deployed, they will be listed separately.

check

Sanity check the deployment scripts. Checks include:

  • Make sure all deployment scripts have complementary reversion scripts.

  • Make sure no deployment script appears more than once in the plan file.
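
The first of those checks could be as simple as this sketch, assuming the default sql/deploy and sql/revert directories:

#!/usr/bin/perl
use strict;
use warnings;
use File::Basename 'basename';

# Every deploy script should have a revert script of the same name.
my @missing;
for my $deploy (glob 'sql/deploy/*.sql') {
    my $name = basename $deploy;
    push @missing, $name unless -e "sql/revert/$name";
}
die "Missing revert scripts:\n  " . join("\n  ", @missing) . "\n" if @missing;
print "All deploy scripts have revert scripts.\n";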

deploy

Deploy changes. Configuration properties may be specified under the [deploy] section of the configuration file, or via sqitch config:

sqitch config deploy.$property $value

Options and configuration properties:

--to

Tag to deploy up to. Defaults to the latest tag or to the VCS HEAD commit. Property name: deploy.to.

revert

Revert changes. Configuration properties may be specified under the [revert] section of the configuration file, or via sqitch config:

sqitch config revert.$property $value

Options and configuration properties:

--to

Tag to revert to. Defaults to reverting all changes. Property name: revert.to.

test

Test changes. All SQL scripts in --test-dir will be run. [XXX Not sure whether to have subdirectories for tests and expected output and to diff them, or to use some other approach.]

config

Set configuration options. By default, the options will be written to the local configuration file, sqitch.ini. Options:

--get

Get the value for a given key. Returns error code 1 if the key was not found.

--unset

Remove the line matching the key from the config file.

--list

List all variables set in the config file.

--global

For writing options: write to global ~/.sqitch/config.ini file rather than the local sqitch.ini.

For reading options: read only from global ~/.sqitch/config.ini rather than from all available files.

--system

For writing options: write to system-wide $prefix/etc/sqitch.ini file rather than the local sqitch.ini.

For reading options: read only from system-wide $prefix/etc/sqitch.ini rather than from all available files.

--config-file

Use the given config file.

package

Package up all deployment and reversion scripts and write out a plan file. Configuration properties may be specified under the [package] section of the configuration file, or via the sqitch config package.$property $value command. Options and configuration properties:

--from

Tag to start the plan from. All tags and steps prior to that tag will not be included in the plan, and their change scripts will be omitted from the package directory. Useful if you’ve rejiggered your deployment steps to start from a point later in your VCS history than the beginning of time. Property name: package.from.

--to

Tag with which to end the plan. No steps or tags after that tag will be included in the plan, and their change scripts will be omitted from the package directory. Property name: package.to.

--tags-only

Write the plan file with deployment targets listed under VCS tags, rather than individual commits. Property name: package.tags_only.

--destdir

Specify a destination directory. The plan file and deploy, revert, and test directories will be written to it. Defaults to “package”. Property name: package.destdir.

Configuration

Sqitch configuration information is stored in standard INI files. The # and ; characters begin comments that run to the end of the line; blank lines are ignored.

The file consists of sections and properties. A section begins with the name of the section in square brackets and continues until the next section begins. Section names are not case sensitive. Only alphanumeric characters, - and . are allowed in section names. Each property must belong to some section, which means that there must be a section header before the first setting of a property.

All the other lines (and the remainder of the line after the section header) are recognized as setting properties, in the form name = value. Leading and trailing whitespace in a property value is discarded. Internal whitespace within a property value is retained verbatim.

All sections are named for commands except for one, named “core”, which contains core configuration properties.

Here’s an example of a configuration file that might usefully be checked into a VCS for a project that deploys to PostgreSQL and stores its deployment scripts with the extension ddl under the migrations directory. It also wants packages to be created in the directory _build/sql, and to deploy starting with the “gamma” tag:

[core]
    engine    = pg
    db        = widgetopolis
    sql_dir   = migrations
    extension = ddl

[revert]
    to        = gamma

[package]
    from      = gamma
    tags_only = yes
    destdir   = _build/sql

Core Properties

This is the list of core variables, which must appear under the [core] section. See the documentation for individual commands for their configuration options.

plan_file

The plan file to use. Defaults to ./sqitch.plan or, if that does not exist, uses the VCS history, if available.

engine

The database engine to use. Supported engines include pg (PostgreSQL), sqlite (SQLite), and mysql (MySQL).

client

Path to the command-line client for the database engine. Defaults to a client in the current path named appropriately for the specified engine.

db_name

Name of the database.

username

User name to use when connecting to the database. Does not apply to all engines.

password

Password to use when connecting to the database. Does not apply to all engines.

host

Host name to use when connecting to the database. Does not apply to all engines.

port

Port number to connect to. Does not apply to all engines.

sql_dir

Path to directory containing deployment, reversion, and test SQL scripts. It should contain subdirectories named deploy, revert, and (optionally) test. These may be overridden by deploy_dir, revert_dir, and test_dir. Defaults to ./sql.

deploy_dir

Path to a directory containing SQL deployment scripts. Overrides the value implied by sql_dir.

revert_dir

Path to a directory containing SQL reversion scripts. Overrides the value implied by sql_dir.

test_dir

Path to a directory containing SQL test scripts. Overrides the value implied by sql_dir.

extension

The file name extension on deployment, reversion, and test SQL scripts. Defaults to sql.

Plan File

A plan file describes the deployment tags and scripts to be run against a database. In general, if you use a VCS, you probably won’t need a plan file, since your VCS history should be able to provide all the information necessary to derive a deployment plan. However, if you really do need to maintain a plan file by hand, or just want to better understand the file as output by the package command, read on.

Format

The contents of the plan file are plain text encoded as UTF-8. It is divided up into sections that denote deployment states. Each state has a bracketed, space-delimited list of one or more tags to identify it, followed by any number of deployment steps. Here’s an example of a plan file with a single state and a single step:

[alpha]
users_table

The state has one tag, named “alpha”, and one step, named “users_table”. A state may of course have many steps. Here’s an expansion:

[root alpha]
users_table
insert_user
update_user
delete_user

This state has two tags, “root” and “alpha”, and four steps, “users_table”, “insert_user”, “update_user”, and “delete_user”.

Most plans will have multiple states. Here’s a longer example with three states:

[root alpha]
users_table
insert_user
update_user
delete_user

[beta]
widgets_table
list_widgets

[gamma]
ftw

Using this plan, to deploy to the “beta” tag, the “root”/“alpha” state steps must be deployed, as must the “beta” steps. To then deploy to the “gamma” tag, the “ftw” step must be deployed. If you then choose to revert to the “alpha” tag, then the “gamma” step (“ftw”) and all of the “beta” steps will be reverted in reverse order.

Using this model, steps cannot be repeated between states. One can repeat them, however, if the contents of a file as of a given tag can be retrieved from a VCS. An example:

[alpha]
users_table

[beta]
add_widget
widgets_table

[gamma]
add_user

[44ba615b7813531f0acb6810cbf679791fe57bf2]
widgets_created_at

[HEAD epsilon master]
add_widget

This example is derived from a Git log history. Note that the “add_widget” step is repeated under the state tagged “beta” and under the last state. Sqitch will notice the repetition when it parses this file, and then, if it is applying all changes, will fetch the version of the file as of the “beta” tag and apply it at that step, and then, when it gets to the last tag, retrieve the deployment file as of that tag and apply it. This works in reverse, as well, as long as the changes in this file are always idempotent.

Grammar

Here is the EBNF Grammar for the plan file:

plan-file   = { <state> | <empty-line> | <comment> }* ;

state       = <tags> <steps> ;

tags        = "[" <taglist> "]" <line-ending> ;
taglist     = <name> | <name> <white-space> <taglist> ;

steps       = { <step> | <empty-line> | <line-ending> }* ;
step        = <name> <line-ending> ;

empty-line  = [ <white-space> ] <line-ending> ;
line-ending = [ <comment> ] <EOL> ;
comment     = [ <white-space> ] "#" [ <string> ] ;

name        = ? non-white space characters ? ;
white-space = ? white space characters ? ;
string      = ? non-EOL characters ? ;
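
A rough sketch of a parser for this grammar follows; it is just a line-by-line loop, leaving out the iterator and rewriting interfaces a real implementation would need:

#!/usr/bin/perl
use strict;
use warnings;

# Parse a plan file handle into a list of states, each with tags and steps.
sub parse_plan {
    my $fh = shift;
    my (@states, $current);
    while (my $line = <$fh>) {
        $line =~ s/#.*//;                    # strip comments
        next unless $line =~ /\S/;           # skip empty lines
        if ($line =~ /^\s*\[\s*([^\]]+?)\s*\]/) {
            $current = { tags => [ split /\s+/, $1 ], steps => [] };
            push @states, $current;
        } elsif ($current) {
            my ($step) = $line =~ /(\S+)/;
            push @{ $current->{steps} }, $step;
        } else {
            die "Step found before any tag section\n";
        }
    }
    return \@states;
}

open my $fh, '<', \<<'PLAN' or die $!;
[root alpha]
users_table
insert_user

[beta]  # a second state
widgets_table
PLAN

for my $state (@{ parse_plan($fh) }) {
    printf "[%s] %s\n", join(' ', @{ $state->{tags} }),
        join(', ', @{ $state->{steps} });
}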

See Also

The original design for Sqitch was sketched out in a number of blog posts:

Other tools that do database change management include:

Rails migrations

Numbered migrations for Ruby on Rails.

Module::Build::DB

Numbered changes in pure SQL, integrated with Perl’s Module::Build build system. Does not support reversion.

DBIx::Migration

Numbered migrations in pure SQL.

Versioning

PostgreSQL-specific dependency-tracking solution by depesz.

Author

David E. Wheeler <david@justatheory.com>

License

Copyright (c) 2012 iovation Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

SQL Change Management Sans Duplication

In the previous episode in this series, I had one more issue with regard to SQL change management that I wanted to resolve:

  1. There is still more duplication of code than I would like, in that a procedure defined in one change script would have to be copied whole to a new script for any changes, even simple single-line changes.

So let’s see what we can do about that. Loading it into Git, our first example looks like this:

> alias sqlhist="git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print \"\"} /./'"
> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (HEAD, gamma, master)]
sql/deploy/add_user.sql

(Aside: I’ve created an alias, sqlhist, on the first line, so that all the Git and Awk magic doesn’t clutter the remaining examples.)

So, we’ve got the creation of the users table under the alpha tag, the addition of the widgets table and an accompanying add_widget() function under the beta tag, and the creation of the add_user() function under the gamma tag. So far so good. Now, let’s say that gamma has been deployed to production, and now we’re ready to add a feature for the next release.

Modify This

It turns out that our users really want a timestamp for the time a widget was created. So let’s add a new change script that adds a created_at column to the widgets table. First we add sql/deploy/widgets_created_at.sql with:

-- requires: widgets_table
ALTER TABLE widgets ADD created_at TIMESTAMPTZ;

And then the accompanying revert script, sql/revert/widgets_created_at.sql:

ALTER TABLE widgets DROP COLUMN IF EXISTS created_at;

Commit them and now our deployment configuration looks like this:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2 (HEAD, master)]
sql/deploy/widgets_created_at.sql

So far so good. We have a simple delta script that modifies the existing table, and there is no code duplication. Time to modify the add_widget() function to insert the timestamp. Recall that, in the first article in this series, I created a separate sql/deploy/add_widget_v2.sql file, copied the existing function in its entirety into the new file, and modified it there. If we were to do that here, the resulting deployment configuration would look something like this:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
sql/deploy/widgets_created_at.sql

[dfba488cfd9145928a25d8d48de3231da84s4bd2 (HEAD, master)]
sql/deploy/add_widget_v2.sql

Which would be fine, except that if someone else wanted to see what had changed, here’s what git diff would output:

diff --git a/sql/deploy/add_widget_v2.sql b/sql/deploy/add_widget_v2.sql
new file mode 100644
index 0000000..9132195
--- /dev/null
+++ b/sql/deploy/add_widget_v2.sql
@@ -0,0 +1,8 @@
+-- requires: widgets_created_at
+CREATE OR REPLACE FUNCTION add_widget(
+    username   TEXT,
+    widgetname TEXT
+) RETURNS VOID LANGUAGE SQL AS $$
+    INSERT INTO widgets (created_by, name, created_at)
+    VALUES ($1, $2, NOW());
+$$;

So, what changed in the add_widget() function between gamma and now? One cannot tell from this diff: it looks like a brand new function. And no web-based VCS interface will show you, either; it’s just not inherent in the commit. We have to actually know that it was just an update to an existing function, and what files to manually diff, like so:

> diff -u sql/deploy/add_widget.sql sql/deploy/add_widget_v2.sql 
--- sql/deploy/add_widget.sql   2012-01-28 13:06:24.000000000 -0800
+++ sql/deploy/add_widget_v2.sql    2012-01-28 13:26:59.000000000 -0800
@@ -1,8 +1,8 @@
--- requires: widgets_table
-
+-- requires: widgets_created_at
    CREATE OR REPLACE FUNCTION add_widget(
        username   TEXT,
        widgetname TEXT
    ) RETURNS VOID LANGUAGE SQL AS $$
-    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
+    INSERT INTO widgets (created_by, name, created_at)
+    VALUES ($1, $2, NOW());
    $$;

Much better, but how annoying is that? It doesn’t allow us to really take advantage of the VCS, all because we need SQL changes to run in a very specific order.

But let’s ignore that for the moment. Let’s just throw out the commit with add_widget_v2.sql and go ahead and change the add_widget change script directly. So the history now looks like this:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
sql/deploy/widgets_created_at.sql

[e4b970aa36f27451fe377791eab040a73c6eb47a (HEAD, epsilon, master)]
sql/deploy/add_widget.sql

Naturally, the add_widget script appears twice now, once under the beta tag and once under epsilon (which I’ve just tagged). What are the consequences for our migration? Well, if we were to build a new database from the beginning, running these migrations as listed here, we would get an error while applying the beta changes:

ERROR:  column "created_at" of relation "widgets" does not exist
LINE 5:     INSERT INTO widgets (created_by, name, created_at)

This is because the created_at column won’t exist until the widgets_created_at change is applied. That won’t do, will it? Fortunately, Git knows exactly what the add_widget deploy script looked like under the beta tag, and we can ask it:

> git show beta:sql/deploy/add_widget.sql
-- requires: widgets_table

CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
$$;

Boom, there it is, with no reference to created_at. Using this technique, our SQL deployment app can successfully apply all of our database changes by iterating over the list of changes and applying the contents of each script at the time of the appropriate commit or tag. In other words, it could apply the output from each of these commands:

git show alpha:sql/deploy/users_table.sql
git show beta:sql/deploy/widgets_table.sql
git show beta:sql/deploy/add_widget.sql
git show gamma:sql/deploy/add_user.sql
git show 44ba615b7813531f0acb6810cbf679791fe57bf2:sql/deploy/widgets_created_at.sql
git show epsilon:sql/deploy/add_widget.sql

And everything will work exactly as it should: the original version of the add_widget change script will be for the beta tag, and the next version will be applied for the epsilon tag. Not bad, right? We get a nice, clean Git history and can exploit it to manage the changes.
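
In Perl, that loop might look something like this sketch; the commit/script pairs would really come from parsing the Git history, and the database name is made up:

#!/usr/bin/perl
use strict;
use warnings;

# Ordered list of (commit-or-tag, deploy script) pairs, as derived above.
my @changes = (
    [ alpha   => 'sql/deploy/users_table.sql'        ],
    [ beta    => 'sql/deploy/widgets_table.sql'      ],
    [ beta    => 'sql/deploy/add_widget.sql'         ],
    [ gamma   => 'sql/deploy/add_user.sql'           ],
    [ '44ba615b7813531f0acb6810cbf679791fe57bf2'
              => 'sql/deploy/widgets_created_at.sql' ],
    [ epsilon => 'sql/deploy/add_widget.sql'         ],
);

for my $change (@changes) {
    my ($commit, $script) = @$change;
    # A real tool would execute this pipeline; here we just print it.
    print "git show $commit:$script | psql -d flipr_test\n";
}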

Reversion to the Mean

But what about reversion? What if the deploy to epsilon failed, and we need to revert back to gamma? Recall that in the first article, I eliminated duplication by having the add_widget_v2 revert script simply call the add_widget deploy script. But such is not possible now that we’ve changed add_widget in place. What to do?

The key is for the change management script to know the difference between a new change script and a modified one. Fortunately, Git knows that, too, and we can get it to cough up that information with a simple change to the sqlhist alias: instead of passing --name-only, pass --name-status:

> alias sqlhist="git log -p --format='[%H%d]' --name-status --reverse sql/deploy \
| awk '/^\[/ {print \"\"} /./'"

Using this new alias, our history looks like:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
A   sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
A   sql/deploy/add_widget.sql
A   sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
A   sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
A   sql/deploy/widgets_created_at.sql

[e4b970aa36f27451fe377791eab040a73c6eb47a (HEAD, epsilon, master)]
M   sql/deploy/add_widget.sql

Now we have a letter defining the status of each file. An “A” means the file was added in that commit; an “M” means it was modified. But the upshot is that, to revert to gamma, our change management script can see that add_widget was modified in epsilon, and, rather than apply a revert change script, it can just apply the version of the script as it existed under gamma:

> git show gamma:sql/deploy/add_widget.sql
-- requires: widgets_table

CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
$$;

And there we are, right back to where we should be. Of course, the remaining epsilon deploy script, widgets_created_at, was added in its commit, so we just apply the revert script and we’re set, back to gamma.
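
Put another way, the revert logic boils down to a choice per change, sketched here; the step names come from the example above, and the psql invocation is illustrative only:

#!/usr/bin/perl
use strict;
use warnings;

# For each change being reverted (walked in reverse order): a modified ("M")
# script is reverted by re-applying its older version from the target tag,
# while an added ("A") script gets its revert script.
sub revert_command {
    my ($status, $step, $target_tag) = @_;
    return $status eq 'M'
        ? "git show $target_tag:sql/deploy/$step.sql | psql -d flipr_test"
        : "psql -d flipr_test -f sql/revert/$step.sql";
}

# Reverting from epsilon back to gamma:
print revert_command('M', 'add_widget',         'gamma'), "\n";
print revert_command('A', 'widgets_created_at', 'gamma'), "\n";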

Still Configurable

To get back to the original idea of a migration configuration file, I still think it’s entirely do-able. All we need to do is have the change management app generate it, just as before. When it comes to modified — rather than added — deploy scripts, it can automatically insert new scripts with the full copies of previous versions, much as before. The resulting configuration would look something like this:

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
sql/deploy/widgets_created_at.sql

[e4b970aa36f27451fe377791eab040a73c6eb47a (HEAD, epsilon, master)]
sql/deploy/add_widget_v2.sql

Note that last line, where we now have add_widget_v2. The change management script would simply generate this file, and create an additional revert script with the same name that just contains the contents of the deploy script as it was under the gamma tag.

Too Baroque?

Having written down these ideas that have plagued my brain for the last week, along with some examples using Git to confirm them, I’m convinced more than ever that this is entirely workable. But it also leads me to wonder if it’s too baroque. I intend these posts as a rough spec for how this thing should work, and I plan to implement it in the coming weeks. But I’m wondering how difficult it will be to explain it all to people?

So let me see if I can break it down to a few simple rules.

  • In general, you should create independent deploy and revert scripts for your SQL. Put a CREATE TABLE statement into its own script. If it requires some other table, declare the dependency. If you need to change it later, create a new script that uses ALTER TABLE.
  • In special cases where a simple change cannot be made without copying something wholesale, and where the deploy script is idempotent, you may simply modify the deploy script in-place.

That’s about it. The idempotence of the deploy script is important for ensuring consistency, and applies very well to features such as user-defined functions. For other objects, there are generally ALTER statements that allow changes to be made without wholesale copying of existing code.

So what am I missing? What have I overlooked? What mistakes in my logic have I made? Do you think this will be too tricky to implement, or to use? Is it hard to understand? Your comments would be greatly appreciated, because I am going to write an app to do this stuff, and want to get it right.

Thanks for sticking with me through all the thought experiments. For my next post on this topic, I expect to have an interface spec for the new app.

VCS-Enabled SQL Change Management

In my previous post, I outlined the basics of a configuration-file and dependency-tracking SQL deployment architecture, but left a couple of additional challenges unresolved. They were:

  1. I would rather not have to hand-edit a configuration file, as it’s finicky and error-prone.

  2. There is still more duplication of code than I would like, in that a procedure defined in one change script would have to be copied whole to a new script for any changes, even simple single-line changes.

I believe I can solve both of these issues by simple use of a VCS. Since all of my current projects use Git, I will use it for the examples here.

Git it On

First, recall the structure of the configuration file, which was something like this:

[alpha]
users_table

[beta]
add_widget
widgets_table

[gamma]
add_user

[delta]
widgets_created_at
add_widget_v2

Basically, we have bracketed tags identifying changes that should be deployed. Now have a look at this:

> git log -p --format='[%H]' --name-only --reverse sql/deploy
[8920aaf7947a56f6777e69a21b70fd877c8fd6dc]

sql/deploy/users_table.sql
[f7da5fd4b7391747f75d85db6fa82de47b9e4c00]

sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql
[ea10b9e566934ef256debe8752504189436e162a]

sql/deploy/add_user.sql
[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4]

sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

Look familiar? Let’s use a bit of awk magic to neaten things up (Thanks helwig!):

> git log -p --format='[%H]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

Ah, that’s better. We have commit SHA1s for tags, followed by the appropriate lists of deployment scripts. But wait, we can decorate it, too:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (HEAD, delta, master)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

Look at that! Actual VCS tags built right in to the output. So, assuming our deployment app can parse this output, we can deploy or revert to any commit or tag. Better yet, we don’t have to maintain a configuration file, because the VCS is already tracking all that stuff for us! Our change management app can automatically detect if we’re in a Git repository (or Mercurial or CVS or Subversion or whatever) and fetch the necessary information for us. It’s all there in the history. We can name revision identifiers (SHA1s here) to deploy or revert to, or use tags (alpha, beta, gamma, delta, HEAD, or master in this example).
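
To make that concrete, here’s a small sketch of parsing that output into an ordered list of deployment targets; the sample data is inlined so the example stands alone:

#!/usr/bin/perl
use strict;
use warnings;

# Parse "[SHA (decorations)]" headers followed by script paths.
my @targets;
while (my $line = <DATA>) {
    chomp $line;
    next unless $line =~ /\S/;
    if ($line =~ /^\[(\w+)(?:\s+\(([^)]*)\))?\]/) {
        push @targets, { sha => $1, tags => [ split /,\s*/, $2 // '' ], steps => [] };
    } else {
        push @{ $targets[-1]{steps} }, $line;
    }
}

for my $t (@targets) {
    printf "%s (%s): %s\n", substr($t->{sha}, 0, 7),
        join(', ', @{ $t->{tags} }) || 'untagged',
        join(', ', @{ $t->{steps} });
}

__DATA__
[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql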

And with careful repository maintenance, this approach will work for branches, as well. For example, say you have developers working in two branches, feature_foo and feature_bar. In feature_foo, a foo_table change script gets added in one commit, and an add_foo script in a second commit. Merge it into master and the history now looks like this:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e (HEAD, master, feature_foo)]
sql/deploy/add_foo.sql

So far so good.

Meanwhile, development in the feature_bar branch has added a bar_table change script in one commit and add_bar in another. Because development in this branch was going on concurrently with the feature_foo branch, if we just merged it into master, we might get a history like this:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'
[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[d1882d7b4cfcf5c57030bd5a15f8571bfd7e48e2]
sql/deploy/bar_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e]
sql/deploy/add_foo.sql

[2330da1caae9a46ea84502bd028ead399ca3ca02 (feature_bar)]
sql/deploy/add_bar.sql

[73979ede2c8589cfe24c9213a9538f305e6f508f (HEAD, master, feature_foo)]

Note that bar_table comes before add_foo. In other words, the feature_foo and feature_bar commits are interleaved. If we were to deploy to HEAD, and then need to revert feature_bar, bar_table would not be reverted. This is, shall we say, less than desirable.

There are at least two ways to avoid this issue. One is to squash the merge into a single commit using git merge --squash feature_bar. This would be similar to accepting a single patch and applying it. The resulting history would look like this:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e]
sql/deploy/add_foo.sql

[91a048c05e0444682e2e4763e8a7999a869b4a77 (HEAD, master)]
sql/deploy/add_bar.sql
sql/deploy/bar_table.sql

Now both of the feature_bar change scripts come after the feature_foo changes. But it might be nice to keep the history. So a better solution (and the best practice, I believe) is to rebase the feature_bar branch before merging it into master, like so:

> git rebase master
First, rewinding head to replay your work on top of it...
Applying: Add bar.
Applying: Add add_bar().
> git checkout master
Switched to branch 'master'
> git merge feature_bar
Updating 7f89e23..0fab7a0
Fast-forward
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 sql/deploy/add_bar.sql
 create mode 100644 sql/deploy/bar_table.sql
 create mode 100644 sql/revert/add_bar.sql
 create mode 100644 sql/revert/bar_table.sql

And now we should have:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e]
sql/deploy/add_foo.sql

[0e53c29eb47c618d0a8818cc17bd5a0aab0acd6d]
sql/deploy/bar_table.sql

[0fab7a0ba928b34a46a9495d4efc1c73d9133d37 (HEAD, master, feature_bar)]
sql/deploy/add_bar.sql

Awesome, now everything is in the correct order. We did lose the feature_foo “tag,” though. That’s because it wasn’t a tag, and neither is feature_bar here. They are, rather, branch names, which becomes obvious when using “full” decoration:

> git log --format='%d' --decorate=full HEAD^..
 (HEAD, refs/heads/master, refs/heads/feature_foo)

After the next commit, that decoration will disappear from the history. So let’s just tag the relevant commits ourselves:

> git tag feature_foo 7f89e23c9f1e7fc298c69400f6869d701f76759e
> git tag feature_bar
> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e (feature_foo)]
sql/deploy/add_foo.sql

[0e53c29eb47c618d0a8818cc17bd5a0aab0acd6d]
sql/deploy/bar_table.sql

[0fab7a0ba928b34a46a9495d4efc1c73d9133d37 (HEAD, feature_bar, master, feature_bar)]
sql/deploy/add_bar.sql

Ah, there we go! After the next commit, one of those feature_bars will disappear, since the branch will have been left behind. But we’ll still have the tag.

Not Dead Yet

Clearly we can intelligently use Git to manage SQL change management. (Kind of stands to reason, doesn’t it?) Nevertheless, I believe that a configuration file still might have its uses. Not only because not every project is in a VCS (it ought to be!), but because oftentimes a project is not deployed to production as a git clone. It might be distributed as a source tarball or an RPM. In such a case, including a configuration file in the distribution would be very useful. But there is still no need to manage it by hand; our deployment app can generate it from the VCS history before packaging for release.

More to Come

I’d planned to cover the elimination of duplication, but I think this is enough for one post. Watch for that idea in my next post.

Looking for the comments? Try the old layout.

Simple SQL Change Management

I’ve been thinking a lot about SQL change management. I know I have written about this before. But I was never satisfied with that idea, mostly because it required managing database changes in two separate but interdependent ways. Blargh. So for my Perl projects the last couple of years, I have stuck to the very simple but ugly Rails-style migration model, as implemented in Module::Build::DB.

But it has been on my brain more lately because I’m writing more and more database applications at work, and managing changes over time is becoming increasingly annoying. I’ve been using a variation on Depesz’s Versioning package, mainly because its idea of specifying dependencies instead of ordered deployment scripts is so useful. However, its implementation in pure SQL, with accompanying shell and Perl scripts, is not entirely satisfying. Worse, one cannot easily include the contents of an earlier deployment script in a reversion script, because the dependency registration function embedded in a script will throw an error if it has been run before. The upshot is that if you make a one-line change to a database function, you still have to paste the entire thing into a new file and commit it to your source code repository. This makes tracking diffs annoying.

Oh, and did I mention that there is no simple built-in way to revert changes, and even if there were, because there are no named releases, it can be difficult to decide what to revert to? I don’t often need that capability, but when I need it, I need it.

Then, this week, Robert Haas described a deployment approach he had implemented. It was simple:

My last implementation worked by keeping a schema_versions table on the server with one column, a UUID. The deployment tarball contained a file with a list of UUIDs in it, each one associated to an SQL script. At install time, the install script ran through that file in order and ran any scripts whose UUID didn’t yet appear in the table, and then added the UUIDs of the run scripts to the table.
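
In other words, something like this minimal sketch, where the table and column names are my guesses rather than Robert’s:

-- A guess at the sort of table Robert describes; the names are mine.
CREATE TABLE schema_versions (
    script_uuid UUID PRIMARY KEY
);

-- At install time: for each (UUID, script) pair listed in the deployment
-- file, skip any UUID already present, run the script, then record it.
INSERT INTO schema_versions (script_uuid)
VALUES ('6b41a7a0-11d4-4f3a-9a6e-2f8e9c1d0b21');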

I like this simplicity, but there are some more things I think could be done, including dependency resolution and reversion. And it seems silly to have a UUID stand for a script name; why not just list script names? Better yet, tag groups of changes for easy reference.

Yet Another SQL Deployment Strategy

So here’s my proposal. Following Robert, we create a configuration file, but instead of just listing changes, we fill it with tags, with the names of the associated changes listed under each. An example:

[alpha]
users_table

[beta]
add_widget
widgets_table

[gamma]
add_user

Our change management app will parse this file, finding the tag for each stage of the migration in brackets, and apply the associated changes, simply finding each of them in sql/deploy/$change.sql. If it’s reverting changes, it finds the reversion scripts named sql/revert/$change.sql. The tags can be anything you want; release tags might be useful. Easy so far, right?

Except notice that I have a minor ordering problem here. The add_widget change, which adds a function to insert a record into the widgets table, comes before the widgets_table script. If we run the add_widget change first, it will fail, because the widgets table does not yet exist.

Of course we can re-order the lines in the configuration file. But given that one might have many changes for a particular tag, with many cross-referencing dependencies, I think it’s better to overcome this problem in the scripts themselves. So I suggest that the sql/deploy/add_widget.sql file look something like this:

-- requires: widgets_table

CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
$$;

Here I’m stealing Depesz’s dependency tracking idea. With a simple comment at the top of the script, we specify that this change requires that the widgets_table change be run first. So let’s look at sql/deploy/widgets_table.sql:

-- requires: users_table

CREATE TABLE widgets (
    created_by TEXT NOT NULL REFERENCES users(name),
    name       TEXT NOT NULL
);

Ah, now here we also require that the users_table change be deployed first. Of course, it likely would be, given that it appears under a tag earlier in the file, but it’s best to be safe and explicitly spell out dependencies. Someone might merge the two tags at some point before release, right?
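
As an aside, I haven’t shown the users_table script itself; judging from the foreign key in widgets_table, a minimal version might be no more than this:

-- sql/deploy/users_table.sql (a guess based on the foreign key above)
CREATE TABLE users (
    name TEXT PRIMARY KEY
);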

The users_table change has no dependencies, but the later add_user change of course does; our sql/deploy/add_user.sql:

-- requires: users_table

CREATE OR REPLACE FUNCTION add_user(
    name TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO users (name) VALUES ($1);
$$;

Our deployment app can properly resolve these dependencies. Of course, we also need reversion scripts in the sql/revert directory. They might look something like:

-- sql/revert/users_table.sql
DROP TABLE IF EXISTS users;

-- sql/revert/add_widget.sql
DROP FUNCTION IF EXISTS add_widget(text, text);

-- sql/revert/widgets_table.sql
DROP TABLE IF EXISTS widgets;

-- sql/revert/add_user.sql
DROP FUNCTION IF EXISTS add_user(text);

So far so good, right? Our app can resolve dependencies in both directions, so that if we tell it to revert to beta, it can do so in the proper order.
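
As for how the app knows what has been deployed, I imagine it keeping a small metadata table of its own. A hypothetical sketch, with names of my own invention:

-- A hypothetical tracking table for the deployment app; the names are mine.
CREATE TABLE deployed_changes (
    change      TEXT        PRIMARY KEY,
    requires    TEXT[]      NOT NULL DEFAULT '{}',
    tag         TEXT,
    deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);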

Now, as the deployment app runs the scripts, deploying or reverting changes, it tracks them and their dependencies in its own metadata table in the database, not unlike Depesz’s Versioning package. But because dependencies are parsed from comments in the scripts, we are free to include the contents of one script in another. For example, say that we later need to revise the add_widget() function to log the time a widget is created. First we add a new script to add the necessary column:

-- requires: widgets_table
ALTER TABLE widgets ADD created_at TIMESTAMPTZ;

Call that script sql/deploy/widgets_created_at.sql. Next we add a script that changes add_widget():

-- requires: widgets_created_at
CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name, created_at)
    VALUES ($1, $2, NOW());
$$;

Call it sql/deploy/add_widget_v2.sql. Then update the deployment configuration file with a new tag and the associated changes:

[delta]
widgets_created_at
add_widget_v2

With me so far? Now, what about reversion? sql/revert/widgets_created_at.sql is simple, of course:

ALTER TABLE widgets DROP COLUMN IF EXISTS created_at;

But what should sql/revert/add_widget_v2.sql look like? Why, to go back to the first version of add_widget(), it would be identical to sql/deploy/add_widget.sql. But it would be silly to copy the whole file, wouldn’t it? Why duplicate when we can just include?

\i sql/deploy/add_widget.sql

Boom, we get the reversion script for free. No unnecessary duplication between deployment and reversion scripts, and all dependencies are nicely resolved. Plus, the tags in the configuration file make it easy to deploy and revert change sets as necessary, with dependencies properly followed.

There’s More!

To recap, I had two primary challenges with Depesz’s Versioning package to overcome: the inability to easily revert to an earlier implementation, and the inability to easily include one script in another. Both of course are do-able with workarounds, but I think that the addition of a deployment configuration file with tagged sets of changes and the elimination of SQL-embedded dependency specification overcome these issues much more effectively and intuitively.

Still, there are two more challenges I would like to overcome:

  1. It would be nice not to need the configuration file at all. Maintaining such a thing can be finicky and error-prone.

  2. I still had to duplicate the entire add_widget() function in the add_widget_v2 script for a very simple change. This means no easy way to simply see the diff for this change in my VCS. It would be nice not to have to copy the entire function.

I think I have solutions for these issues, as well. More in my next post.

Looking for the comments? Try the old layout.

Fixing Foreign Key Deadlocks in PostgreSQL

PGX had a client come to us recently with a rather nasty deadlock issue. It took far longer than we would have liked to figure out the issue, and once we did, they were able to clear it up by dropping an unnecessary index. Still, it shouldn’t have been happening to begin with. Joel Jacobson admirably explained the issue on pgsql-hackers (and don’t miss the screencast).

Some might consider it a bug in PostgreSQL, but the truth is that PostgreSQL can obtain stronger than necessary locks. Such locks cause some operations to block unnecessarily and some other operations to deadlock, especially when foreign keys are used in a busy database. And really, who doesn’t use FKs in their busy database?
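
To make the problem concrete, here is a minimal two-session sketch with hypothetical tables; the details vary by PostgreSQL version, but this is the general shape:

-- Hypothetical tables, just to illustrate the over-locking.
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE orders (
    id          SERIAL  PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);

INSERT INTO customers VALUES (1, 'Acme');

-- Session A: update a non-key column, taking a row-level lock on the customer.
BEGIN;
UPDATE customers SET name = 'Acme, Inc.' WHERE id = 1;

-- Session B: the foreign key check wants a share lock on that same row,
-- so this blocks until session A finishes. If A in turn ends up waiting
-- on something B holds, the two sessions deadlock.
BEGIN;
INSERT INTO orders (customer_id) VALUES (1);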

Fortunately, Simon Riggs proposed a solution. And it’s a good one. So good that PGX is partnering with Glue Finance and Command Prompt as founding sponsors on a new FOSSExperts project to actually get it done. Álvaro Herrera is doing the actual hacking on the project, and has already blogged about it here and here.

If you use foreign key constraints (and you should!) and you have a high transaction load on your database (or expect to soon!), this matters to you. In fact, if you use ActiveRecord with Rails, there might even be a special place in your heart for this issue, says Mina Naguib. We’d really like to get this done in time for the PostgreSQL 9.1 release. But it will only happen if the project can be funded.

Yes, that’s right, as with PGXN, this is a community project for which we’re raising funds from the community to get it done. I think that more and more work could be done this way, as various interested parties contribute small amounts to collectively fund improvements to the benefit of us all. So can you help out? Hit the FOSSExperts project page for all the project details, and to make your contribution.

Help us help the community to make PostgreSQL better than ever!

Looking for the comments? Try the old layout.

Managing Key/Value Pairs in PostgreSQL

Let’s say that you’ve been following the latest research in key/value data storage and are interested in managing such data in a PostgreSQL database. You want to have functions to store and retrieve pairs, but there is no natural way to represent pairs in SQL. Many languages have hashes or data dictionaries to fulfill this role, and you can pass them to functional interfaces. SQL’s got nothin’. In PostgreSQL, you have two options: use nested arrays (simple, fast) or use a custom composite data type (sugary, legible).

Let’s assume you have this table for storing your pairs:

CREATE TEMPORARY TABLE kvstore (
    key        TEXT PRIMARY KEY,
    value      TEXT,
    expires_at TIMESTAMPTZ DEFAULT NOW() + '12 hours'::interval
);

To store pairs, you can use nested arrays like so:

SELECT store(ARRAY[ ['foo', 'bar'], ['baz', 'yow'] ]);

Not too bad, and since SQL arrays are a core feature of PostgreSQL, there’s nothing special to do. Here’s the store() function:

CREATE OR REPLACE FUNCTION store(
    params text[][]
) RETURNS VOID LANGUAGE plpgsql AS $$
BEGIN
    FOR i IN 1 .. array_upper(params, 1) LOOP
        UPDATE kvstore
            SET value      = params[i][2],
                expires_at = NOW() + '12 hours'::interval
            WHERE key        = params[i][1];
        CONTINUE WHEN FOUND;
        INSERT INTO kvstore (key, value)
        VALUES (params[i][1], params[i][2]);
    END LOOP;
END;
$$;

I’ve seen worse. The trick is to iterate over each nested array, try an update for each, and insert when no row is updated. Alas, you have no control over how many elements a user might include in a nested array. One might call it as:

SELECT store(ARRAY[ ['foo', 'bar', 'baz'] ]);

Or:

SELECT store(ARRAY[ ['foo'] ]);

No errors will be thrown in either case. In the first, “baz” will be ignored, and in the second, the value will default to NULL. If you really didn’t like these behaviors, you could add some code to throw an exception if array_upper(params, 2) returns anything other than 2.
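
For example, a guard at the top of the function body might look something like this (my sketch, not part of the original function):

-- A sketch of such a guard for the top of store()'s body.
IF array_upper(params, 2) IS DISTINCT FROM 2 THEN
    RAISE EXCEPTION 'store() expects key/value pairs, got % columns',
        array_upper(params, 2);
END IF;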

Let’s look at fetching values for keys. PostgreSQL 8.4 added variadic function arguments, so it’s easy to provide a nice interface for retrieving one or more values. The obvious one fetches a single value:

CREATE OR REPLACE FUNCTION getval(
    text
) RETURNS TEXT LANGUAGE SQL AS $$
    SELECT value FROM kvstore WHERE key = $1;
$$;

Nice and simple:

SELECT getval('baz');

 getval 
--------
 yow

The variadic version looks like this:

CREATE OR REPLACE FUNCTION getvals(
    variadic text[]
) RETURNS SETOF text LANGUAGE SQL AS $$
    SELECT value
      FROM kvstore
      JOIN (SELECT generate_subscripts($1, 1)) AS f(i)
        ON kvstore.key = $1[i]
     ORDER BY i;
$$;

Note the use of ORDER BY i to ensure that the values are returned in the same order as the keys are passed to the function. So if I’ve got the key/value pairs 'foo' => 'bar' and 'baz' => 'yow', the output is:

SELECT * FROM getvals('foo', 'baz');

 getvals 
---------
 bar
 yow

If we want the rows to have the keys and values together, we can return them as arrays, like so:

CREATE OR REPLACE FUNCTION getpairs(
    variadic text[]
) RETURNS SETOF text[] LANGUAGE SQL AS $$
    SELECT ARRAY[key, value]
      FROM kvstore
      JOIN unnest($1) AS k ON kvstore.key = k
$$;

Here I’m assuming that order isn’t important, which means we can use unnest to “flatten” the array, instead of the slightly more baroque generate_subscripts() with array access. The output:

SELECT * FROM getpairs('foo', 'baz');

  getpairs   
-------------
 {baz,yow}
 {foo,bar}

Now, this is good as far as it goes, but the use of nested arrays to represent key/value pairs is not exactly ideal: just looking at the use of a function, there’s nothing to indicate that you’re using key/value pairs. What would be ideal is to use row constructors to pass arbitrary pairs:

SELECT store( ROW('foo', 'bar'), ROW('baz', 42) );

Alas, one cannot pass RECORD values (the data type returned by ROW()) to non-C functions in PostgreSQL.1 But if you don’t mind your keys and values always being TEXT, we can get almost all the way there by creating an “ordered pair” data type as a composite type like so:

CREATE TYPE pair AS ( k text, v text );

Then we can create store() with a signature of VARIADIC pair[] and pass in any number of these suckers:

CREATE OR REPLACE FUNCTION store(
    params variadic pair[]
) RETURNS VOID LANGUAGE plpgsql AS $$
DECLARE
    param pair;
BEGIN
    FOR param IN SELECT * FROM unnest(params) LOOP
        UPDATE kvstore
           SET value = param.v,
               expires_at = NOW() + '12 hours'::interval
         WHERE key = param.k;
        CONTINUE WHEN FOUND;
        INSERT INTO kvstore (key, value) VALUES (param.k, param.v);
    END LOOP;
END;
$$;

Isn’t it nice how we can access keys and values as param.k and param.v? Call the function like this:

SELECT store( ROW('foo', 'bar')::pair, ROW('baz', 'yow')::pair );

Of course, that can get a bit old, casting to pair all the time, so let’s create some pair constructor functions to simplify things:

CREATE OR REPLACE FUNCTION pair(anyelement, text)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(text, anyelement)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(anyelement, anyelement)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(text, text)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

I’ve created four variants here to allow for the most common combinations of types. So any of the following will work:

SELECT pair('foo', 'bar');
SELECT pair('foo', 1);
SELECT pair(12.3, 'foo');
SELECT pair(1, 43);

Alas, you can’t mix any other types, so this will fail:

SELECT pair(1, 12.3);

ERROR:  function pair(integer, numeric) does not exist
LINE 1: SELECT pair(1, 12.3);

We could create a whole slew of additional constructors, but since we’re using a key/value store, it’s likely that the keys will usually be text anyway. So now we can call store() like so:

SELECT store( pair('foo', 'bar'), pair('baz', 'yow') );

Better, eh? Hell, we can go all the way and create a nice binary operator to make it still more sugary. Just map each of the pair functions to the operator like so:

CREATE OPERATOR -> (
    LEFTARG   = text,
    RIGHTARG  = anyelement,
    PROCEDURE = pair
);

CREATE OPERATOR -> (
    LEFTARG   = anyelement,
    RIGHTARG  = text,
    PROCEDURE = pair
);

CREATE OPERATOR -> (
    LEFTARG   = anyelement,
    RIGHTARG  = anyelement,
    PROCEDURE = pair
);

CREATE OPERATOR -> (
    LEFTARG   = text,
    RIGHTARG  = text,
    PROCEDURE = pair
);

Looks like a lot of repetition, I know, but check out the new syntax:

SELECT store( 'foo' -> 'bar', 'baz' -> 1 );

Cute, eh? I chose to use -> because => is deprecated as an operator in PostgreSQL 9.0: SQL 2011 reserves that operator for named parameter assignment.2

As a last twist, let’s rewrite getpairs() to return pairs instead of arrays:

CREATE OR REPLACE FUNCTION getpairs(
    variadic text[]
) RETURNS SETOF pair LANGUAGE SQL AS $$
    SELECT key -> value
      FROM kvstore
      JOIN unnest($1) AS k ON kvstore.key = k
$$;

Cute, eh? Its use is just like before, only now the output is more table-like:

SELECT * FROM getpairs('foo', 'baz');

  k  |   v   
-----+-------
 baz | yow
 foo | bar

You can also get them back as composites by omitting * FROM:

SELECT getpairs('foo', 'baz');

  getpairs   
-------------
 (foo,bar)
 (baz,yow)

Anyway, just something to consider the next time you need a function that allows any number of key/value pairs to be passed. It’s not perfect, but it’s pretty sweet.
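
One last practical note: since kvstore carries an expires_at column, and store() refreshes it on every update, you’ll presumably want to reap stale pairs from time to time. A sketch:

-- Reap expired pairs; run periodically from cron or your scheduler of choice.
DELETE FROM kvstore WHERE expires_at < NOW();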


  1. In the recent pgsql-hackers discussion that inspired this post, Pavel Stehule suggested adding something like Oracle COLLECTIONs to address this shortcoming. I don’t know how far this idea will get, but it sure would be nice to be able to pass objects with varying kinds of data, rather than be limited to data all of one type (values in an SQL array must all be of the same type). ↩︎

  2. No, you won’t be able to use named parameters for this application because named parameters are inherently non-variadic. That is, you can only pre-declare so many named parameters: you can’t anticipate every parameter that’s likely to be wanted as a key in our key/value store. ↩︎

Looking for the comments? Try the old layout.

SQL Hack: The Something-est From Each Entity

This is a pattern that I have dealt with many times, but never figured out how to adequately handle. Say that you have imported a mailbox into your database, and you want a list of the latest messages between each pair of recipients (sender and receiver — I’m ignoring multiple receivers for the moment). The data might look like this:

BEGIN;

CREATE TABLE messages (
    sender   TEXT        NOT NULL,
    receiver TEXT        NOT NULL,
    sent_at  TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
    body     TEXT        NOT NULL DEFAULT ''
);

INSERT INTO messages ( sender, receiver, body )
VALUES ('Theory', 'Strongrrl', 'Hi There.' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Strongrrl', 'Theory', 'Hi yourself.' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Anna', 'Theory', 'What''s for dinner?' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Theory', 'Anna', 'Brussels Sprouts.' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Anna', 'Theory', 'Oh man!' );

COMMIT;

So the goal is to show the most recent message between Theory and Strongrrl and the most recent message between Theory and Anna, without regard to who is the sender and who is the receiver. After running into this many times, today I consulted my colleagues, showing them this dead simple (and wrong!) query to demonstrate what I wanted:

SELECT sender, receiver, sent_at, body
  FROM messages
 GROUP BY sender, receiver
HAVING sent_at = max(sent_at);

That’s wrong because one can’t have columns in the SELECT expression that are not either aggregate expressions or included in the GROUP BY expression. It’s a violation of the standard (and prone to errors, I suspect). Andrew immediately said, “Classic case for DISTINCT ON”. This lovely little expression is a PostgreSQL extension not included in the SQL standard. Its implementation looks like this:

SELECT DISTINCT ON (
          CASE WHEN receiver > sender
               THEN receiver || sender
               ELSE sender   || receiver
          END
       ) sender, receiver, sent_at, body
  FROM messages
 ORDER BY CASE WHEN receiver > sender
               THEN receiver || sender
               ELSE sender   || receiver
          END, sent_at DESC;

This query is saying, “fetch the rows where the sender and the receiver are distinct, and order by sent_at DESC.” The CASE statement to get a uniform value for the combination of sender and receiver is a bit unfortunate, but it does the trick:

  sender   | receiver |            sent_at            |     body     
-----------+----------+-------------------------------+--------------
 Anna      | Theory   | 2010-01-12 05:00:07.026711+00 | Oh man!
 Strongrrl | Theory   | 2010-01-12 05:00:07.02589+00  | Hi yourself.

Great, exactly the data I wanted. And the CASE statement can actually be indexed to speed up filtering. But I wondered if it would be possible to get the same results without the DISTINCT ON. In other words, can this be done with standard SQL? If you’re using PostgreSQL 8.4, the answer is “yes.” All you have to do is exploit window functions and a subquery. It looks like this:

SELECT sender, receiver, sent_at, body
  FROM (
    SELECT sender, receiver, sent_at, body,
           row_number() OVER ( PARTITION BY 
               CASE WHEN receiver > sender
                    THEN receiver || sender
                    ELSE sender   || receiver
               END
               ORDER BY sent_at DESC
           ) AS rnum
      FROM messages
  ) AS t
 WHERE rnum = 1;

Same nasty CASE statement as before (no way around it with this database design, alas), but this is fully conforming SQL. It’s also the first time I’ve ever used window functions. If you just focus on the row_number() OVER () expression, it’s simply partitioning the table by the same value used in the DISTINCT ON expression, but ordering by sent_at directly. The result is a row number, with 1 assigned to the most recent message for each combination of recipients. Then we just filter for that in the WHERE clause.

Not exactly intuitive (I’m really only understanding it now as I write it out), but quite straight-forward once you accept the expressivity in this particular OVER expression. It might be easier to understand if we remove some of the cruft. If instead we wanted the most recent message from each sender (regardless of the recipient), we’d write:

SELECT sender, receiver, sent_at, body
  FROM (
    SELECT sender, receiver, sent_at, body,
           row_number() OVER (
               PARTITION BY sender ORDER BY sent_at DESC
           ) AS rnum
      FROM messages
  ) AS t
 WHERE rnum = 1;

And that yields:

  sender   | receiver |            sent_at            |     body     
-----------+----------+-------------------------------+--------------
 Anna      | Theory   | 2010-01-12 05:00:07.026711+00 | Oh man!
 Strongrrl | Theory   | 2010-01-12 05:00:07.02589+00  | Hi yourself.
 Theory    | Anna     | 2010-01-12 05:00:07.24982+00  | Brussels Sprouts.

Furthermore, we can use a common table expression to eliminate the subquery. This query is functionally identical to the subquery example (returning to uniqueness for sender and receiver), just with the WITH clause coming before the SELECT clause, setting things up for it:

WITH t AS (
    SELECT sender, receiver, sent_at, body,
           row_number() OVER (PARTITION BY CASE
               WHEN receiver > sender
                   THEN receiver || sender
                   ELSE sender   || receiver
                   END
               ORDER BY sent_at DESC
           ) AS rnum
      FROM messages
) SELECT sender, receiver, sent_at, body
    FROM t
   WHERE rnum = 1;

So it’s kind of like putting the subquery first, only it’s not a subquery, it’s more like a temporary view. Nice, eh? Either way, the results are the same as before:

  sender   | receiver |            sent_at            |     body     
-----------+----------+-------------------------------+--------------
 Anna      | Theory   | 2010-01-12 05:00:07.026711+00 | Oh man!
 Strongrrl | Theory   | 2010-01-12 05:00:07.02589+00  | Hi yourself.
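
A side note on performance: as mentioned above, the CASE expression can be indexed. A sketch of such an expression index (the name is mine) might look like:

-- An expression index on the "conversation key" plus sent_at, so queries
-- that filter or sort on that value can use an index.
CREATE INDEX messages_conversation_sent_at_idx ON messages (
    (CASE WHEN receiver > sender
          THEN receiver || sender
          ELSE sender   || receiver
     END),
    sent_at DESC
);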

I hereby dub this “The Entity’s Something-est” pattern (I’m certain someone else has already come up with a good name for it, but this will do). I can see it working any place requiring the highest, lowest, latest, earliest, or something else-est item from each of a list of entities. Perhaps the latest headline from every news source:

WITH t AS (
    SELECT source, headline, dateline, row_number() OVER (
               PARTITION BY source ORDER BY dateline DESC
           ) AS rnum
      FROM news
) SELECT source, headline, dateline
    FROM t
   WHERE rnum = 1;

Or perhaps the lowest score for each basketball team over the course of a season:

WITH t AS (
    SELECT team, date, score, row_number() OVER (
               PARTITION BY team ORDER BY score
           ) AS rnum
      FROM games
) SELECT team, date, score
    FROM t
   WHERE rnum = 1;

Easy! How have you handled a situation like this in your database hacking?

Looking for the comments? Try the old layout.

RFC: PostgreSQL Add-on Network

I’ve posted a plan to implement PGAN, a CPAN for PostgreSQL extensions. I’ve tried to closely follow the CPAN philosophy to come up with a plan that requires a minimum-work implementation that builds on the existing PostgreSQL tools and the examples of the CPAN and JSAN. My hope is that it’s full of JFDI! I would be very grateful for feedback and suggestions.

Looking for the comments? Try the old layout.

Enforcing a Set of Values

Enumerate Me

I love enums. They’re a terrific way to quickly create self-documenting data types that represent a set of values, and the nice thing is that the underlying values are stored as integers, making them very space- and performance-efficient. A typical example might be a workflow approval process for publishing magazine articles. You create it like so:

CREATE TYPE article_states AS ENUM (
    'draft', 'copy', 'approved', 'published'
);

Nice: we now have a simple data type that’s self-documenting. An important feature of enums is that values sort in the order in which their labels were declared. For a workflow such as this, it makes a lot of sense, because the workflow states are inherently ordered: “draft” comes before “copy” and so on.
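
You can see the ordering directly; enum values compare according to their declaration order, and enum_range() lists the labels in that order:

SELECT 'draft'::article_states < 'published'::article_states;  -- returns true
SELECT enum_range(NULL::article_states);  -- {draft,copy,approved,published}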

Unfortunately, enums aren’t a panacea. I would use them all over the place if I could, but, alas, the value-set data types I tend to need usually have no inherent ordering beyond the collation order of the text. For example, say that we need a table describing people’s faces. Using an enum to manage eye colors might look something like this:

CREATE TYPE eye_color AS ENUM ( 'blue', 'green', 'brown' );

CREATE TABLE faces (
    face_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    eye_color eye_color NOT NULL
);

Nice, huh? So let’s insert a few values and see what it looks like:

INSERT INTO faces (name, eye_color)
VALUES ('David', 'blue' ),
       ('Julie', 'green' ),
       ('Anna', 'blue' ),
       ('Noriko', 'brown' )
;

So let’s look at the data ordered by the enum:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
--------+-----------
 David  | blue
 Anna   | blue
 Julie  | green
 Noriko | brown

Hrm. That’s not good. I forgot to put “green” after “brown” when I created the enum. Oh, and I forgot the color “hazel”:

% INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );
ERROR:  invalid input value for enum eye_color: "hazel"

Well, nice to know that it’s enforced, and that message is really helpful. But the real problem is that we run into the inherent ordering of enum labels, and now we need to adjust the enum to meet our needs. Here’s how to do it:

ALTER TABLE faces RENAME eye_color TO eye_color_tmp;
ALTER TABLE faces ALTER eye_color_tmp TYPE TEXT;
DROP TYPE eye_color;
CREATE TYPE eye_color AS ENUM ( 'blue', 'brown', 'green', 'hazel' );
ALTER TABLE faces ADD eye_color eye_color;
UPDATE faces SET eye_color = eye_color_tmp::eye_color;
ALTER TABLE faces ALTER eye_color SET NOT NULL;
ALTER TABLE faces DROP column eye_color_tmp;

Yikes! I have to rename the column, change its type to TEXT, drop the enum, create a new enum, and then copy all of the data into the new column before finally dropping the old column. If I have a lot of data, this will not be very efficient, requiring that every single row be rewritten. Still, it does work:

% INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );
% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
--------+-----------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green
 Kat    | hazel

The upshot is that enums are terrific if you have a very well-defined set of values that are inherently ordered (or where order is not important) and that are extremely unlikely to change. Perhaps someday PostgreSQL will have a more robust ALTER TYPE that allows enums to be more efficiently reorganized, but even then it seems likely that re-ordering values will require a table rewrite.

Lookup to Me

Another approach to handling a type as a set of values is to take advantage of the relational model and store the values in a lookup table. Going with the faces example, it looks like this:

CREATE TABLE eye_colors (
    eye_color TEXT PRIMARY KEY
);

INSERT INTO eye_colors VALUES ('blue'), ('green'), ('brown');

CREATE TABLE faces (
    face_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    eye_color TEXT REFERENCES eye_colors(eye_color)
);

We can use this table much as we did before:

INSERT INTO faces (name, eye_color)
VALUES ('David', 'blue' ),
       ('Julie', 'green' ),
       ('Anna', 'blue' ),
       ('Noriko', 'brown' )
;

And of course we can get the rows back properly ordered by eye_color, unlike the original enum example:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
--------+-----------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green

Cool! But there are a couple of downsides. One is that you’re adding a bit of I/O overhead to every update. Most likely you won’t have very many values in the eye_colors table, so given PostgreSQL’s caching, this isn’t a big deal. A bigger deal is error handling:

INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );
ERROR:  insert or update on table "faces" violates foreign key constraint "faces_eye_color_fkey"

That’s not an incredibly useful error message. One might ask, without knowing the schema, what an eye color has to do with a foreign key constraint? At least looking at the tables can tell you a bit more:

% \dt
          List of relations
 Schema |    Name    | Type  | Owner 
--------+------------+-------+-------
 public | eye_colors | table | david
 public | faces      | table | david

A quick look at the eye_colors table will tell you what’s going on, and you can figure out that you just need to add a new row:

INSERT INTO eye_colors VALUES ('hazel');
INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );

So it is self-documenting, but unlike enums it doesn’t do a great job of it. Plus if you have a bunch of set-constrained value types, you can end up with a whole slew of lookup tables. This can make it harder to sort the important tables that contain actual business data from those that are just lookup tables, because there is nothing inherent in them to tell the difference. You could put them into a separate schema, of course, but still, it’s not exactly intuitive.

Given these downsides, I’m not a big fan of using lookup tables for managing what is in fact a simple list of allowed values for a particular column unless those values change frequently. So what else can we do?

Constrain Me

A third approach is to use a table constraint, like so:

CREATE TABLE faces (
    face_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    eye_color TEXT NOT NULL,
    CONSTRAINT valid_eye_colors CHECK (
        eye_color IN ( 'blue', 'green', 'brown' )
    )
);

No lookup table, no inherent ENUM ordering. And in regular usage it works just like the lookup table example. The usual INSERT and SELECT once again yields:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
--------+-----------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green

The error message, however, is a bit more helpful:

% INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );
ERROR:  new row for relation "faces" violates check constraint "valid_eye_colors"

A check constraint violation on eye_color is much more informative than a foreign key constraint violation. The downside to a check constraint, however, is that it’s not as self-documenting. You have to look at the entire table in order to find the constraint:

% \d faces
                             Table "public.faces"
  Column   |  Type   |                        Modifiers                        
-----------+---------+----------------------------------------------------------
 face_id   | integer | not null default nextval('faces_face_id_seq'::regclass)
 name      | text    | not null default ''::text
 eye_color | text    | not null
Indexes:
    "faces_pkey" PRIMARY KEY, btree (face_id)
Check constraints:
    "valid_eye_colors" CHECK (eye_color = ANY (ARRAY['blue', 'green', 'brown']))

There it is at the bottom. Kind of tucked away there, eh? At least now we can change it. Here’s how:

ALTER TABLE faces DROP CONSTRAINT valid_eye_colors;
ALTER TABLE faces ADD CONSTRAINT valid_eye_colors CHECK (
    eye_color IN ( 'blue', 'green', 'brown', 'hazel' )
);

Not as straight-forward as updating the lookup table, and much less efficient (because PostgreSQL must validate that existing rows don’t violate the constraint before committing the constraint). But it’s pretty simple and at least doesn’t require the entire table be UPDATEd as with enums. For occasional changes to the value list, a table scan is not a bad tradeoff. And of course, once that’s done, it just works:

INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );

So this is almost perfect for our needs. Only poor documentation persists as an issue.

This is My Domain

To solve that problem, switch to domains. A domain is simply a custom data type that inherits behavior from another data type and to which one or more constraints can be added. It’s pretty simple to switch from the table constraint to a domain:

CREATE DOMAIN eye_color AS TEXT
CONSTRAINT valid_eye_colors CHECK (
    VALUE IN ( 'blue', 'green', 'brown' )
);

CREATE TABLE faces (
    face_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    eye_color eye_color NOT NULL
);

Nice table declaration, eh? Very clean. Looks exactly like the enum example, in fact. And it works as well as the table constraint:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
--------+-----------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green

A constraint violation is a bit more useful than with the table constraint:

% INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );
ERROR:  value for domain eye_color violates check constraint "valid_eye_colors"

This points directly to the domain. It’d be nice if it mentioned the violating value the way the enum error did, but at least we can look at the domain like so:

\dD eye_color
                                                        List of domains
 Schema |   Name    | Type | Modifier |                         Check                          
--------+-----------+------+----------+--------------------------------------------------------
 public | eye_color | text |          | CHECK (VALUE = ANY (ARRAY['blue', 'green', 'brown']))

None of the superfluous stuff about the entire table to deal with, just the constraint, thank you very much. Changing it is just as easy as changing the table constraint:

ALTER DOMAIN eye_color DROP CONSTRAINT valid_eye_colors;
ALTER DOMAIN eye_color ADD CONSTRAINT valid_eye_colors CHECK (
    VALUE IN ( 'blue', 'green', 'brown', 'hazel' )
);

Yep, you can alter domains just as you can alter tables. And of course now it will work:

INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );

And as usual the data is well-ordered when we need it to be:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
--------+-----------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green
 Kat    | hazel

And as an added bonus, if you happened to need an eye color in another table, you can just use the same domain and get all the proper semantics. Sweet!
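
For example, a second (hypothetical) table can simply declare a column of the domain type:

-- A hypothetical second table reusing the eye_color domain.
CREATE TABLE avatars (
    avatar_id SERIAL PRIMARY KEY,
    name      TEXT NOT NULL DEFAULT '',
    eye_color eye_color NOT NULL
);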

Color Me Happy

Someday I’d love to see support for a PostgreSQL feature like enums, but allowing an arbitrary list of strings that are ordered by the contents of the text rather than the order in which they were declared, and that’s efficient to update. Maybe it could use integers for the underlying storage, too, and allow values to be modified without a table rewrite. Such would be the ideal for this use case. Hell, I’d find it much more useful than enums.

But domains get us pretty close to that without too much effort, so maybe it’s not that important. I’ve tried all of the above approaches and discussed it quite a lot with my colleagues before settling on domains, and I’m quite pleased with it. The only caveat I’d have is that it’s not to be used lightly. If the value set is likely to change fairly often (at least once a week, say), then you’d be better off with the lookup table.

In short, I recommend:

  • For an inherently ordered set of values that’s extremely unlikely to ever change, use an enum.
  • For a set of values that won’t often change and has no inherent ordering, use a domain.
  • For a set of values that changes often, use a lookup table.

What do you use to constrain a column to a defined set of unordered values?

Looking for the comments? Try the old layout.

My Catalyst Tutorial: Add Authors to the View

Another post in my ongoing series of posts on using Catalyst with Template::Declare and DBIx::Connector. This will be the last post covering material from chapter 3, I promise. This is a fun one, though, because we continue to use this really nice DSL called “SQL,” which I think is more expressive than an ORM would be.

To wit, the next task is to add the missing list of authors to the book list. The thing is, the more I work with databases, the more I’m inclined to think about them not only as the “M” in “MVC”, but also the “V”. I’ll show you what I mean.

A Quick Fix

But first, a diversion. In the second post in this series, I created an SQL statement to insert book authors, but I made a mistake: the values for surnames and given names were reversed. Oops. Furthermore, I included explicit author IDs, even though the id column uses a sequence for its default value. So first we need to fix these issues. Change the INSERT INTO authors statement in sql/001-books.sql to:

INSERT INTO authors (surname, given_name)
VALUES ('Bastien',      'Greg'),
       ('Nasseh',       'Sara'),
       ('Degu',         'Christian'),
       ('Stevens',      'Richard'),
       ('Comer',        'Douglas'),
       ('Christiansen', 'Tom'),
       ('Torkington',   'Nathan'),
       ('Zeldman',      'Jeffrey')
;

This time, we’re letting the sequence populate the id column. Fortunately, it starts from 1 just like we did, so we don’t need to update the values in the INSERT INTO book_author statement. Now let’s fix the database:

DELETE FROM book_author;
DELETE FROM authors;

Then run the above SQL query to restore the authors with their proper names, and then run the INSERT INTO book_author statement. That will get us back in business.

Constructing our Query

Now it’s time for the fun. The original SQL query we wrote to get the list of books was:

SELECT isbn, title, rating FROM books;

Nothing unusual there. But to get at the authors, we need to join to book_author and from there to authors. Our first cut looks like this:

SELECT b.isbn, b.title, b.rating, a.surname
  FROM books       b
  JOIN book_author ba ON b.isbn       = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id;

Which yields this data:

       isbn        |               title                | rating |   surname    
-------------------+------------------------------------+--------+--------------
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Bastien
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Nasseh
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Degu
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5 | Stevens
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4 | Comer
 978-1-56592-243-3 | Perl Cookbook                      |      5 | Christiansen
 978-1-56592-243-3 | Perl Cookbook                      |      5 | Torkington
 978-0-7357-1201-0 | Designing with Web Standards       |      5 | Zeldman

Good start, but note how we now have three rows for “CCSP SNRS Exam Certification Guide” and two for “Perl Cookbook”. We could of course modify our Perl code to look at the ISBN in each row and combine as appropriate, but it’s better to get the database to do that work, since it’s designed for that sort of thing. So let’s use an aggregate function to combine the values over multiple rows into a single row. All we have to do is use the column that changes (surname) in an aggregate function and tell PostgreSQL to use the other columns to group rows into one. PostgreSQL 8.4 introduces a really nice aggregate function, array_agg(), for pulling a series of strings together into an array. Let’s put it to use:

SELECT b.isbn, b.title, b.rating, array_agg(a.surname) as authors
  FROM books       b
  JOIN book_author ba ON b.isbn     = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;

Now the output is:

       isbn        |               title                | rating |          authors         
-------------------+------------------------------------+--------+--------------------------
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5 | {Stevens}
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4 | {Comer}
 978-1-56592-243-3 | Perl Cookbook                      |      5 | {Christiansen,Torkington}
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | {Bastien,Nasseh,Degu}
 978-0-7357-1201-0 | Designing with Web Standards       |      5 | {Zeldman}

Much better. We now have a single row for each book, and the authors are all grouped into a single column. Cool. But we can go one step further. Although we could use Perl to turn the array of author surnames into a comma-delimited string, there’s a PostgreSQL function for that, too: array_to_string(). Check it out:

SELECT b.isbn, b.title, b.rating,
       array_to_string(array_agg(a.surname), ', ') as authors
  FROM books       b
  JOIN book_author ba ON b.isbn     = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;

Now the rows will be:

       isbn        |               title                | rating |          authors          
-------------------+------------------------------------+--------+--------------------------
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5 | Stevens
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4 | Comer
 978-1-56592-243-3 | Perl Cookbook                      |      5 | Christiansen, Torkington
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Bastien, Nasseh, Degu
 978-0-7357-1201-0 | Designing with Web Standards       |      5 | Zeldman

Create a Database View

Cool! All the formatting work is done! But since it’s likely that we’ll often need to fetch book titles along with their authors, let’s create an SQL view for this query. That way, we don’t have to write the same SQL in different places in the application: we can just use the view. So create a new file, sql/002-books_with_authors.sql, and add this SQL:

CREATE VIEW books_with_authors AS
SELECT b.isbn, b.title, b.rating,
       array_to_string(array_agg(a.surname), ', ') as authors
  FROM books       b
  JOIN book_author ba ON b.isbn     = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;

Now install this view in the database:

psql -U postgres -d myapp -f sql/002-books_with_authors.sql

And now we can make use of the view any time we want and get the results of the full query. It’s time to do that in our controller. Edit lib/MyApp/Controller/Books.pm and change this line in the list action:

my $sth = $_->prepare('SELECT isbn, title, rating FROM books');

To:

my $sth = $_->prepare(q{
    SELECT isbn, title, rating, authors FROM books_with_authors
});

The use of the q{} operator is a style I use for SQL queries in Perl code; you can use whatever style you like. Since this is a very short SQL statement (thanks to the view), it’s not really necessary to have it on multiple lines, but I like to be fairly consistent about this sort of thing.

The last thing we need to do is a very simple change to the list template in lib/MyApp/Templates/HTML/Books.pm. In previous posts, I was referring to the non-existent “author” key in each hash reference fetched from the database. In the new view, however, I’ve named that column “authors”. So change this line:

cell { $book->{author} };

To

cell { $book->{authors} };

And that’s it. Restart the server and reload http://localhost:3000/books/list and you should now see all of the books listed with their authors.

Notes

I think you can appreciate why, to a certain degree, I’m starting to think of the database as handling both the “M” and the “V” in “MVC”. It’s no mistake that the database object we created is known as a “view”. It was written in such a way that it not only expressed the relationship between books and authors in a compact but clear way, but it formatted the appropriate data for publishing on the site—all in a single, efficient query. All the Template::Declare view does is wrap it all up in the appropriate HTML.

PostgreSQL isn’t the only database to support features such as this, by the way. All of the databases I’ve used support views, and many offer useful aggregate functions, as well. Among the MySQL aggregates, for example, is group_concat(), which sort of combines the array_to_string(array_agg()) PostgreSQL syntax into a single function. And I’ve personally written a custom aggregate for SQLite in Perl. So although I use PostgreSQL for these examples and make use of its functionality, you can do much the same thing in most other databases.
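
For the curious, the MySQL analog of the aggregation in our view would look roughly like this (an untested sketch):

-- A rough MySQL equivalent of the view's aggregation (untested sketch).
SELECT b.isbn, b.title, b.rating,
       GROUP_CONCAT(a.surname SEPARATOR ', ') AS authors
  FROM books b
  JOIN book_author ba ON b.isbn       = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;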

Either way, I find this to be a lot less work than using an ORM or other abstraction layer between my app and the database. Frankly, SQL provides just the right level of abstraction.

Looking for the comments? Try the old layout.

pgTAP Best Practices Slides Available

Last month I gave two presentations at the PostgreSQL Conference West. The first was a variation on Unit Test Your Database!, which I’ve now given about six times (and will give at least two more times, including tomorrow night for Portland PLUG and in two weeks at the JPUG 10th Anniversary Conference). The second was a new talk, a 90-minute tutorial, called “pgTAP Best Practices.” And here it is, published on Slideshare. Enjoy.

You can also download a 76MB PDF if that’s more your style.

Enjoy!

Looking for the comments? Try the old layout.