Just a Theory

By David E. Wheeler

Posts about VCS

Sqitch: Back to the VCS

On reflection, the one thing that bothers me about the proposal to abandon the VCS in yesterday’s post is that things could be deployed out-of-order. Take these two sections from the example plan toward the end of the post:

# Some procedures.
+add_user :roles :users_table
+del_user :@alpha
-dr_evil
+upd_user :add_user !dr_evil
@beta     # woo!

+crypto
+add_user :roles :users_table :crypto
@gamma

If this was run on an empty database, it would be deployed in this order:

  • +crypto
  • +add_user
  • +del_user
  • -dr_evil
  • +upd_user
  • @beta
  • @gamma

Notice that crypto would be deployed before the @beta tag, despite the fact that it appears in the plan after @beta. Yes, one would end up with the correct database, but the actual deployment varies from the plan in a way that might be disconcerting to a user. I don’t like it.

Another issue is that there is no way to go back to @beta here, because there are no previous copies of add_user in the database. In theory this probably would not often be a big deal, but in practice, hey, sometimes you just need to go back in time no matter what. Maybe you need to repeatedly test a data migration (as opposed to a DDL change). I can certainly imagine needing that flexibility. So much for my “insight.”

I’m back to thinking about VCS integration again. I know I keep going back and forth on this stuff. I apologize. I appreciate the thoughtful comments and feedback I’ve received, and am taking the time to try to get this stuff right so that I can stop thinking about it in the future. I really want to reduce the complexity of database change management, but retain flexibility for those who need it. So yeah, I keep taking two steps forward and one step back, but there is overall forward momentum (I have had to trash less code than I expected).

The Duplicative Pattern

Back to my original complaint from yesterday: how can the plan indicate where to look in the VCS history for a particular copy of a file? As a corollary: is is possible to also support folks not using a VCS (which was one of the advantages to yesterday’s proposal)? Let’s take this plan as a working example:

%plan-syntax-v1
+users_table
+add_user :users_table
+del_user :users_table
+upd_user :users_table

Let’s say that we need to fix a bug in add_user. First we have to add it to the plan again:

% sqitch add add_user
Error: Step "add_user" already exists. Add a tag to modify it.

So we can’t repeat a step within a tag (or, in this case, when there are no tags). Let’s tag it and try again:

% sqitch tag alpha
% sqitch add add_user -vv
notice: Copied sql/deploy/add_user.sql to sql/deploy/add_user@alpha.sql
notice: Copied sql/revert/add_user.sql to sql/revert/add_user@alpha.sql
notice: Copied sql/deploy/add_user.sql to sql/revert/add_user.sql
notice: Copied sql/test/add_user.sql to sql/test/add_user@alpha.sql
Backed up previous version of "add_user"
Added "add_user" step. Modify these files:
sql/deploy/add_user.sql
sql/revert/add_user.sql
sql/test/add_user.sql

I use added verbosity (-vv) here to show what files are copied for the “backup” (the “notice” lines). So now the plan looks like this:

%plan-syntax-v1
+users_table
+add_user :users_table
+del_user :users_table
+upd_user :users_table
@alpha

+add_user :add_user@alpha

Note how the new step implicitly requires the previous version of the step (as of @alpha), and thus all of its dependencies. This is a clean way to “upgrade” the step.

Now you can edit sql/deploy/add_user.sql to make your changes, starting with the existing code. You can also edit sql/test/add_user.sql in order to update the tests for the new version. You don’t need to edit sql/revert/add_user.sql unless your changes are not idempotent.

Of course, this pattern leads to all the code duplication I complained about before, but there is nothing to be done in the absence of a VCS. The advantage is that we retain a complete history, so we can go back and forth as much as we want, regardless of whether we’re updating an existing database or creating a new database. The only change I need to make to the plan syntax is to ban the use of @ in step and tag names. Probably a good idea, anyway.

By the way, if we omit the extra verbosity, the output would look like this:

% sqitch add add_user
Backed up previous version of "add_user"
Added "add_user" step. Modify these files:
sql/deploy/add_user.sql
sql/revert/add_user.sql
sql/test/add_user.sql

Other than the “Backed up” line, the output is the same as for adding any other step. Maybe there would be something to say that the previous version was copied to the new version. But the point is to make it clear to the user what files are available to be edited.

VCS Integration

Let’s try again with a VCS. Starting at the same point as in the non-VCS example, we try to add a step

% sqitch add add_user
Error: Step "add_user" already exists. Add a tag to modify it.

So add the tag:

% sqitch tag alpha
% sqitch add add_user
Error: Cannot find reference to @alpha in the Git history. Please run `git tag alpha` to continue.

In order to be sure that we can use the VCS history, we need the tag there. Perhaps we could automatically add the tag in Git via sqitch tag, or have an option to do it. Either way, we need to have the same tag in the VCS so that we can travel back in time. So let’s do it:

% git tag alpha -am 'Tag alpha for Sqitch.'
% sqitch add add_user
Added "add_user" step. Modify these files:
sql/deploy/add_user.sql
sql/revert/add_user.sql
sql/test/add_user.sql

Note the lack of a “Backed up” line. It’s not necessary, since the code is already preserved in the Git history. Now we can edit the files in place and commit them to Git as usual. Sqitch will ask Git for the add_user step files as of the alpha tag when deploying the first instance of the step, and the current version for the latter. One can add add_user as many times as one likes, as long as there are always tags between instances.

Unbungled Bundle

Here’s the clincher for this iteration of this thing. My original plan for bundling (that is, packaging up the plan and change scripts for distribution outside the VCS) had the generation of a plan with different names than the plan in the VCS. That is, running:

% sqitch bundle

Against the above plan, the resulting plan in bundle/sqitch.plan would look something like this:

%plan-syntax-v1
+users_table
+add_user :users_table
+del_user :users_table
+upd_user :users_table
@alpha

+add_user_v2 :add_user

Note the add_user_v2 step there. Doesn’t exist in the original plan in the VCS, but was necessary in order to generate the change scripts for distribution bundling, so that all steps could be available for deployment outside the VCS:

% ls bundle/sql/deploy/add_user*
bundle/sql/deploy/add_user.sql
bundle/sql/deploy/add_user_v2.sql

This meant that databases deployed from the VCS would have a different deployment plan (and deployment history) than those deployed from a tarball distributed with the bundled plan. But hey, if we can create the files with the tag name in them for the non-VCS environment, we can do the same when bundling. So now, the bundled plan will be exactly the same as in the VCS, and the migration files will just be named with the tag, as appropriate:

% ls bundle/sql/deploy/add_user*
bundle/sql/deploy/add_user.sql
bundle/sql/deploy/add_user@alpha.sql

Much better. Much more consistent throughout. And must less stuff stored in the database to boot (no full deploy scripts copied to the DB).

Back to Work

So, I’m off to to continue modifying the plan code to support the syntax I proposed yesterday post, as well as the ability to have duplicate steps under different tags. Then I will start working on this proposal for how to name scripts and duplicate them.

That is, assuming that nothing else comes up to make me revise my plans again. Man, I sure hope not. This proposal nicely eliminates inconsistencies in the plan and deployment history regardless of whether deploying from a VCS, a bundled distribution, or a non-VCS project, the results should be the same. And it was those inconsistencies that I had been struggling with.

But hey, if I have overlooked something (again!), please do let me know.

Looking for the comments? Try the old layout.

The Ever Evolving Sqitch Plan

I’ve been working on the parser for the proposed new deployment plan format, and spent a day thinking it wasn’t going to work at all. But by the end of the day yesterday, I was back on board with it. I think I’ve also figured out how to eliminate the VCS dependency (and thus a whole level of complication). So first, the plan format:

  • Names of things (steps and tags) cannot start or end in punctuation characters
  • @ indicates a tag
  • + indicates a step to be deployed
  • - indicates a step to be reverted
  • # starts a comment
  • : indicates a required step
  • ! indicates a conflicting step
  • % is for Sqitch directives

So, here’s an example derived from a previous example:

%plan-syntax-v1
+roles
+users_table
+dr_evil
@alpha

# Some procedures.
+add_user :roles :users_table
+del_user :@alpha
-dr_evil
+upd_user :add_user !dr_evil
@beta     # woo!

So we start with a directive for the version of the plan file (thanks for the suggestion, Tom Davis!). Then we have deployments of the roles, users_table, and dr_evil steps. After that, it’s tagged as alpha.

Next, we have a comment, then the deployment of the add_user step. It requires that the roles and users_table steps be deployed. Then we deploy del_user. It requires all steps as of the $alpha tag. Next we revert the dr_evil step. Why? Because the next line deploys upd_user, which conflicts with dr_evil (and requires add_user). And finally, we tag it as beta.

There are a number of things I like about these changes:

  • Dependencies are spelled out in the plan file, rather than the deploy scripts. This allows the deploy scripts to have nothing special about them at all.

  • We now have a way to explicitly revert a step as part of the overall plan. This is useful for ensuring that conflicts can be dealt with.

  • We can deploy to any point in the plan by specifying a step:

    sqitch deploy add_user
    

    Or a tag:

    sqitch deploy @alpha
    

    For steps that are duplicated, we can disambiguate by specifying a tag:

    sqitch deploy dir_evil --as-of @alpha
    

    Naturally, this requires that a step not be repeated within the scope of a single tag.

Now, as for the VCS dependency, my impetus for this was to allow Sqitch to get earlier versions of a particular deploy script, so that it could be modified in place and redeployed to make changes inline, as described in an earlier post. However I’ve been troubled as to how to indicate in the plan where to look in the VCS history for a particular copy of a file. Yesterday, I had an insight: why do I need the earlier version of a particular deploy script at all? There are two situations where it would be used, assuming a plan that mentions the same step at two different points:

  1. To run it as it existed at the first point, and to run it the second time as it exists at that time.
  2. To run it in order to revert from the second point to the first.

As to the first, I could not think of a reason why that would be necessary. If I’m bootstrapping a new database, and the changes in that file are idempotent, is it really necessary to run the earlier version of the file at all? Maybe it is, but I could not think of one.

The second item is the bit I wanted, and I realized (thanks in part to prompt from Peter van Hardenberg while at PGCon) that I don’t need a VCS to get the script as it was at the time it was deployed. Instead, all I have to do is store the script in the database as it was at the time it was run. Boom, reversion time travel without a VCS.

As an example, take the plan above. Say we have a database that has been deployed all the way to @beta. Let’s add the add_user step again:

%plan-syntax-v1
+roles
+users_table
+dr_evil
@alpha

# Some procedures.
+add_user :roles :users_table
+del_user :@alpha
-dr_evil
+upd_user :add_user !dr_evil
@beta     # woo!

+crypto
+add_user :roles :users_table :crypto
@gamma

The last two lines are the new ones. At this point, the sql/deploy/add_user.sql script has been modified to fix a bug that now requires the crypto step. If we deploy to a new database, Sqitch will notice that the same step is listed twice and apply it only once. This works because, even though add_user is listed before pg_crypto, it is actually applied as described in its second declaration. So the run order would be:

  • crypto
  • add_user
  • del_user
  • upd_user

Note that this works because crypto declares no dependencies itself. If it did, it would be shuffled as appropriate. It would not work if it required, say, upd_user, as that would create a circular dependency (add_usercryptoupd_useradd_user).

Now say we want to deploy the changes to the production database, which is currently on @beta. That simply runs:

  • crypto
  • add_user

If something went wrong, and we needed to revert, all Sqitch has to do is to read add_user from the database, as it was deployed previously, and run that. This will return the add_user function to its previous state. So, no duplication and no need for a VCS.

The one thing that scares me a bit is being able to properly detect circular dependencies in the plan parser. I think it will be pretty straight-forward for steps that require other steps. Less so for steps that require tags. Perhaps it will just have to convert a tag into an explicit dependence on all steps prior to that tag.

So, I think this will work. But I’m sure I must have missed something. If you notice it please enlighten me in the comments. And thanks for reading this far!

Looking for the comments? Try the old layout.

Sqitch Update: The Plan

I gave my first presentation on Sqitch at PGCon last week. The slides are on Slideshare and the PGCon site. It came together at the last minute, naturally. I was not able to pay as close attention to PGCon sessions as I would have liked, as I was doing last minute hacking to get the deploy command working on PostgreSQL, and then writing the slides (which are based on the tutorial). I was pleased with the response, given that this is very much a project that is still under heavy development and available only as a very very early alpha. There was great discussion and feedback afterward, which I appreciate.

A number of folks offered to help, too. For that I am grateful. I’ve started a list of to-dos to give folks a starting point. Please fork and hack! Find me on #sqitch on Freenode for questions/comments/discussion.

But back to the guts. As a result of the work on the deploy command, as well as thinking about how I and my co-workers do database development with Git, I am starting to revise how I think about the deployment plan. You see, I personally often make a lot of changes to a deployment script as I develop a database, generally over many commits and even many days or weeks. If I were to then rely on the Git history to do deployments, it would probably work, but there might be ten times as many deployments as I actually need, just to get it from zero to release state. I had originally thought that using sqitch bundle --tags-only to create a bundle with a written plan would get around this, as it would write a plan file with only VCS tags for Sqitch tags, rather than every commit. That might be okay for releases, but still not great for the developers, such as myself, who will be using Sqitch as part of the development process all day long.

So now I’m thinking more that Sqitch should rely on an explicit plan file (which was to be the preferred method, if it existed, all along) rather than VCS history. That is, the plan file would be required, and a new command, sqitch plan, will allow one to interactively add steps and tags to it. It would also make it easier for the developer to hand-edit, as appropriate, so as not to rely on a funky Git history.

So I’m toying with changing the plan format, which up until now looked likes this:

[alpha]
foo
bar
init

[beta]
users
insert_user
delete_user
update_user

[gamma]
widgets
insert_widget

Each item in brackets is a tag, and each item below is a deployment step (which corresponds to a script) that is part of that tag. So if you deployed to the beta tag, it would deploy all the way up to update_user step. You could only specify tags for deployment, and either all the steps for a given tag succeeded or they failed. When you added a step, it was added to the most recent tag.

I came up with this approach by playing with git log. But now I’m starting to think that it should feel a bit more gradual, where steps are added and a tag is applied to a certain step. Perhaps a format like this:

foo
bar
init
@alpha

users
insert_user
delete_user
update_user
@beta

widgets
insert_widget

With this approach, one could deploy or revert to any step or tag. And a tag is just added to a particular step. So if you deployed to @beta, it would run all the steps through update_user, as before. But you could also update all, deploy through insert_widget, and then the current deployed point in the database would not have a tag (could perhaps use a symbolic tag, HEAD?).

I like this because it feels a bit more VCS-y. It also makes it easier to add steps to the plan without worrying about tagging before one was ready. And adding steps and tags can be automated by a sqitch plan command pretty easily.

So the plan file becomes the canonical source for deployment planning, and is required. What we’ve lost, however, is the ability to use the same step name at different points in the plan, and to get the proper revision of the step by traveling back in VCS history for it. (Examples of what I mean are covered in a previous post, as well as the aforementioned presentation.) However, I think that we can still do that by complementing the plan with VCS history.

For example, take this plan:

foo
bar
init
@alpha

users
insert_user
delete_user
update_user
@beta

insert_user
update_user
@gamma

Note how insert_user and update_user repeat. Normally, this would not be allowed. But if the plan is in a VCS, and if that VCS has tags corresponding to the tags, then we might allow it: when deploying, each step would be deployed at the point in time of the tag that follows it. In other words:

  • foo, bar, and init would be deployed as of the alpha tag.
  • users, insert_user, delete_user, and update_user would be deployed as they were as of the beta tag.
  • insert_user and update_user would again be deployed, this time as of the gamma tag.

This is similar to what I’ve described before, in terms of where in VCS history steps are read from. But whereas before I was using the VCS history to derive the plan, I am here reversing things, requiring an explicit plan and using its hints (tags) to pull stuff from the VCS history as necessary.

I think this could work. I am not sure if I would require that all tags be present, or only those necessary to resolve duplications (both approaches feel a bit magical to me, though I haven’t tried it yet, either). The latter would probably be more forgiving for users. And overall, I think the whole approach is less rigid, and more likely to allow developers to work they way they are used to working.

But I could be off my rocker entirely. What do you think? I want to get this right, please, if you have an opinion here, let me have it!

Looking for the comments? Try the old layout.

More about…

Sqitch — VCS-powered SQL Change Management

Back in January, I wrote three posts outlining some ideas I had about a straight-forward, sane way of managing SQL change management. The idea revolved around specifying scripts to deploy and revert in a plan file, and generating that plan file from VCS history. I still feel pretty good about the ideas there, and work has agreed to let me write it and open-source it. Here is the first step making it happen. I call it “Sqitch.”

Why “Sqitch”? Think of it as SQL changes with Git stuck in the middle. Of course I expect to support VCSs other than Git (probably Subversion and Mercurial, though I am not sure yet), but since Git is what I now have the most familiarity with, I thought it kind of fun to kind of reference a VCS in the name, if only obliquely.

This week, I started work on it. My first task is to outline a draft for the interface. Sqitch will be a command-line tool, primarily. The remainder of this post contains the documentation for the draft interface. Thoughts and feedback would be greatly appreciated, especially if you think I’ve overlooked anything! I do want to keep features pretty minimal for now, though, to build up a solid core to be built on later. But other than that, your criticism is greatly desired.

Next up, I will probably write a tutorial, just so I can make my way through some real-life(ish) examples and notice if I missed anything else. Besides, I’m going to need the tutorial myself! Watch for that next week.

Thanks!


Name

Sqitch - VCS-powered SQL change management

Synopsis

sqitch [<options>] <command> [<command-options>] [<args>]

Description

Sqitch is a VCS-aware SQL change management application. What makes it different from your typical migration-style approaches? A few things:

No opinions

Sqitch is not integrated with any framework, ORM, or platform. Rather, it is a standalone change management system with no opinions on your database or development choices.

Native scripting

Changes are implemented as scripts native to your selected database engine. Writing a PostgreSQL application? Write SQL scripts for psql. Writing a MySQL-backed app? Write SQL scripts for mysql.

VCS integration

Sqitch likes to use your VCS history to determine in what order to execute changes. No need to keep track of execution order, your VCS already tracks information sufficient for Sqitch to figure it out for you.

Dependency resolution

Deployment steps can declare dependencies on other deployment steps. This ensures proper order of execution, even when you’ve committed changes to your VCS out-of-order.

No numbering

Change deployment is managed either by maintaining a plan file or, more usefully, your VCS history. As such, there is no need to number your changes, although you can if you want. Sqitch does not care what you name your changes.

Packaging

Using your VCS history for deployment but need to ship a tarball or RPM? Easy, just have Sqitch read your VCS history and write out a plan file with your change scripts. Once deployed, Sqitch can use the plan file to deploy the changes in the proper order.

Reduced Duplication

If you’re using a VCS to track your changes, you don’t have to duplicate entire change scripts for simple changes. As long as the changes are idempotent, you can change your code directly, and Sqitch will know it needs to be updated.

Terminology

step

A named unit of change. A step name must be used in the file names of its corresponding deployment and a reversion scripts. It may also be used in a test script file name.

tag

A known deployment state with a list one or more steps that define the tag. A tag also implies that steps from previous tags in the plan have been applied. Think of it is a version number or VCS revision. A given point in the plan may have one or more tags.

state

The current state of the database. This is represented by the most recent tag or tags deployed. If the state of the database is the same as the most recent tag, then it is considered “up-to-date”.

plan

A list of one or more tags and associated steps that define the order of deployment execution. Sqitch reads the plan to determine what steps to execute to change the database from one state to another. The plan may be represented by a “Plan File” or by VCS history.

deploy

The act of deploying database changes to reach a tagged deployment point. Sqitch reads the plan, checks the current state of the database, and applies all the steps necessary to change the state to the specified tag.

revert

The act of reverting database changes to reach an earlier tagged deployment point. Sqitch checks the current state of the database, reads the plan, and applies reversion scripts for all steps to return the state to an earlier tag.

Options

-p --plan-file  FILE    Path to a deployment plan file.
-e --engine     ENGINE  Database engine.
-c --client     PATH    Path to the engine command-line client.
-d --db-name    NAME    Database name.
-u --username   USER    Database user name.
-h --host       HOST    Database server host name.
-n --port       PORT    Database server port number.
   --sql-dir    DIR     Path to directory with deploy and revert scripts.
   --deploy-dir DIR     Path to directory with SQL deployment scripts.
   --revert-dir DIR     Path to directory with SQL reversion scripts.
   --test-dir   DIR     Path to directory with SQL test scripts.
   --extension  EXT     SQL script file name extension.
   --dry-run            Execute command without making any changes.
-v --verbose            Increment verbosity.
-V --version            Print the version number and exit.
-H --help               Print a usage statement and exit.
-M --man                Print the complete documentation and exit.

Options Details

-p
--plan-file

sqitch –plan-file plan.conf sqitch -p sql/deploy.conf

Path to the deployment plan file. Defaults to ./sqitch.plan. If this file is not present, Sqitch will attempt to read from VCS files. If no supported VCS system is in place, an exception will be thrown. See “Plan File” for a description of its structure.

-e
--engine

sqitch –engine pg sqitch -e sqlite

The database engine to use. Supported engines include:

-c
--client

sqitch –client /usr/local/pgsql/bin/psql sqitch -c /usr/bin/sqlite3

Path to the command-line client for the database engine. Defaults to a client in the current path named appropriately for the specified engine.

-d
--db-name

Name of the database. For some engines, such as PostgreSQL and MySQL, the database must already exist. For others, such as SQLite, the database will be automatically created on first connect.

-u
--user
--username

User name to use when connecting to the database. Does not apply to all engines.

-h
--host

Host name to use when connecting to the database. Does not apply to all engines.

-n
--port

Port number to connect to. Does not apply to all engines.

--sql-dir

sqitch –sql-dir migrations/

Path to directory containing deployment, reversion, and test SQL scripts. It should contain subdirectories named deploy, revert, and (optionally) test. These may be overridden by --deploy-dir, --revert-dir, and --test-dir. Defaults to ./sql.

--deploy-dir

sqitch –deploy-dir db/up

Path to a directory containing SQL deployment scripts. Overrides the value implied by --sql-dir.

--revert-dir

sqitch –revert-dir db/up

Path to a directory containing SQL reversion scripts. Overrides the value implied by --sql-dir.

--test-dir

sqitch –test-dir db/t

Path to a directory containing SQL test scripts. Overrides the value implied by --sql-dir.

--extension

sqitch –extension ddl

The file name extension on deployment, reversion, and test SQL scripts. Defaults to sql.

--dry-run

sqitch –dry-run

Execute the Sqitch command without making any actual changes. This allows you to see what Sqitch would actually do, without doing it. Implies a verbosity level of 1; add extra --verboses for greater verbosity.

-v
--verbose

sqitch –verbose -v

A value between 0 and 3 specifying how verbose Sqitch should be. The default is 0, meaning that Sqitch will be silent. A value of 1 causes Sqitch to output some information about what it’s doing, while 2 and 3 each cause greater verbosity.

-H
--help

sqitch –help sqitch -H

Outputs a brief description of the options supported by sqitch and exits.

-M
--man

sqitch –man sqitch -M

Outputs this documentation and exits.

-V
--version

sqitch –version sqitch -V

Outputs the program name and version and exits.

Sqitch Commands

init

Initialize the database and create deployment script directories if they do not already exist.

status

Output information about the current status of the deployment, including a list of tags, deployments, and dates in chronological order. If any deploy scripts are not currently deployed, they will be listed separately.

check

Sanity check the deployment scripts. Checks include:

  • Make sure all deployment scripts have complementary reversion scripts.

  • Make sure no deployment script appears more than once in the plan file.

deploy

Deploy changes. Configuration properties may be specified under the [deploy] section of the configuration file, or via sqitch config:

sqitch config deploy.$property $value

Options and configuration properties:

--to

Tag to deploy up to. Defaults to the latest tag or to the VCS HEAD commit. Property name: deploy.to.
revert

Revert changes. Configuration properties may be specified under the [revert] section of the configuration file, or via sqitch config:

sqitch config revert.$property $value

Options and configuration properties:

--to

Tag to revert to. Defaults to reverting all changes. Property name: revert.to.
test

Test changes. All SQL scripts in --test-dir will be run. [XXX Not sure whether to have subdirectories for tests and expected output and to diff them, or to use some other approach.]

config

Set configuration options. By default, the options will be written to the local configuration file, sqitch.ini. Options:

--get

Get the value for a given key. Returns error code 1.
--unset

Remove the line matching the key from config file.

--list

List all variables set in config file.

--global

For writing options: write to global ~/.sqitch/config.ini file rather than the local sqitch.ini.

For reading options: read only from global ~/.sqitch/config.ini rather than from all available files.

--system

For writing options: write to system-wide $prefix/etc/sqitch.ini file rather than the local sqitch.ini.

For reading options: read only from system-wide $prefix/etc/sqitch.ini rather than from all available files.

--config-file

Use the given config file.

package

Package up all deployment and reversion scripts and write out a plan file. Configuration properties may be specified under the [package] section of the configuration file, or via sqitch config package.$property $value command. Options and configuration properties:

--from

Tag to start the plan from. All tags and steps prior to that tag will not be included in the plan, and their change scripts Will be omitted from the package directory. Useful if you’ve rejiggered your deployment steps to start from a point later in your VCS history than the beginning of time. Property name: package.from.
--to

Tag with which to end the plan. No steps or tags after that tag will be included in the plan, and their change scripts will be omitted from the package directory. Property name: package.to.

--tags-only

Write the plan file with deployment targets listed under VCS tags, rather than individual commits. Property name: package.tags_only.

--destdir

Specify a destination directory. The plan file and deploy, revert, and test directories will be written to it. Defaults to “package”. Property name: package.destdir.

Configuration

Sqitch configuration information is stored in standard INI files. The # and ; characters begin comments to the end of line, blank lines are ignored.

The file consists of sections and properties. A section begins with the name of the section in square brackets and continues until the next section begins. Section names are not case sensitive. Only alphanumeric characters, - and . are allowed in section names. Each property must belong to some section, which means that there must be a section header before the first setting of a property.

All the other lines (and the remainder of the line after the section header) are recognized as setting properties, in the form name = value. Leading and trailing whitespace in a property value is discarded. Internal whitespace within a property value is retained verbatim.

All sections are named for commands except for one, named “core”, which contains core configuration properties.

Here’s an example of a configuration file that might be useful checked into a VCS for a project that deploys to PostgreSQL and stores its deployment scripts with the extension ddl under the migrations directory. It also wants packages to be created in the directory _build/sql, and to deploy starting with the “gamma” tag:

[core]
    engine    = pg
    db        = widgetopolis
    sql_dir   = migrations
    extension = ddl

[revert]
    to        = gamma

[package]
    from      = gamma
    tags_only = yes
    dest_dir  = _build/sql

Core Properties

This is the list of core variables, which much appear under the [core] section. See the documentation for individual commands for their configuration options.

plan_file

The plan file to use. Defaults to sqitch.ini or, if that does not exist, uses the VCS history, if available.

engine

The database engine to use. Supported engines include:

client

Path to the command-line client for the database engine. Defaults to a client in the current path named appropriately for the specified engine.

db_name

Name of the database.

username

User name to use when connecting to the database. Does not apply to all engines.

password

Password to use when connecting to the database. Does not apply to all engines.

host

Host name to use when connecting to the database. Does not apply to all engines.

port

Port number to connect to. Does not apply to all engines.

sql_dir

Path to directory containing deployment, reversion, and test SQL scripts. It should contain subdirectories named deploy, revert, and (optionally) test. These may be overridden by deploy_dir, revert_dir, and test_dir. Defaults to ./sql.

deploy_dir

Path to a directory containing SQL deployment scripts. Overrides the value implied by sql_dir.

revert_dir

Path to a directory containing SQL reversion scripts. Overrides the value implied by sql_dir.

test_dir

Path to a directory containing SQL test scripts. Overrides the value implied by sql_dir.

extension

The file name extension on deployment, reversion, and test SQL scripts. Defaults to sql.

Plan File

A plan file describes the deployment tags and scripts to be run against a database. In general, if you use a VCS, you probably won’t need a plan file, since your VCS history should be able to provide all the information necessary to derive a deployment plan. However, if you really do need to maintain a plan file by hand, or just want to better understand the file as output by the package command, read on.

Format

The contents of the plan file are plain text encoded as UTF-8. It is divided up into sections that denote deployment states. Each state has a bracketed, space-delimited list of one or more tags to identify it, followed by any number of deployment steps. Here’s an example of a plan file with a single state and a single step:

[alpha]
users_table

The state has one tag, named “alpha”, and one step, named “users_table”. A state may of course have many steps. Here’s an expansion:

[root alpha]
users_table
insert_user
update_user
delete_user

This state has two tags, “root” and “alpha”, and four steps, “users_table”, “insert_user”, “update_user”, and “delete_user”.

Most plans will have multiple states. Here’s a longer example with three states:

[root alpha]
users_table
insert_user
update_user
delete_user

[beta]
widgets_table
list_widgets

[gamma]
ftw

Using this plan, to deploy to the “beta” tag, the “root”/“alpha” state steps must be deployed, as must the “beta” steps. To then deploy to the “gamma” tag, the “ftw” step must be deployed. If you then choose to revert to the “alpha” tag, then the “gamma” step (“ftw”) and all of the “beta” steps will be reverted in reverse order.

Using this model, steps cannot be repeated between states. One can repeat them, however, if the contents for a file in a given tag can be retrieved from a VCS. An example:

[alpha]
users_table

[beta]
add_widget
widgets_table

[gamma]
add_user

[44ba615b7813531f0acb6810cbf679791fe57bf2]
widgets_created_at

[HEAD epsilon master]
add_widget

This example is derived from a Git log history. Note that the “add_widget” step is repeated under the state tagged “beta” and under the last state. Sqitch will notice the repetition when it parses this file, and then, if it is applying all changes, will fetch the version of the file as of the “beta” tag and apply it at that step, and then, when it gets to the last tag, retrieve the deployment file as of its tags and apply it. This works in reverse, as well, as long as the changes in this file are always idempotent.

Grammar

Here is the EBNF Grammar for the plan file:

plan-file   = { <state> | <empty-line> | <comment> }* ;

state       = <tags> <steps> ;

tags        = "[" <taglist> "]" <line-ending> ;
taglist     = <name> | <name> <white-space> <taglist> ;

steps       = { <step> | <empty-line> | <line-ending> }* ;
step        = <name> <line-ending> ;

empty-line  = [ <white-space> ] <line-ending> ;
line-ending = [ <comment> ] <EOL> ;
comment     = [ <white-space> ] "#" [ <string> ] ;

name        = ? non-white space characters ? ;
white-space = ? white space characters ? ;
string      = ? non-EOL characters ? ;

See Also

The original design for Sqitch was sketched out in a number of blog posts:

Other tools that do database change management include:

Rails migrations

Numbered migrations for Ruby on Rails.

Module::Build::DB

Numbered changes in pure SQL, integrated with Perl’s Module::Build build system. Does not support reversion.

DBIx::Migration

Numbered migrations in pure SQL.

Versioning

PostgreSQL-specific dependency-tracking solution by depesz.

Author

David E. Wheeler <david@justatheory.com>

License

Copyright © 2012 iovation Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Looking for the comments? Try the old layout.

Git-R-Done

Just a quick followup on the completion of the Bricolage Git migration last week, today I completed writing up a set of GitHub wiki documents explaining to my fellow Bricoleurs how to start hacking. The most important bits are:

  • Working with Git, explaining how to get set up with a forked Bricolage repository
  • Contributing a Bug Fix, an intro to the Git way of doing things (as far as I understand it)
  • Working with Branches, describing how to track a maintenance branch in your fork
  • Merging with Git, to cover the frequent merging from Bricolage maintenance branches into master, and how to get said merges pushed upstream
  • Starting a Project Branch, which you’d need to read if you were taking on a major development task, such as a Summer of Code project
  • Contributing via Email, for those who don’t want a GitHub account (needs fleshing out)
  • Creating a Release, in which the fine art of branching, tagging, and releasing is covered

If you’re familiar with the “Git way,” I would greatly appreciate your feedback on these documents. Corrections and comments would be greatly appreciated.

I also just wanted to say that the process of reconstructing the merge history from CVS and Subversion was quite an eye-opener for me. Not because it was difficult (it was) and required a number of hacks (it did), but because it highlighted just how much better a fit Git is for the way in which we do Open Source software development. Hell, probably closed-source, too, for that matter. I no longer will have to think about what revisions to include in a merge, or create a branch just to “tag” a merge. Hell, I’ll probably be doing merges a hell of a lot more often, just because it’s so easy, the history remains intact, and everything just stays more up-to-date and closely integrated.

But I also really appreciate the project-based emphasis of Git. A Subversion repository, I now realize, is really very much like a versioned file system. That means where things go is completely ad-hoc, or convention-driven at best. And god forbid if you decide to change the convention and move stuff around! It’s just so much more sane to get a project repository, with all of the history, branches, tags, merges, and everything else, all in one package. It’s more portable, it’s a hell of a lot faster (ever tried to check out a Subversion repository with 80 tags?), and just tighter. it encourages modularization, which can only be good. I’ll tell you, I expect to have some frustrations and challenges as I learn more about using Git, but I’m already very much happier with the overall philosophy.

Enough evangelizing. As a last statement on this, I’ve uploaded the Perl scripts I wrote to do this migration, just in case someone else finds them useful:

  • bric_cvs_to_git migrated a CVS backup to Git.
  • bric_to_git migrated Subversion from r5517 to Git.
  • stitch stitched the CVS-migrated Git repository into the Subversion-migrated Git repository for a final product.

It turned out that there were a few files lost in the conversion, which I didn’t notice until after all was said and done, but overall I’m very happy. My thanks again to Ask and the denizens of #git for all the help.

Looking for the comments? Try the old layout.

Migrating Bricolage CVS and SVN to Git

Now that I’ve successfully migrated the old Bricolage SourceForge CVS repository to Git, and also migrated Subversion to Git, it’s time to stitch the two repositories together into one with all history intact. I’m glad to say that figuring out how to do so took substantially less time than the first two steps, thanks in large part to the great help from “doener,” “Ilari,” and “Fissure” on the Freenode #git channel.

Actually, they helped me with a bit more tweaking of my CVS and Subversion conversions. One thing I realized after writing [yesterday’s post]migrated Subversion was that, after running git filter-branch, I had twice as many commits as I should have had. It turns out that git filter-branch rewrites all commits, but keeps the old ones around in case you mess something up. doener also pointed out that I wasn’t having all grafts properly applied, because git filter-branch only applies to the currently checked-out branch. To get all of the branches, he suggested that I read the git-filter-branch documentation, where I’ll find that git filter-branch --tag-name-filter cat -- --all would hit all branches. Actually, such was not clear to me from the documentation, but I took his word for it. Once I did that, to get rid of the dupes, all I had to do was git clone the repository to a new repository. And that was that.

This worked great for my CVS migration, but I realized that I also wanted to clean out metadata from the Subversion migration. Of course, git clone throws out most of the metadata, but git svn also stores some metadata at the end of every commit log message, like this:

git-svn-id: file:///Users/david/svn/bricolage/trunk@8581 e630fb3e-2914-11de-beb6-f300316f8eb1

This had been very handy as I looked through commits in GitX to find parents to set up for grafts, but with that done and everything grafted, I no longer needed it. Ilari helped me to figure out how to properly use git filter-branch to get rid of those. To do it, all I had to do was add a filter for commit messages, like so:

git filter-branch --msg-filter \
'perl -0777 -pe "s/\r?\ngit-svn-id:.+\n//ms"' \
--tag-name-filter cat -- --all

This properly strips out that ugly bit of metadata and finalizes the grafts all at the same time. Very nice.

Now it was time to combine these two repositories for a single unified history. I wasn’t able to find a good tutorial for this on the web, other than one that used a third-party Debian utility and only hooked up the master branch, using a bogus intermediary commit to do it. On the other hand, simply copying the pack files, as mentioned in the Git Wiki–and demonstrated by the scripts linked from there–also appeared to be suboptimal: The new commits were not showing up in GitX! And besides, Ilari said, “just copying packs might not suffice. There can also be loose objects.” Well, we can’t have that, can we?

Ilari suggested git-fetch, the documentation for which says that it will “download objects and refs from another repository.” Perfect! I wanted to copy the objects from my CVS migration to the Subversion migration.

My first attempt failed: some commits showed up, but not others. Ilari pointed out that it wouldn’t copy remote branches unless you asked it to do so, via “refspecs.” Since I’d cloned the repositories to get rid of the duplicate commits created by git filter-branch, all of my lovingly recreated local branches were now remote branches. Actually, this is what I want for the final repository, so I just had to figure out how to copy them. What I came up with was this:

chdir $cvs;
my @branches = map { s/^\s+//; s/\s+$//; $_ } `git branch -r`;

chdir $svn;
system qw(git fetch --tags), $cvs;

for my $branch (@branches) {
    next if $branch eq 'origin/HEAD';
    my $target = $branch =~ m{/master|rev_1_[68]$} ? "$branch-cvs" : $branch;
    system qw(git fetch --tags), $cvs,
        "refs/remotes/$branch:refs/remotes/$target";
}

It took me a while to figure out the proper incantation for referencing and creating remote branches. Once I got the refs/remotes part figured out, I found that the master, rev_1_6, and rev_1_8 branches from CVS were overwriting the Subversion branches with the same names. What I really needed was to have the CVS branches grafted as parents to the Subversion branches. The #git channel again came to my rescue, where Fissure suggested that I rename those branches when importing them, do the grafts, and then drop the renamed branches. Hence the line above that adds “-cvs” to the names of those branches.

Once the branches were imported, I simply looked for the earliest commits to those branches in Subversion and mapped it to the latest commits to the same branches in CVS, then wrote their SHA1 IDs to .git/info/grafts, like so:

open my $fh, '>', ".git/info/grafts" or die "Cannot open grafts: $!\n";
print $fh '77a35487f18d68b96d294facc1f1a41745ad914c '
        => "835ff47ee1e3d1bf228b8d0976fbebe3c7f02ae6\n", # rev_1_6
            '97ef646f5c2a7c6f47c2046c8d289c1dfc30a73d '
        => "2b9f3c5979d062614ef54afd0a01631f746fa3cb\n", # rev_1_8
            'b3b2e7f53d789bea962fe8047e119148e28865c0 '
        => "8414b64a6a434b2117294c0568c1012a17bc863b\n", # master
    ;
close $fh;

With the branches all imported and the grafts created, I simply had to run git filter-branch to make them permanent and drop the temporary CVS branches:

system qw(git filter-branch --tag-name-filter cat -- --all);
unlink '.git/info/grafts';
system qw(git branch -r -D), "origin/$_-cvs" for qw(rev_1_6 rev_1_8 master);

Now I had a complete repository, but with duplicate commits left over by git-filter-branch. To get rid of those, I need to clone the repository. But before I clone, I need the remote branches to be local branches, so that the clone will see them as remotes. For this, I wrote the following function:

sub fix_branches {
    for my $remote (map { s/^\s+//; s/\s+$//; $_ } `git branch -r`) {
        (my $local = $remote) =~ s{origin/}{};
        next if $local eq 'master';
        next if $local eq 'HEAD';
        system qw(git checkout), $remote;
        system qw(git checkout -b), $local;
    }
    system qw(git checkout master);
}

It’s important to skip the master and HEAD branches, as they’ll automatically be created by git clone. So then I call the function and and run git gc to take out trash, and then clone:

fix_branches();

run qw(git gc);
chdir '..';
run qw(git clone), "file://$svn", 'git_final';

It’s important to use the file:/// URL to clone so as to get a real clone; just pointing to the directory instead makes hard links.

Now I that I had the final repository with all history intact, I was ready to push it to GitHub! Well, almost ready. First I needed to make the branches local again, and then see if I could get the repository size down a bit:

chdir 'git_final';
fix_branches();
system qw(git remote rm origin);
system qw(git remote add origin git@github.com:bricoleurs/bricolage.git);
system qw(git gc);
system qw(git repack -a -d -f --depth 50 --window 50);

And that’s it! My new Bricolage Git repository is complete, and I’ve now pushed it up to its new home on GitHub. I pushed it like this:

git push origin --all
git push origin --tags

Damn I’m glad that’s done! I’ll be getting the Subversion repository set to read-only next, and then writing some documentation for my fellow Bricoleurs on how to work with Git. For those of you who already know, fork and enjoy!

Looking for the comments? Try the old layout.

The Future of SVN::Notify

This week, I imported pgTAP into GitHub. It took me a day or so to wrap my brain around how it’s all supposed to work, with generous help from Tekkub. But I’m starting to get the hang of it, and I like it. By the end of the day, I had sent push requests to Test::More and Blosxom Plugins. I’m well on my way to being hooked.

One of the things I want, however, is SVN::Notify-type commit emails. I know that there are feeds, but they don’t have diffs, and for however much I like using NetNewsWire to feed by political news addiction, it never worked for me for commit activity. And besides, why download the whole damn thing again, diffs and all (assuming that ever happens), for every refresh. Seems like a hell of a lot unnecessary network activity—not to mention actual CPU cycles.

So I would need a decent notification application. I happen to have one. I originally wrote SVN::Notify after I had already written activitymail, which sends noticies for CVS commits. SVN::Notify has changed a lot over the years, and now it’s looking a bit daunting to consider porting it to Git.

However, just to start thinking about it, SVN::Notify really does several different things:

  • Fetches relevant information about a Subversion event.
  • Parses that information for a number of different outputs.
  • Writes the event information into one or more outputs (currently plain text or XHTML).
  • Constructs an email message from the outputs
  • Sends the email message via a specified method (sendmail or SMTP).

For the initial implementation of SVN::Notify, this made a lot of sense, because it was doing something fairly simple. It was designed to be extensible by subclassing (successfully done by SVN::Notify::Config and SVN::Notify::Mirror), and, later, by output filters, and that was about it.

But as I think about moving stuff to Git, and consider the weaknesses of extensibility by subclassing (it’s just not pretty), I’m naturally rethinking this architecture. I wouldn’t want to have to do it all over again should some future SCM system come along in the future. So, following from a private exchange with Martijn Van Beers, I have some preliminary thoughts on how a hypothetical SCM::Notify (VCS::Notify?) module might be constructed:

  • A single interface for fetching SCM activity information. There could be any number of implementations, just as long as they all provided the same interface. There would be a class for fetching information from Subversion, one for Git, one for CVS, etc.
  • A single interface for writing a report for a given transaction. Again, there could be any number of implementations, but all would have the same interface: taking an SCM module and writing output to a file handle.
  • A single interface for doing something with one or more outputs. Again, they can do things as varied as simply writing files to disk, appending to a feed, inserting into a database, or, of course, sending an email.
  • The core module would process command-line arguments to determine what SCM is being used any necessary contextual information and just pass it on to the appropriate classes.

In psedudo-code, what I’m thinking is something like this:

package SCM::Notify;

sub run {
    my $args = shift->getopt;
    my $scm  = SCM::Interface->new(
        scm      => $args->{scm} # e.g., "SVN" or "Git", etc.
        revision => $args->{revision},
        context  => $args->{context} # Might include repository path for SVN.
    );

    my $report = SCM::Report->new(
        method => $opts->{method}, # e.g., SMTP, sendmail, Atom, etc.
        scm    => $scm,
        format => $args->{output}, # text, html, both, etc.
        params => $args->{params}, # to, from, subject, etc.
    );

    $report->send;
}

Then a report class just has to create report in the specified format or formats and do something with them. For example, a Sendmail report would put together a report as a multipart message with each format in a single part, and then deliver it via /sbin/sendmail, something like this:

package SCM::Report::Sendmail;

sub send {
    my $self = shift;
    my $fh = $self->fh;
    for my $format ( $self->formats ) {
        print $fh SCM::Format->new(
            format => $format,
            scm    => $self->scm,
        );
    }

    $self->deliver;
}

So those are my rather preliminary thoughts. I think it’d actually be pretty easy to port the logic of this stuff over from SVN::Notify; what needs some more thought is what the command-line interface might look like and how options are passed to the various classes, since the Sendmail report class will require different parameters than the SMTP report class or the Atom report class. But once that’s worked out in a way that can be handled neutrally, we’ll have a much more extensible implementation that will be easy to add on to going forward.

Any suggestions for passing different parameters to different classes in a single interface? Everything needs to be able to be handled via command-line options and not be ugly or difficult to use.

So, you wanna work on this? :-)

Looking for the comments? Try the old layout.

Where iCal Keeps Invitations

I was fiddling with iCalendar invitations yesterday, trying to get Sandy’s .ics files to import into Outlook. I got that figured out (yes!), but in the meantime iCal started crashing on me. I was reasonable sure that it was due to a bogus invitation file, but could not for the life of me figure out where iCal was keeping such files. It just kept crashing on me as second or so after starting up, every time.

I finally figured it out by quitting all my apps, moving all of the folders in ~/Library to a temporary folder, and firing up iCal to see what folds it would create. And there it was: ~/Library/Caches/com.apple.iCal. I quit iCal, deleted the new folders in ~/Library, moved the originals back, and looked inside the iCal caches folder to find a bunch of invitation files in the incoming folder. I deleted them all and iCal fired up again without a hitch. W00t!

So if you’re having problems with iCal crashing and have a few invitations in it and you’re wondering how to get iCal to ignore them, just quit iCal, delete all of the files in /Users/yourusername/Library/Caches/com.apple.iCal/incoming, and start iCal back up again.

And now I’ll be able to find this information again when next I need it. :-)

Looking for the comments? Try the old layout.

More about…