Just a Theory

Trans rights are human rights

Posts about Migrations

Sqitch Update: Almost Usable

This week, I released v0.50 of Sqitch, the database change management app I’ve been working on for the last couple of months. Those interested in how it works should read the tutorial. A lot has changed since v0.30; here are some highlights:

  • The plan file is now required. This can make merges more annoying, but thanks to a comment from Jakub Narębski, I discovered that Git can be configured to use a “union merge driver”, which seems to simplify things a great deal. See the tutorial for a detailed example.
  • The plan now consists solely of a list of changes, roughly analogous to Git commits. Tags are simply pointers to specific changes.
  • Dependencies are now specified in the plan file, rather than in the deployment scripts. Once the plan file became required, this seemed like the much more obvious place for them.
  • The plan file now goes into the top-level directory of a project (which defaults to the current directory, assumed to be the top level directory of a VCS project), while the configuration file goes into the current directory. This allows one to have multiple top-level directories for different database engines, each with its own plan, and a single configuration file for them all.

Seems like a short list, but in reality, this release is the first I would call almost usable. Most of the core functionality and infrastructure is in place, and the architectural designs have been finalized. There should be much less flux in how things work from here on in, though this is still very much a developer release. Things might still change, so I’m being conservative and not doing a “stable” release just yet.

What works

So what commands actually work at this point? All of the most important functional ones:

  • sqitch init – Initialize a Sqitch project. Creates the project configuration file, a plan file, and directories for deploy, revert, and test scripts
  • sqitch config – Configure Sqitch. Uses the same configuration format as Git, including cascading local, user, and system-wide configuration files
  • sqitch help – Get documentation for specific commands
  • sqitch add – Add a new change to the plan. Generates deploy, revert, and test scripts based on user-modifiable templates
  • sqitch tag – Tag the latest change in the plan, or show a list of existing tags
  • sqitch deploy – Deploy changes to a database. Includes a --mode option to control how to revert changes in the event of a deploy failure (not at all, to last tag, or to starting point)
  • sqitch revert – Revert changes from a database
  • sqitch rework – Copy and modify a previous change

Currently, only PostgreSQL is supported by deploy and revert; I will at least add SQLite support soon.

The rework command is my solution to the problem of code duplication. It does not (yet) rely on VCS history, so it still duplicates code. However, it does so in such a way that it is still easier to see what has changed, because the new files are actually used by the previous instance of the command, while the new one uses the existing files. So a diff command, while showing the new files in toto, actually shows what changed in the existing scripts, making it easier to follow. I think this is a decent compromise, to allow Sqitch to be used with or without a VCS, and without disabling the advantages of VCS integration in the future.

The only requirement for reworking a change is that there must be a tag on that change or a change following it. Sqitch uses that tag in the name of the files for the previous instance of the change, as well as in internal IDs, so it’s required to disambiguate the scripts and deployment records of the two instances. The assumption here is that tags are generally used when a project is released, as otherwise, if you were doing development, you would just go back and modify the change’s scripts directly, and revert and re-deploy to get the changes in your dev database. But once you tag, this is a sort of promise that nothing will be changed prior to the tag.

I modify change scripts a lot in my own database development projects. Naturally, I think it is important to be free to change deployment scripts however one likes while doing development, and also important to promise not to change them once they have been released. As long as tags are generally thought of as marking releases or other significant milestones, it seems a reasonable promise not to change anything that appears before a tag.

See the tutorial for a detailed example. In a future release, VCS integration will be added, and the duplicated files will be unnecessary, too. But the current approach has the advantage that it will work anywhere, VCS or no. The VCS support will be backward-compatible with this design (indeed, it depends on it).

Still To Do

I think I might hold off a bit on the VCS integration, since the rework command no longer requires it. There also needs to be support for database engines other than PostgreSQL. But otherwise, mostly what needs to be done is the informational commands, packaging, and testing:

  • sqitch status – Show the current deployment status of a database
  • sqitch log – Show the deploy and revert history of a database
  • sqitch bundle – Bundle up the configuration, plan, and scripts for distribution packaging
  • sqitch test – Test changes. Mostly hand-wavy; see below
  • sqitch check – Validate a database deployment history against the plan

I will likely be working on the status and log commands next, as well as an SQLite engine, to make sure I have the engine encapsulation right.

Outstanding Questions

I’m still pondering some design decisions. Your thoughts and comments greatly appreciated.

  • Sqitch now requires a URI, which is set in the local configuration file by the init command. If you don’t specify one, it just creates a UUID-based URI. The URI is required to make sure that changes have unique IDs across projects (a change may have the same name as in another project). But maybe this should be more flexible? Maybe, like Git, Sqitch should require a user name and email address, instead? They would have to be added to the change lines of the plan, which is what has given me pause up to now. It would be annoying to parse.

  • How should testing work? When I do PostgreSQL stuff, I am of course rather keen on pgTAP. But I don’t think it makes sense to require a particular format of output or anything of that sort. It just wouldn’t be engine-neutral enough. So maybe test scripts should just run and considered passing if the engine client exits successfully, and failing if it exited unsuccessfully? That would allow one to use whatever testing was supported by the engine, although I would have to find some way to get pgTAP to make psql exit non-zero on failure.

    Another possibility is to require expected output files, and to diff them. I’m not too keen on this approach, as it makes it much more difficult to write tests to run on multiple engine versions and platforms, since the output might vary. It’s also more of a PITA to maintain separate test and expect files and keep them in sync. Still, it’s a tried-and-true approach.

Help Wanted

Contributions would be warmly welcomed. See the to-do list for what needs doing. Some highlights and additional items:

  • Convert to Dist::Zilla
  • Implement the Locale::TextDomain-based localization build. Should be done at distribution build time, not install time. Ideally, there would be a Dist::Zilla plugin to do it, based pattern implemented in this example Makefile (see also this README).
  • The web site could use some updating, though I realize it will regularly need changing until most of the core development has completed and more documentation has been written.
  • Handy with graphics? The project could use a logo. Possible themes: SQL, databases, change management, baby Sasquatch.
  • Packaging. It would greatly help developers and system administrators who don’t do CPAN if they could just use their familiar OS installers to get Sqitch. So RPM, Debian package, Homebrew, BSD Ports, and Windows distribution support would be hugely appreciated.

Take it for a Spin!

Please do install the v0.51 developer release from the CPAN (run cpan D/DW/DWHEELER/App-Sqitch-0.51-TRIAL.tar.gz) and kick the tires a bit. Follow along the tutorial to get a feel for it, or even just review the tutorial example’s Git history to get a feel for it. And if there is something you want out of Sqitch that you don’t see, please feel free to file an issue with your suggestion.

Looking for the comments? Try the old layout.

Sqitch — VCS-powered SQL Change Management

Back in January, I wrote three posts outlining some ideas I had about a straight-forward, sane way of managing SQL change management. The idea revolved around specifying scripts to deploy and revert in a plan file, and generating that plan file from VCS history. I still feel pretty good about the ideas there, and work has agreed to let me write it and open-source it. Here is the first step making it happen. I call it “Sqitch.”

Why “Sqitch”? Think of it as SQL changes with Git stuck in the middle. Of course I expect to support VCSs other than Git (probably Subversion and Mercurial, though I am not sure yet), but since Git is what I now have the most familiarity with, I thought it kind of fun to kind of reference a VCS in the name, if only obliquely.

This week, I started work on it. My first task is to outline a draft for the interface. Sqitch will be a command-line tool, primarily. The remainder of this post contains the documentation for the draft interface. Thoughts and feedback would be greatly appreciated, especially if you think I’ve overlooked anything! I do want to keep features pretty minimal for now, though, to build up a solid core to be built on later. But other than that, your criticism is greatly desired.

Next up, I will probably write a tutorial, just so I can make my way through some real-life(ish) examples and notice if I missed anything else. Besides, I’m going to need the tutorial myself! Watch for that next week.

Thanks!


Name

Sqitch - VCS-powered SQL change management

Synopsis

sqitch [<options>] <command> [<command-options>] [<args>]

Description

Sqitch is a VCS-aware SQL change management application. What makes it different from your typical migration-style approaches? A few things:

No opinions

Sqitch is not integrated with any framework, ORM, or platform. Rather, it is a standalone change management system with no opinions on your database or development choices.

Native scripting

Changes are implemented as scripts native to your selected database engine. Writing a PostgreSQL application? Write SQL scripts for psql. Writing a MySQL-backed app? Write SQL scripts for mysql.

VCS integration

Sqitch likes to use your VCS history to determine in what order to execute changes. No need to keep track of execution order, your VCS already tracks information sufficient for Sqitch to figure it out for you.

Dependency resolution

Deployment steps can declare dependencies on other deployment steps. This ensures proper order of execution, even when you’ve committed changes to your VCS out-of-order.

No numbering

Change deployment is managed either by maintaining a plan file or, more usefully, your VCS history. As such, there is no need to number your changes, although you can if you want. Sqitch does not care what you name your changes.

Packaging

Using your VCS history for deployment but need to ship a tarball or RPM? Easy, just have Sqitch read your VCS history and write out a plan file with your change scripts. Once deployed, Sqitch can use the plan file to deploy the changes in the proper order.

Reduced Duplication

If you’re using a VCS to track your changes, you don’t have to duplicate entire change scripts for simple changes. As long as the changes are idempotent, you can change your code directly, and Sqitch will know it needs to be updated.

Terminology

step

A named unit of change. A step name must be used in the file names of its corresponding deployment and a reversion scripts. It may also be used in a test script file name.

tag

A known deployment state with a list one or more steps that define the tag. A tag also implies that steps from previous tags in the plan have been applied. Think of it is a version number or VCS revision. A given point in the plan may have one or more tags.

state

The current state of the database. This is represented by the most recent tag or tags deployed. If the state of the database is the same as the most recent tag, then it is considered “up-to-date”.

plan

A list of one or more tags and associated steps that define the order of deployment execution. Sqitch reads the plan to determine what steps to execute to change the database from one state to another. The plan may be represented by a “Plan File” or by VCS history.

deploy

The act of deploying database changes to reach a tagged deployment point. Sqitch reads the plan, checks the current state of the database, and applies all the steps necessary to change the state to the specified tag.

revert

The act of reverting database changes to reach an earlier tagged deployment point. Sqitch checks the current state of the database, reads the plan, and applies reversion scripts for all steps to return the state to an earlier tag.

Options

-p --plan-file  FILE    Path to a deployment plan file.
-e --engine     ENGINE  Database engine.
-c --client     PATH    Path to the engine command-line client.
-d --db-name    NAME    Database name.
-u --username   USER    Database user name.
-h --host       HOST    Database server host name.
-n --port       PORT    Database server port number.
   --sql-dir    DIR     Path to directory with deploy and revert scripts.
   --deploy-dir DIR     Path to directory with SQL deployment scripts.
   --revert-dir DIR     Path to directory with SQL reversion scripts.
   --test-dir   DIR     Path to directory with SQL test scripts.
   --extension  EXT     SQL script file name extension.
   --dry-run            Execute command without making any changes.
-v --verbose            Increment verbosity.
-V --version            Print the version number and exit.
-H --help               Print a usage statement and exit.
-M --man                Print the complete documentation and exit.

Options Details

-p
--plan-file

sqitch –plan-file plan.conf sqitch -p sql/deploy.conf

Path to the deployment plan file. Defaults to ./sqitch.plan. If this file is not present, Sqitch will attempt to read from VCS files. If no supported VCS system is in place, an exception will be thrown. See “Plan File” for a description of its structure.

-e
--engine

sqitch –engine pg sqitch -e sqlite

The database engine to use. Supported engines include:

-c
--client

sqitch –client /usr/local/pgsql/bin/psql sqitch -c /usr/bin/sqlite3

Path to the command-line client for the database engine. Defaults to a client in the current path named appropriately for the specified engine.

-d
--db-name

Name of the database. For some engines, such as PostgreSQL and MySQL, the database must already exist. For others, such as SQLite, the database will be automatically created on first connect.

-u
--user
--username

User name to use when connecting to the database. Does not apply to all engines.

-h
--host

Host name to use when connecting to the database. Does not apply to all engines.

-n
--port

Port number to connect to. Does not apply to all engines.

--sql-dir

sqitch –sql-dir migrations/

Path to directory containing deployment, reversion, and test SQL scripts. It should contain subdirectories named deploy, revert, and (optionally) test. These may be overridden by --deploy-dir, --revert-dir, and --test-dir. Defaults to ./sql.

--deploy-dir

sqitch –deploy-dir db/up

Path to a directory containing SQL deployment scripts. Overrides the value implied by --sql-dir.

--revert-dir

sqitch –revert-dir db/up

Path to a directory containing SQL reversion scripts. Overrides the value implied by --sql-dir.

--test-dir

sqitch –test-dir db/t

Path to a directory containing SQL test scripts. Overrides the value implied by --sql-dir.

--extension

sqitch –extension ddl

The file name extension on deployment, reversion, and test SQL scripts. Defaults to sql.

--dry-run

sqitch –dry-run

Execute the Sqitch command without making any actual changes. This allows you to see what Sqitch would actually do, without doing it. Implies a verbosity level of 1; add extra --verboses for greater verbosity.

-v
--verbose

sqitch –verbose -v

A value between 0 and 3 specifying how verbose Sqitch should be. The default is 0, meaning that Sqitch will be silent. A value of 1 causes Sqitch to output some information about what it’s doing, while 2 and 3 each cause greater verbosity.

-H
--help

sqitch –help sqitch -H

Outputs a brief description of the options supported by sqitch and exits.

-M
--man

sqitch –man sqitch -M

Outputs this documentation and exits.

-V
--version

sqitch –version sqitch -V

Outputs the program name and version and exits.

Sqitch Commands

init

Initialize the database and create deployment script directories if they do not already exist.

status

Output information about the current status of the deployment, including a list of tags, deployments, and dates in chronological order. If any deploy scripts are not currently deployed, they will be listed separately.

check

Sanity check the deployment scripts. Checks include:

  • Make sure all deployment scripts have complementary reversion scripts.

  • Make sure no deployment script appears more than once in the plan file.

deploy

Deploy changes. Configuration properties may be specified under the [deploy] section of the configuration file, or via sqitch config:

sqitch config deploy.$property $value

Options and configuration properties:

--to

Tag to deploy up to. Defaults to the latest tag or to the VCS HEAD commit. Property name: deploy.to.

revert

Revert changes. Configuration properties may be specified under the [revert] section of the configuration file, or via sqitch config:

sqitch config revert.$property $value

Options and configuration properties:

--to

Tag to revert to. Defaults to reverting all changes. Property name: revert.to.

test

Test changes. All SQL scripts in --test-dir will be run. [XXX Not sure whether to have subdirectories for tests and expected output and to diff them, or to use some other approach.]

config

Set configuration options. By default, the options will be written to the local configuration file, sqitch.ini. Options:

--get

Get the value for a given key. Returns error code 1.

--unset

Remove the line matching the key from config file.

--list

List all variables set in config file.

--global

For writing options: write to global ~/.sqitch/config.ini file rather than the local sqitch.ini.

For reading options: read only from global ~/.sqitch/config.ini rather than from all available files.

--system

For writing options: write to system-wide $prefix/etc/sqitch.ini file rather than the local sqitch.ini.

For reading options: read only from system-wide $prefix/etc/sqitch.ini rather than from all available files.

--config-file

Use the given config file.

package

Package up all deployment and reversion scripts and write out a plan file. Configuration properties may be specified under the [package] section of the configuration file, or via sqitch config package.$property $value command. Options and configuration properties:

--from

Tag to start the plan from. All tags and steps prior to that tag will not be included in the plan, and their change scripts Will be omitted from the package directory. Useful if you’ve rejiggered your deployment steps to start from a point later in your VCS history than the beginning of time. Property name: package.from.

--to

Tag with which to end the plan. No steps or tags after that tag will be included in the plan, and their change scripts will be omitted from the package directory. Property name: package.to.

--tags-only

Write the plan file with deployment targets listed under VCS tags, rather than individual commits. Property name: package.tags_only.

--destdir

Specify a destination directory. The plan file and deploy, revert, and test directories will be written to it. Defaults to “package”. Property name: package.destdir.

Configuration

Sqitch configuration information is stored in standard INI files. The # and ; characters begin comments to the end of line, blank lines are ignored.

The file consists of sections and properties. A section begins with the name of the section in square brackets and continues until the next section begins. Section names are not case sensitive. Only alphanumeric characters, - and . are allowed in section names. Each property must belong to some section, which means that there must be a section header before the first setting of a property.

All the other lines (and the remainder of the line after the section header) are recognized as setting properties, in the form name = value. Leading and trailing whitespace in a property value is discarded. Internal whitespace within a property value is retained verbatim.

All sections are named for commands except for one, named “core”, which contains core configuration properties.

Here’s an example of a configuration file that might be useful checked into a VCS for a project that deploys to PostgreSQL and stores its deployment scripts with the extension ddl under the migrations directory. It also wants packages to be created in the directory _build/sql, and to deploy starting with the “gamma” tag:

[core]
    engine    = pg
    db        = widgetopolis
    sql_dir   = migrations
    extension = ddl

[revert]
    to        = gamma

[package]
    from      = gamma
    tags_only = yes
    dest_dir  = _build/sql

Core Properties

This is the list of core variables, which much appear under the [core] section. See the documentation for individual commands for their configuration options.

plan_file

The plan file to use. Defaults to sqitch.ini or, if that does not exist, uses the VCS history, if available.

engine

The database engine to use. Supported engines include:

client

Path to the command-line client for the database engine. Defaults to a client in the current path named appropriately for the specified engine.

db_name

Name of the database.

username

User name to use when connecting to the database. Does not apply to all engines.

password

Password to use when connecting to the database. Does not apply to all engines.

host

Host name to use when connecting to the database. Does not apply to all engines.

port

Port number to connect to. Does not apply to all engines.

sql_dir

Path to directory containing deployment, reversion, and test SQL scripts. It should contain subdirectories named deploy, revert, and (optionally) test. These may be overridden by deploy_dir, revert_dir, and test_dir. Defaults to ./sql.

deploy_dir

Path to a directory containing SQL deployment scripts. Overrides the value implied by sql_dir.

revert_dir

Path to a directory containing SQL reversion scripts. Overrides the value implied by sql_dir.

test_dir

Path to a directory containing SQL test scripts. Overrides the value implied by sql_dir.

extension

The file name extension on deployment, reversion, and test SQL scripts. Defaults to sql.

Plan File

A plan file describes the deployment tags and scripts to be run against a database. In general, if you use a VCS, you probably won’t need a plan file, since your VCS history should be able to provide all the information necessary to derive a deployment plan. However, if you really do need to maintain a plan file by hand, or just want to better understand the file as output by the package command, read on.

Format

The contents of the plan file are plain text encoded as UTF-8. It is divided up into sections that denote deployment states. Each state has a bracketed, space-delimited list of one or more tags to identify it, followed by any number of deployment steps. Here’s an example of a plan file with a single state and a single step:

[alpha]
users_table

The state has one tag, named “alpha”, and one step, named “users_table”. A state may of course have many steps. Here’s an expansion:

[root alpha]
users_table
insert_user
update_user
delete_user

This state has two tags, “root” and “alpha”, and four steps, “users_table”, “insert_user”, “update_user”, and “delete_user”.

Most plans will have multiple states. Here’s a longer example with three states:

[root alpha]
users_table
insert_user
update_user
delete_user

[beta]
widgets_table
list_widgets

[gamma]
ftw

Using this plan, to deploy to the “beta” tag, the “root”/“alpha” state steps must be deployed, as must the “beta” steps. To then deploy to the “gamma” tag, the “ftw” step must be deployed. If you then choose to revert to the “alpha” tag, then the “gamma” step (“ftw”) and all of the “beta” steps will be reverted in reverse order.

Using this model, steps cannot be repeated between states. One can repeat them, however, if the contents for a file in a given tag can be retrieved from a VCS. An example:

[alpha]
users_table

[beta]
add_widget
widgets_table

[gamma]
add_user

[44ba615b7813531f0acb6810cbf679791fe57bf2]
widgets_created_at

[HEAD epsilon master]
add_widget

This example is derived from a Git log history. Note that the “add_widget” step is repeated under the state tagged “beta” and under the last state. Sqitch will notice the repetition when it parses this file, and then, if it is applying all changes, will fetch the version of the file as of the “beta” tag and apply it at that step, and then, when it gets to the last tag, retrieve the deployment file as of its tags and apply it. This works in reverse, as well, as long as the changes in this file are always idempotent.

Grammar

Here is the EBNF Grammar for the plan file:

plan-file   = { <state> | <empty-line> | <comment> }* ;

state       = <tags> <steps> ;

tags        = "[" <taglist> "]" <line-ending> ;
taglist     = <name> | <name> <white-space> <taglist> ;

steps       = { <step> | <empty-line> | <line-ending> }* ;
step        = <name> <line-ending> ;

empty-line  = [ <white-space> ] <line-ending> ;
line-ending = [ <comment> ] <EOL> ;
comment     = [ <white-space> ] "#" [ <string> ] ;

name        = ? non-white space characters ? ;
white-space = ? white space characters ? ;
string      = ? non-EOL characters ? ;

See Also

The original design for Sqitch was sketched out in a number of blog posts:

Other tools that do database change management include:

Rails migrations

Numbered migrations for Ruby on Rails.

Module::Build::DB

Numbered changes in pure SQL, integrated with Perl’s Module::Build build system. Does not support reversion.

DBIx::Migration

Numbered migrations in pure SQL.

Versioning

PostgreSQL-specific dependency-tracking solution by depesz.

Author

David E. Wheeler <david@justatheory.com>

License

Copyright (c) 2012 iovation Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Looking for the comments? Try the old layout.

Git-R-Done

Just a quick followup on the completion of the Bricolage Git migration last week, today I completed writing up a set of GitHub wiki documents explaining to my fellow Bricoleurs how to start hacking. The most important bits are:

  • Working with Git, explaining how to get set up with a forked Bricolage repository
  • Contributing a Bug Fix, an intro to the Git way of doing things (as far as I understand it)
  • Working with Branches, describing how to track a maintenance branch in your fork
  • Merging with Git, to cover the frequent merging from Bricolage maintenance branches into master, and how to get said merges pushed upstream
  • Starting a Project Branch, which you’d need to read if you were taking on a major development task, such as a Summer of Code project
  • Contributing via Email, for those who don’t want a GitHub account (needs fleshing out)
  • Creating a Release, in which the fine art of branching, tagging, and releasing is covered

If you’re familiar with the “Git way,” I would greatly appreciate your feedback on these documents. Corrections and comments would be greatly appreciated.

I also just wanted to say that the process of reconstructing the merge history from CVS and Subversion was quite an eye-opener for me. Not because it was difficult (it was) and required a number of hacks (it did), but because it highlighted just how much better a fit Git is for the way in which we do Open Source software development. Hell, probably closed-source, too, for that matter. I no longer will have to think about what revisions to include in a merge, or create a branch just to “tag” a merge. Hell, I’ll probably be doing merges a hell of a lot more often, just because it’s so easy, the history remains intact, and everything just stays more up-to-date and closely integrated.

But I also really appreciate the project-based emphasis of Git. A Subversion repository, I now realize, is really very much like a versioned file system. That means where things go is completely ad-hoc, or convention-driven at best. And god forbid if you decide to change the convention and move stuff around! It’s just so much more sane to get a project repository, with all of the history, branches, tags, merges, and everything else, all in one package. It’s more portable, it’s a hell of a lot faster (ever tried to check out a Subversion repository with 80 tags?), and just tighter. it encourages modularization, which can only be good. I’ll tell you, I expect to have some frustrations and challenges as I learn more about using Git, but I’m already very much happier with the overall philosophy.

Enough evangelizing. As a last statement on this, I’ve uploaded the Perl scripts I wrote to do this migration, just in case someone else finds them useful:

  • bric_cvs_to_git migrated a CVS backup to Git.
  • bric_to_git migrated Subversion from r5517 to Git.
  • stitch stitched the CVS-migrated Git repository into the Subversion-migrated Git repository for a final product.

It turned out that there were a few files lost in the conversion, which I didn’t notice until after all was said and done, but overall I’m very happy. My thanks again to Ask and the denizens of #git for all the help.

Looking for the comments? Try the old layout.

Migrating Bricolage CVS and SVN to Git

Now that I’ve successfully migrated the old Bricolage SourceForge CVS repository to Git, and also migrated Subversion to Git, it’s time to stitch the two repositories together into one with all history intact. I’m glad to say that figuring out how to do so took substantially less time than the first two steps, thanks in large part to the great help from “doener,” “Ilari,” and “Fissure” on the Freenode #git channel.

Actually, they helped me with a bit more tweaking of my CVS and Subversion conversions. One thing I realized after writing yesterday’s post was that, after running git filter-branch, I had twice as many commits as I should have had. It turns out that git filter-branch rewrites all commits, but keeps the old ones around in case you mess something up. doener also pointed out that I wasn’t having all grafts properly applied, because git filter-branch only applies to the currently checked-out branch. To get all of the branches, he suggested that I read the git-filter-branch documentation, where I’ll find that git filter-branch --tag-name-filter cat -- --all would hit all branches. Actually, such was not clear to me from the documentation, but I took his word for it. Once I did that, to get rid of the dupes, all I had to do was git clone the repository to a new repository. And that was that.

This worked great for my CVS migration, but I realized that I also wanted to clean out metadata from the Subversion migration. Of course, git clone throws out most of the metadata, but git svn also stores some metadata at the end of every commit log message, like this:

git-svn-id: file:///Users/david/svn/bricolage/trunk@8581 e630fb3e-2914-11de-beb6-f300316f8eb1

This had been very handy as I looked through commits in GitX to find parents to set up for grafts, but with that done and everything grafted, I no longer needed it. Ilari helped me to figure out how to properly use git filter-branch to get rid of those. To do it, all I had to do was add a filter for commit messages, like so:

git filter-branch --msg-filter \
'perl -0777 -pe "s/\r?\ngit-svn-id:.+\n//ms"' \
--tag-name-filter cat -- --all

This properly strips out that ugly bit of metadata and finalizes the grafts all at the same time. Very nice.

Now it was time to combine these two repositories for a single unified history. I wasn’t able to find a good tutorial for this on the web, other than one that used a third-party Debian utility and only hooked up the master branch, using a bogus intermediary commit to do it. On the other hand, simply copying the pack files, as mentioned in the Git Wiki–and demonstrated by the scripts linked from there–also appeared to be suboptimal: The new commits were not showing up in GitX! And besides, Ilari said, “just copying packs might not suffice. There can also be loose objects.” Well, we can’t have that, can we?

Ilari suggested git-fetch, the documentation for which says that it will “download objects and refs from another repository.” Perfect! I wanted to copy the objects from my CVS migration to the Subversion migration.

My first attempt failed: some commits showed up, but not others. Ilari pointed out that it wouldn’t copy remote branches unless you asked it to do so, via “refspecs.” Since I’d cloned the repositories to get rid of the duplicate commits created by git filter-branch, all of my lovingly recreated local branches were now remote branches. Actually, this is what I want for the final repository, so I just had to figure out how to copy them. What I came up with was this:

chdir $cvs;
my @branches = map { s/^\s+//; s/\s+$//; $_ } `git branch -r`;

chdir $svn;
system qw(git fetch --tags), $cvs;

for my $branch (@branches) {
    next if $branch eq 'origin/HEAD';
    my $target = $branch =~ m{/master|rev_1_[68]$} ? "$branch-cvs" : $branch;
    system qw(git fetch --tags), $cvs,
        "refs/remotes/$branch:refs/remotes/$target";
}

It took me a while to figure out the proper incantation for referencing and creating remote branches. Once I got the refs/remotes part figured out, I found that the master, rev_1_6, and rev_1_8 branches from CVS were overwriting the Subversion branches with the same names. What I really needed was to have the CVS branches grafted as parents to the Subversion branches. The #git channel again came to my rescue, where Fissure suggested that I rename those branches when importing them, do the grafts, and then drop the renamed branches. Hence the line above that adds “-cvs” to the names of those branches.

Once the branches were imported, I simply looked for the earliest commits to those branches in Subversion and mapped it to the latest commits to the same branches in CVS, then wrote their SHA1 IDs to .git/info/grafts, like so:

open my $fh, '>', ".git/info/grafts" or die "Cannot open grafts: $!\n";
print $fh '77a35487f18d68b96d294facc1f1a41745ad914c '
        => "835ff47ee1e3d1bf228b8d0976fbebe3c7f02ae6\n", # rev_1_6
            '97ef646f5c2a7c6f47c2046c8d289c1dfc30a73d '
        => "2b9f3c5979d062614ef54afd0a01631f746fa3cb\n", # rev_1_8
            'b3b2e7f53d789bea962fe8047e119148e28865c0 '
        => "8414b64a6a434b2117294c0568c1012a17bc863b\n", # master
    ;
close $fh;

With the branches all imported and the grafts created, I simply had to run git filter-branch to make them permanent and drop the temporary CVS branches:

system qw(git filter-branch --tag-name-filter cat -- --all);
unlink '.git/info/grafts';
system qw(git branch -r -D), "origin/$_-cvs" for qw(rev_1_6 rev_1_8 master);

Now I had a complete repository, but with duplicate commits left over by git-filter-branch. To get rid of those, I need to clone the repository. But before I clone, I need the remote branches to be local branches, so that the clone will see them as remotes. For this, I wrote the following function:

sub fix_branches {
    for my $remote (map { s/^\s+//; s/\s+$//; $_ } `git branch -r`) {
        (my $local = $remote) =~ s{origin/}{};
        next if $local eq 'master';
        next if $local eq 'HEAD';
        system qw(git checkout), $remote;
        system qw(git checkout -b), $local;
    }
    system qw(git checkout master);
}

It’s important to skip the master and HEAD branches, as they’ll automatically be created by git clone. So then I call the function and and run git gc to take out trash, and then clone:

fix_branches();

run qw(git gc);
chdir '..';
run qw(git clone), "file://$svn", 'git_final';

It’s important to use the file:/// URL to clone so as to get a real clone; just pointing to the directory instead makes hard links.

Now I that I had the final repository with all history intact, I was ready to push it to GitHub! Well, almost ready. First I needed to make the branches local again, and then see if I could get the repository size down a bit:

chdir 'git_final';
fix_branches();
system qw(git remote rm origin);
system qw(git remote add origin git@github.com:bricoleurs/bricolage.git);
system qw(git gc);
system qw(git repack -a -d -f --depth 50 --window 50);

And that’s it! My new Bricolage Git repository is complete, and I’ve now pushed it up to its new home on GitHub. I pushed it like this:

git push origin --all
git push origin --tags

Damn I’m glad that’s done! I’ll be getting the Subversion repository set to read-only next, and then writing some documentation for my fellow Bricoleurs on how to work with Git. For those of you who already know, fork and enjoy!

Looking for the comments? Try the old layout.

Rails Migrations with Slony?

The new app I’m developing is written in Ruby on Rails and runs on PostgreSQL. We’re replicating our production database using Slony-I, but we’ve run into a bit of a snag: database schema updates must be run as plain SQL through a Slony script in order to ensure proper replication of the schema changes within a transaction, but Rails migrations run as Ruby code updating the database via the Rails database adapter.

So how do others handle Rails migrations with their Slony-I replication setups? How do you update the Slony-I configuration file for the changes? How do you synchronize changes to the master schema out to the slaves? Do you shut down your apps, shut down Slony-I, make the schema changes to both the master and the slaves, and then restart Slony-I and your apps?

For that matter, people running Slony for their Bricolage databases must have the same issue, because the Bricolage upgrade scripts are just Perl using the DBI, not SQL files. Can anyone shed a little light on this for me?

Oh, and one last question: Why is this such a PITA? Can’t we have decent replication that replicates everything, including schema changes? Please?

Looking for the comments? Try the old layout.

david.wheeler.net Content Migration

I’ve completed the migration of all of the content from my old site, david.wheeler.net. All requests to that domain will get a permanent redirect to this site. Where possible, I tried to make the old URLs redirect to the new URLs. So if you try to connect to david.wheeler.net/osx.html, you should be automatically redirected to www.justatheory.com/computers/os/macosx/my_adventures.html. The same goes for the following documents:

If you happen to notice that I missed anything, comment on this blog entry to let me know.

Looking for the comments? Try the old layout.