Use of DBI in Sqitch

Sqitch uses the native database client applications (psql, sqlite3, mysql, etc.). So for tracking metadata about the state of deployments, I have been trying to stick to using them. I’m first targeting PostgreSQL, and as a result need to open a connection to psql, start a transaction, and be able to read and write stuff to it as migrations go along. The IPC is a huge PITA. Furthermore, getting things properly quoted is also pretty annoying — and it will be worse for SQLite and MySQL, I expect (psql’s --set support is pretty slick).

If, on the other hand, I used the DBI, on the other hand, all this would be very easy. There is no IPC, just a direct connection to the database. It would save me a ton of time doing development, and be robust and safer to use (e.g., exception handling rather than platform-dependent signal handling (or not, in the case of Windows)). I am quite tempted to just so that.

However, I have been trying to be sensitive to dependencies. I had planned to make Sqitch simple to install on any system, and if you had the command-line client for your preferred database, it would just work. If I used the DBI instead, then Sqitch would not work at all unless you installed the appropriate DBI driver for your database of choice. This is no big deal for Perl people, of course, but I don’t want this to be a Perl people tool. I want it to be dead simple for anyone to use for any database. Ideally, there will be RPMs and Ubuntu packages, so one can just install it and go, and not have to worry about figuring out what additional Perl DBD to install for your database of choice. It should be transparent.

That is still my goal, but at this point the IPC requirements for controlling the clients is driving me a little crazy. Should I just give up and use the DBI (at least for now)? Or persevere with the IPC stuff and get it to work? Opinions wanted!

Sqitch Status: A Step at a Time

I've just released Sqitch v0.20-TRIAL, the third testing release of Sqitch. Since last week, I've implemented add-step. So let's have a look-see at what all it can do. First, let's initialize a Sqitch project.

> mkdir myproj 
> cd myproj 
myproj> git init
Initialized empty Git repository in myproj/.git/
myproj> sqitch --engine pg init
Created sql/deploy
Created sql/revert
Created sql/test
Created ./sqitch.conf

Doesn't look like much, does it? Let's set the database name and look at the configuration:

myproj> sqitch config core.pg.db_name flipr_test
myproj> less sqitch.conf
[core]
    engine = pg
    # plan_file = sqitch.plan
    # sql_dir = sql
    # deploy_dir = sql/deploy
    # revert_dir = sql/revert
    # test_dir = sql/test
    # extension = sql
# [core "pg"]
    # db_name = 
    # client = psql
    # sqitch_schema = sqitch
    # password = 
    # port = 
    # host = 
    # username = 
[core "pg"]
    db_name = flipr_test

I've made an effort to make the default configuration file as useful as possible by including all the core and engine settings. Defaults are present, too, but commented-out. Some you'd probably never want to change in the local file, but might in your user file or in the system configuration file. Peruse the sqitch-config man page for all the Git-like awesomeness.s

So now we can add a step:

myproj> sqitch add-step user_roles
Created sql/deploy/user_roles.sql
Created sql/revert/user_roles.sql
Created sql/test/user_roles.sql

Wee! Again, doesn't look like much, I know. But in fact the generated scripts are created from Template::Tiny templates, and again, they can be overridden on a user or system basis. Have a look at the add-step man page for the details. Or just start with what's there: edit the generated scripts to deploy and revert your changes. Go crazy. The deploy script looks like this:

myproj> less sql/deploy/user_roles.sql 
-- Deploy user_roles

BEGIN;

-- XXX Add DDLs here.

COMMIT;

Next up, deployment. I think that will require that the plan interface be written, first. I'll be getting on that tomorrow.

Sqitch Update

A quick update on Sqitch. I started implementation about a couple of weeks ago. It’s coming a long a bit more slowly than I'd like, given that I need to give a presentation on it soon. But I did things a little differently than I usually do with project like this: I wrote documentation first. In addition to the basic docs I posted a couple weeks back, I’ve written a tutorial. I put quite a lot of time into it, studying the Git interface as I did so, to try to develop useful workflows. The nice thing about this it that it will not only serve as the foundation for my presentation (PHEW! Half the work done already!), but it also serves as a design specification.

So I've been diligently plugging away on it, and have uploaded a couple of trial releases to CPAN. So far, we have decent support for:

  • sqitch help and sqitch help command. The latter only works for the implemented commands, of course.
  • sqitch config, which is a near perfect duplication of git-config, thanks to the very useful Config::GitLike. It supports a local, project-specific config file, a user config file, and a system config file.
  • sqitch init, which creates a new project by creating directories for the deploy, revert, and test scripts, and writes a project-specific config file. This file has options you specify in the call to sqitch (such as the database engine you plan to use), and all unmodified settings or settings set in user or system configuration are written out as comments.

So yeah, not a ton so far, but the foundations for how it all goes together are there, so it should take less time to develop other commands, all things being equal.

Next up:

  • sqitch add-step, which will create deploy and revert scripts for a new step, based on simple templates.
  • sqitch deploy, which is the big one. Initial support will be there for PostgreSQL and SQLite (and perhaps MySQL).

Interested in helping out?

  • I'm going to need a parser for the plan file pretty soon. The interface will need an iterator to move back and forth in the file, as well as a way to write to the file, add steps to it, etc. The grammar is pretty simple, so anyone familiar with parsers and iterators could probably knock something out pretty quickly.

  • The interface for testing needs some thinking through. I had been thinking that it could be something as simple as just diffing the output of a script file against an expected output file, at least to start. One could even use pgTAP or MyTAP in such scripts, although it might be a pain to get the output exactly right for varying environments. But maybe that doesn't matter for deployment, so much? Because it tends to be to a more controlled environment than your typical open-source library test suite, I mean.

Got something to add? Fork it!

Always Use TIMESTAMP WITH TIME ZONE

My recommendations for sane time zone management in PostgreSQL:

  • Set timezone = 'UTC' in postgresq.conf. This makes UTC the default time zone for all connections.
  • Use timestamp with time zone (aka timestamptz) and time with time zone (aka timetz). They store values as UTC, but convert them on selection to whatever your time zone setting is.
  • Avoid timestamp without time zone (aka timestamp) and time without time zone (aka time). These columns do not know the time zone of a value, so different apps can insert values in different zones no one would ever know.
  • Always specify a time zone when inserting into a timestamptz or timetz column. Unless the zone is UTC. But even then, append a "Z" to your value: it's more explicit, and will keep you sane.
  • If you need to get timestamptz or timetz values in a zone other than UTC, use the AT TIME ZONE expression in your query. But be aware that the returned value will be a timestamp or time value, with no more time zone. Good for reporting and queries, bad for storage.
  • If your app always needs data in some other time zone, have it SET timezone = 'UTC' on connection. All values then retrieved from the database will be in the configured time zone. The app should still include the time zone in values sent to the database.

The one exception to the rule preferring timestamptz and timetz is a special case: partitioning. When partitioning data on timestamps, you must not use timestamptz. Why? Because almost no expression involving timestamptz comparison is immutable. Use one in a WHERE clause, and constraint exclusion may well be ignored and all partitions scanned. This is usually something you want to avoid.

So in this one case and only in this one case, use a timestamp without time zone column, but always insert data in UTC. This will keep things consistent with the timestamptz columns you have everywhere else in your database. Unless your app changes the value of the timestamp GUC when it connects, it can just assume that everything is always UTC, and should always send updates as UTC.

New Home

After the Fireballing week before last, I put aside a bit of time to rejigger things. This blog now has a new home.

  • Moved all content to a Linode virtual server. No more serving from my crappy old desktop system behind my Comcast connection. The VPS is kind of skimpy on the RAM, but seems fine for my basic needs.
  • Still using Blosxom, but all content is statically-generated.
  • Switched to Nginx. It's fast. Especially for a 100% static site.
  • Search is gone. No one used it, anyway. That's what Duck Duck Go is for.
  • Comments are gone, sort of. I removed the plugin for adding comments to posts. Existing comments are still shown, though.
  • Added Disqus commenting. The upshot is that, for the first time in years, one can comment on any post at any time. No more closing comments after two weeks.
  • Got rid of the "sociable" junk. No one needs hand-holding for sharing, and very few sharing sites are relevant anymore, anyway.

Oh, and I also moved strongrrl.com, kineticode.com, and my PGXN mirror to the Linode host. They're all static, too, so everything is nice and peppy.

So that's step 1. It's enough that I can get back to posting stuff and, on the off chance that I get Fireballed again, I think things will hold up (a simple ab test shows pretty good throughput at about 100 requests/second at a concurrency of 100). Over the next few months, I have other plans:

  • Throw up a new kineticode.com. The company has actually shut down, so I need to put up a new page to direct interested parties elsewhere.
  • Redesign Just a Theory. This design was okay in 2004, but never very forward-looking. I want to vastly simplify things. Just down to the bare essentials, really. Be prepared for more junk to disappear.
  • Move to a new blog engine. Blosxom is okay, but finicky. There are a lot of steps to publishing a post, most of them involving SCP and SSH. I just want to write to a directory to do stuff, and support drafts and whatnot.

That last task is the one I'm least likely to find a lot of time to work on, though, as I am already overcommitted to numerous other things, and thinking of new stuff all the time. But I'd really like to make things much nicer for myself, so we'll see.

Sqitch - VCS-powered SQL Change Management

Back in January, I wrote three posts outlinining some ideas I had about a straight-forward, sane way of managing SQL change managment. The idea revolved around specifying scripts to deploy and revert in a plan file, and generating that plan file from VCS history. I still feel pretty good about the ideas there, and work has agreed to let me write it and open-source it. Here is the first step making it happen. I call it “Sqitch.”

Why “Sqitch”? Think of it as SQL changes with Git stuck in the middle. Of course I expect to support VCSs other than Git (probably Subversion and Mercurial, though I am not sure yet), but since Git is what I now have the most familiarity with, I thought it kind of fun to kind of reference a VCS in the name, if only obliquely.

This week, I started work on it. My first task is to outline a draft for the interface. Sqitch will be a command-line tool, primarily. The remainder of this post contains the documentation for the draft interface. Thoughts and feedback would be greatly appreciated, especially if you think I've overlooked anything! I do want to keep features pretty minimal for now, though, to build up a solid core to be built on later. But other than that, your criticism is greatly desired.

Next up, I will probably write a tutorial, just so I can make my way through some real-life(ish) examples and notice if I missed anything else. Besides, I'm going to need the tutorial myself! Watch for that next week.

Thanks!


Name

Sqitch - VCS-powered SQL change management

Synopsis

sqitch [<options>] <command> [<command-options>] [<args>]

Description

Sqitch is a VCS-aware SQL change management application. What makes it different from your typical migration-style approaches? A few things:

No opinions

Sqitch is not integrated with any framework, ORM, or platform. Rather, it is a standalone change management system with no opinions on your database or development choices.

Native scripting

Changes are implemented as scripts native to your selected database engine. Writing a PostgreSQL application? Write SQL scripts for psql. Writing a MySQL-backed app? Write SQL scripts for mysql.

VCS integration

Sqitch likes to use your VCS history to determine in what order to execute changes. No need to keep track of execution order, your VCS already tracks information sufficient for Sqitch to figure it out for you.

Dependency resolution

Deployment steps can declare dependencies on other deployment steps. This ensures proper order of execution, even when you've committed changes to your VCS out-of-order.

No numbering

Change deployment is managed either by maintaining a plan file or, more usefully, your VCS history. As such, there is no need to number your changes, although you can if you want. Sqitch does not care what you name your changes.

Packaging

Using your VCS history for deployment but need to ship a tarball or RPM? Easy, just have Sqitch read your VCS history and write out a plan file with your change scripts. Once deployed, Sqitch can use the plan file to deploy the changes in the proper order.

Reduced Duplication

If you're using a VCS to track your changes, you don't have to duplicate entire change scripts for simple changes. As long as the changes are idempotent, you can change your code directly, and Sqitch will know it needs to be updated.

Terminology

step

A named unit of change. A step name must be used in the file names of its corresponding deployment and a reversion scripts. It may also be used in a test script file name.

tag

A known deployment state with a list one or more steps that define the tag. A tag also implies that steps from previous tags in the plan have been applied. Think of it is a version number or VCS revision. A given point in the plan may have one or more tags.

state

The current state of the database. This is represented by the most recent tag or tags deployed. If the state of the database is the same as the most recent tag, then it is considered "up-to-date".

plan

A list of one or more tags and associated steps that define the order of deployment execution. Sqitch reads the plan to determine what steps to execute to change the database from one state to another. The plan may be represented by a "Plan File" or by VCS history.

deploy

The act of deploying database changes to reach a tagged deployment point. Sqitch reads the plan, checks the current state of the database, and applies all the steps necessary to change the state to the specified tag.

revert

The act of reverting database changes to reach an earlier tagged deployment point. Sqitch checks the current state of the database, reads the plan, and applies reversion scripts for all steps to return the state to an earlier tag.

Options

-p --plan-file  FILE    Path to a deployment plan file.
-e --engine     ENGINE  Database engine.
-c --client     PATH    Path to the engine command-line client.
-d --db-name    NAME    Database name.
-u --username   USER    Database user name.
-h --host       HOST    Database server host name.
-n --port       PORT    Database server port number.
   --sql-dir    DIR     Path to directory with deploy and revert scripts.
   --deploy-dir DIR     Path to directory with SQL deployment scripts.
   --revert-dir DIR     Path to directory with SQL reversion scripts.
   --test-dir   DIR     Path to directory with SQL test scripts.
   --extension  EXT     SQL script file name extension.
   --dry-run            Execute command without making any changes.
-v --verbose            Increment verbosity.
-V --version            Print the version number and exit.
-H --help               Print a usage statement and exit.
-M --man                Print the complete documentation and exit.

Options Details

-p
--plan-file
sqitch --plan-file plan.conf
sqitch -p sql/deploy.conf

Path to the deployment plan file. Defaults to ./sqitch.plan. If this file is not present, Sqitch will attempt to read from VCS files. If no supported VCS system is in place, an exception will be thrown. See "Plan File" for a description of its structure.

-e
--engine
sqitch --engine pg
sqitch -e sqlite

The database engine to use. Supported engines include:

-c
--client
sqitch --client /usr/local/pgsql/bin/psql
sqitch -c /usr/bin/sqlite3

Path to the command-line client for the database engine. Defaults to a client in the current path named appropriately for the specified engine.

-d
--db-name

Name of the database. For some engines, such as PostgreSQL and MySQL, the database must already exist. For others, such as SQLite, the database will be automatically created on first connect.

-u
--user
--username

User name to use when connecting to the database. Does not apply to all engines.

-h
--host

Host name to use when connecting to the database. Does not apply to all engines.

-n
--port

Port number to connect to. Does not apply to all engines.

--sql-dir
sqitch --sql-dir migrations/

Path to directory containing deployment, reversion, and test SQL scripts. It should contain subdirectories named deploy, revert, and (optionally) test. These may be overridden by --deploy-dir, --revert-dir, and --test-dir. Defaults to ./sql.

--deploy-dir
sqitch --deploy-dir db/up

Path to a directory containing SQL deployment scripts. Overrides the value implied by --sql-dir.

--revert-dir
sqitch --revert-dir db/up

Path to a directory containing SQL reversion scripts. Overrides the value implied by --sql-dir.

--test-dir
sqitch --test-dir db/t

Path to a directory containing SQL test scripts. Overrides the value implied by --sql-dir.

--extension
sqitch --extension ddl

The file name extension on deployment, reversion, and test SQL scripts. Defaults to sql.

--dry-run
sqitch --dry-run

Execute the Sqitch command without making any actual changes. This allows you to see what Sqitch would actually do, without doing it. Implies a verbosity level of 1; add extra --verboses for greater verbosity.

-v
--verbose
sqitch --verbose -v

A value between 0 and 3 specifying how verbose Sqitch should be. The default is 0, meaning that Sqitch will be silent. A value of 1 causes Sqitch to output some information about what it's doing, while 2 and 3 each cause greater verbosity.

-H
--help
sqitch --help
sqitch -H

Outputs a brief description of the options supported by sqitch and exits.

-M
--man
sqitch --man
sqitch -M

Outputs this documentation and exits.

-V
--version
sqitch --version
sqitch -V

Outputs the program name and version and exits.

Sqitch Commands

init

Initialize the database and create deployment script directories if they do not already exist.

status

Output information about the current status of the deployment, including a list of tags, deployments, and dates in chronological order. If any deploy scripts are not currently deployed, they will be listed separately.

check

Sanity check the deployment scripts. Checks include:

  • Make sure all deployment scripts have complementary reversion scripts.

  • Make sure no deployment script appears more than once in the plan file.

deploy

Deploy changes. Configuration properties may be specified under the [deploy] section of the configuration file, or via sqitch config:

sqitch config deploy.$property $value

Options and configuration properties:

--to

Tag to deploy up to. Defaults to the latest tag or to the VCS HEAD commit. Property name: deploy.to.

revert

Revert changes. Configuration properties may be specified under the [revert] section of the configuration file, or via sqitch config:

sqitch config revert.$property $value

Options and configuration properties:

--to

Tag to revert to. Defaults to reverting all changes. Property name: revert.to.

test

Test changes. All SQL scripts in --test-dir will be run. [XXX Not sure whether to have subdirectories for tests and expected output and to diff them, or to use some other approach.]

config

Set configuration options. By default, the options will be written to the local configuration file, sqitch.ini. Options:

--get

Get the value for a given key. Returns error code 1.

--unset

Remove the line matching the key from config file.

--list

List all variables set in config file.

--global

For writing options: write to global ~/.sqitch/config.ini file rather than the local sqitch.ini.

For reading options: read only from global ~/.sqitch/config.ini rather than from all available files.

--system

For writing options: write to system-wide $prefix/etc/sqitch.ini file rather than the local sqitch.ini.

For reading options: read only from system-wide $prefix/etc/sqitch.ini rather than from all available files.

--config-file

Use the given config file.

package

Package up all deployment and reversion scripts and write out a plan file. Configuration properties may be specified under the [package] section of the configuration file, or via sqitch config package.$property $value command. Options and configuration properties:

--from

Tag to start the plan from. All tags and steps prior to that tag will not be included in the plan, and their change scripts Will be omitted from the package directory. Useful if you've rejiggered your deployment steps to start from a point later in your VCS history than the beginning of time. Property name: package.from.

--to

Tag with which to end the plan. No steps or tags after that tag will be included in the plan, and their change scripts will be omitted from the package directory. Property name: package.to.

--tags-only

Write the plan file with deployment targets listed under VCS tags, rather than individual commits. Property name: package.tags_only.

--destdir

Specify a destination directory. The plan file and deploy, revert, and test directories will be written to it. Defaults to "package". Property name: package.destdir.

Configuration

Sqitch configuration information is stored in standard INI files. The # and ; characters begin comments to the end of line, blank lines are ignored.

The file consists of sections and properties. A section begins with the name of the section in square brackets and continues until the next section begins. Section names are not case sensitive. Only alphanumeric characters, - and . are allowed in section names. Each property must belong to some section, which means that there must be a section header before the first setting of a property.

All the other lines (and the remainder of the line after the section header) are recognized as setting properties, in the form name = value. Leading and trailing whitespace in a property value is discarded. Internal whitespace within a property value is retained verbatim.

All sections are named for commands except for one, named "core", which contains core configuration properties.

Here's an example of a configuration file that might be useful checked into a VCS for a project that deploys to PostgreSQL and stores its deployment scripts with the extension ddl under the migrations directory. It also wants packages to be created in the directory _build/sql, and to deploy starting with the "gamma" tag:

[core]
    engine    = pg
    db        = widgetopolis
    sql_dir   = migrations
    extension = ddl

[revert]
    to        = gamma

[package]
    from      = gamma
    tags_only = yes
    dest_dir  = _build/sql

Core Properties

This is the list of core variables, which much appear under the [core] section. See the documentation for individual commands for their configuration options.

plan_file

The plan file to use. Defaults to sqitch.ini or, if that does not exist, uses the VCS history, if available.

engine

The database engine to use. Supported engines include:

client

Path to the command-line client for the database engine. Defaults to a client in the current path named appropriately for the specified engine.

db_name

Name of the database.

username

User name to use when connecting to the database. Does not apply to all engines.

password

Password to use when connecting to the database. Does not apply to all engines.

host

Host name to use when connecting to the database. Does not apply to all engines.

port

Port number to connect to. Does not apply to all engines.

sql_dir

Path to directory containing deployment, reversion, and test SQL scripts. It should contain subdirectories named deploy, revert, and (optionally) test. These may be overridden by deploy_dir, revert_dir, and test_dir. Defaults to ./sql.

deploy_dir

Path to a directory containing SQL deployment scripts. Overrides the value implied by sql_dir.

revert_dir

Path to a directory containing SQL reversion scripts. Overrides the value implied by sql_dir.

test_dir

Path to a directory containing SQL test scripts. Overrides the value implied by sql_dir.

extension

The file name extension on deployment, reversion, and test SQL scripts. Defaults to sql.

Plan File

A plan file describes the deployment tags and scripts to be run against a database. In general, if you use a VCS, you probably won't need a plan file, since your VCS history should be able to provide all the information necessary to derive a deployment plan. However, if you really do need to maintain a plan file by hand, or just want to better understand the file as output by the package command, read on.

Format

The contents of the plan file are plain text encoded as UTF-8. It is divided up into sections that denote deployment states. Each state has a bracketed, space-delimited list of one or more tags to identify it, followed by any number of deployment steps. Here's an example of a plan file with a single state and a single step:

[alpha]
users_table

The state has one tag, named "alpha", and one step, named "users_table". A state may of course have many steps. Here's an expansion:

[root alpha]
users_table
insert_user
update_user
delete_user

This state has two tags, "root" and "alpha", and four steps, "users_table", "insert_user", "update_user", and "delete_user".

Most plans will have multiple states. Here's a longer example with three states:

[root alpha]
users_table
insert_user
update_user
delete_user

[beta]
widgets_table
list_widgets

[gamma]
ftw

Using this plan, to deploy to the "beta" tag, the "root"/"alpha" state steps must be deployed, as must the "beta" steps. To then deploy to the "gamma" tag, the "ftw" step must be deployed. If you then choose to revert to the "alpha" tag, then the "gamma" step ("ftw") and all of the "beta" steps will be reverted in reverse order.

Using this model, steps cannot be repeated between states. One can repeat them, however, if the contents for a file in a given tag can be retrieved from a VCS. An example:

[alpha]
users_table

[beta]
add_widget
widgets_table

[gamma]
add_user

[44ba615b7813531f0acb6810cbf679791fe57bf2]
widgets_created_at

[HEAD epsilon master]
add_widget

This example is derived from a Git log history. Note that the "add_widget" step is repeated under the state tagged "beta" and under the last state. Sqitch will notice the repetition when it parses this file, and then, if it is applying all changes, will fetch the version of the file as of the "beta" tag and apply it at that step, and then, when it gets to the last tag, retrieve the deployment file as of its tags and apply it. This works in reverse, as well, as long as the changes in this file are always idempotent.

Grammar

Here is the EBNF Grammar for the plan file:

plan-file   = { <state> | <empty-line> | <comment> }* ;

state       = <tags> <steps> ;

tags        = "[" <taglist> "]" <line-ending> ;
taglist     = <name> | <name> <white-space> <taglist> ;

steps       = { <step> | <empty-line> | <line-ending> }* ;
step        = <name> <line-ending> ;

empty-line  = [ <white-space> ] <line-ending> ;
line-ending = [ <comment> ] <EOL> ;
comment     = [ <white-space> ] "#" [ <string> ] ;

name        = ? non-white space characters ? ;
white-space = ? white space characters ? ;
string      = ? non-EOL characters ? ;

See Also

The original design for Sqitch was sketched out in a number of blog posts:

Other tools that do database change management include:

Rails migrations

Numbered migrations for Ruby on Rails.

Module::Build::DB

Numbered changes in pure SQL, integrated with Perl's Module::Build build system. Does not support reversion.

DBIx::Migration

Numbered migrations in pure SQL.

Versioning

PostgreSQL-specific dependency-tracking solution by depesz.

Author

David E. Wheeler <david@justatheory.com>

License

Copyright (c) 2012 iovation Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

How Not to Withstand a Fireballing

Fireballed!
  • Set up a simple Blosxom-powered blog on the advice of a friend who wrote it
  • Never bother to make it anything other than a CGI script that crawls your file system on every request
  • Add a bunch of plugins for various doodads and useless doohickies, also executing a bunch of crap on every request
  • Put it on your eight year old Dell Dimension running Ubuntu Linux
  • In your basement
  • On your residential Comcast connection
  • Let many years go by, giving it very little attention
  • Write something interesting
  • Laugh about your complete inability to connect to the box from work while Daring Fireball melts your connection

I have more to say on the topic of iPad magazines, but I'm using the meager tuits I have to first make some infrastructural changes.

Condé Nast on the iPad

I've been a The New Yorker reader for nearly two decades. I'm a huge fan of the magazine, which in my humble has the best reportage anywhere. So I was thrilled last year when the magazine's publisher, Condé Nast, decided to make all issues available to paper subscribers for no extra charge. Ever since, I have loved getting the latest issue late on Sunday nights, rather than on the following Friday or Saturday. It has also made it easier for Strongrrl and me to share the magazine: I usually read the iPad version and she reads the paper version.

That's not to say that it has all been happiness and joy, mind. The New Yorker app is terrible. Issues are huge (typically around 120 MB). Downloads cannot be performed in the background (except by Newsstand, which may not notice a new issue for days). You can't even read other issues while waiting for one to download; the download pauses. Sometimes I have difficulty getting it to start downloading again. And the navigation, while unique and kind of interesting, really does not work for me.

Still, I was excited to try it on the new iPad's retina display. Well, "excited" is not quite the right word. More like "dreading." Because Condé iPad magazines actually use images for most of their articles, rather than text. The underlying technology is the charmingly named Adobe Digital Publishing Suite family, which, at root, is basically an image reader. Way back in September, 2010, Condé told All Things D: "The goal is to be all HTML, and we will be." Alas, that day has yet to come, as I confirmed when I loaded the latest issue of The New Yorker on my new iPad and immediately saw fuzzy text. Sure, the text-based articles, such as the comment, display beautifully. But longer articles, which are carefully laid out and rendered as images, do not. They look worse than on the old iPad, because the anti-aliasing is even easier to see.

The New Yorker on the 1st gen iPad The New Yorker on the 3rd gen iPad with retina display

Text content from The New Yorker on the first generation iPad (top) and the third generation iPad with retina display (bottom). Looks great because it's text. Click for larger images.

I complained about this on Twitter, Brian Lam chimed in, and Scott Dadich, Condé's Vice President of Content Innovation, was kind enough to respond:

I suspect that the reality of the overlap of The New Yorker and Vogue readers resembles the Venn diagram to the right. But I want to see The New Yorker rock on the iPad, so I put aside some time to download the new Vogue app and take it for a spin.

The New Yorker & Condé Nast Venn Diagram

First, the good news. I got a free month's subscription and downloaded the latest issue of Vogue, which does indeed have hi-res images. The articles look great. Ads are still low-res, but some might consider that a feature (not, I daresay, the advertisers). Alas, that's where the good news ends. Overall, this app is almost exactly like the The New Yorker app. I find this a little odd, since in print the magazines could not be much less different: The New Yorker is a slim, staple-bound, mostly-text weekly. Vogue is a phone book-sized perfect bound, glossy fashion magazine. Quite different beasts. Business-wise, I can understand why they would be the same: It's less expensive to have a single "media player" for all of your company's periodical properties, and the tablet form factor allows you to eliminate some of the differences. After all, bits don't weigh anything.

Vogue on the new iPad

Image content from Vogue on the new iPad. Not shown: the second that it's blurry while the image engine finishes loading and displaying the image. Click for larger images.

Except that they do. The Vogue April 2012 issue weighs in at a whopping 408 MB. A special "exclusive download" covering The Met Gala demands 530 MB of disk space. The latter I can kinda/sorta understand, as it contains a bunch of videos (all low-res text and image content, though). The fact that the new issue is so huge tells me that one of two things, or perhaps both, is going on here:

  1. Vogue is such an image-driven magazine that it will just be big no matter what you do.
  2. The text content is still images, just bigger ones.

We'll have to wait for the much less image-driven content of The New Yorker to find out if the its downloads are smaller, but the Vogue example does not make me optimistic.

Worse than the download size, though, is the fact that user-triggered downloads do not happen in the background. I started a download and quit the app, then came back after 15 minutes of doing other stuff, and it had gone nowhere, though it restarted the downloading without me needing to do anything more. This is one of the biggest beefs I've had with The New Yorker app: I have to start a download, and then wait for it to finish, often up to 5 minutes, without being able to do anything else on my iPad. This sucks.

Other issues I have with the Vogue app, and which are also present in the existing The New Yorker app:

  • When not reading an issue, but looking in the "Store" or "My Account," pages are quite slow to load. They appear to be web views that download content every time they are loaded, with no caching. With hi-res images, it gets even slower. It would be nice if these were cached, so that a new download would happen only if there was actually new content.
  • Controls can be very slow to respond. I clicked a "Buy Issue" button, and nothing happened for 15 seconds. No activity indicator icon, nothing. Subsequent taps of the button were a bit better, taking only a second or so to respond.
  • Not all controls are obvious. For example, in "My Account," the "Complete Account Setup" button is slightly darker than the others, so perhaps disabled. But if I tap it, it depresses. But nothing happens. This is quite different from how standard iOS controls work, where if a button is disabled, its color is greatly reduced and tapping it does nothing.
  • When viewing the high-res magazine, images and text start out low-res, then sharpen. Clearly we are still dealing with image content, even for the text. (Or perhaps PDFs and a slow rendering engine. The rendering reminds me of the iterative`` display of progressive JPEGs on the web in the 90s.)
  • Some articles have a Chevron icon for different or related content. If you tap one, it jumps to a completely different part of the magazine (think the back pages). There is no back arrow.
  • I never cared for the swipe left and right to switch articles/ads, swipe up and down for more pages of content. I think it works okay for scrolling apps; Byline works that way, for example. But not so much for pagination. No other reading apps work like that. And since some of the chevron controls also scroll left and right, they seem to behave differently but show the same animation.
  • Some pages looks as though you ought to be able to tap something, such as an invitation to watch a video or a description of an article (especially in the "In This Issue" section), but nothing happens when you tap. They really ought to respond to taps.
  • Other places are less obvious that should be tapped for more info, but there are instructions, such as a little circle in the Special Edition that says "tap circle to show caption." I tap the circle and the caption appears, right where the tip had been. Why not just show the caption?
  • Some articles have sharing features, where you can share via Twitter, Facebook, or email. The Twitter feature connects you to twitter.com in a web view, rather than use the iOS 5 Twitter support. Worse, it does not remember that I logged in between sessions. So if I tweeted yesterday and want to do it again today, I have to log in again. I suspect the Facebook feature works the same. Sharing via Email just opens the Mail app, rather than use the embedded iOS email controller.
  • When I use the table of contents popover, it is always scrolled to the top, no matter how far down the currently-displayed article is or where I left it the last time I used it.

There are other issues, as well, some minor, such as the boring grey background if you make an image or article bounce when you scroll past the end, or the display of the issue name in the iTunes store as "Vogue Magazine_200_30." Other issues are more annoying, such as the difficulty of discovering and managing the different types and layers of navigation. But honestly, if just the following issues were addressed, the app would be so much better:

  • Use plain text for layout. HTML would be great.
  • Get the download size down. Reducing the use of images for laying out text will help a lot for The New Yorker, I'm sure; less perhaps for Vogue.
  • Allow the downloads to happen in the background, no matter what else I'm doing.
  • Make less use of embedded web browsers for stuff, or at least cache them. This is a tablet, with a lot of great features built in. Take advantage of them to make the app as responsive as possible!

These steps will help a lot. But even then, I can't help but think that there continues to be more need for UX exploration and experimentation. A tablet is not a magazine and not a web browser, but offers its own features and constraints. I think a better fit for selling editorial or image-based content could still be created; these apps don't come close. I can think of three reasons for why not:

  1. The limitations of the Adobe publishing platform. It's a lowest-common denominator experience, in that the player has to work on a bunch of different devices, and so would suck on all of them. And it just might not have the controls for a strong text-based layout, though I don't see why Adobe wouldn't have the resources to address that issue.
  2. The desire for layout integrity. But as Craig Grannell writes, that's akin to 90s web sites that were nothing but a single big image. It didn't work well for a whole slew of reasons.
  3. Copyright. Plain-text content would just be too easy to "steal," but if it's in a PNG, no one will bother. I know nothing about this personally, but it would not surprise me if there were folks inside Condé and some of these other magazines who don't want to use text-based content because it would be too easy to copy.

I find none of these reasons compelling.

Look, I write this out of love. The New Yorker is my favorite magazine, bar none, and I want it to succeed. Print is dying, but there is so much opportunity on devices like the iPad. Continue to create the best content, and provide it in a form factor and experience that takes advantage of the features and limitations of your targeted platforms, to allow readers to enjoy reading, and success will be assured.

SQL Change Management Sans Duplication

In the previous episode in this series, I had one more issue with regard to SQL change management that I wanted to resolve:

  1. There is still more duplication of code than I would like, in that a procedure defined in one change script would have to be copied whole to a new script for any changes, even simple single-line changes.

So let’s see what we can do about that. Loading it into Git, our first example looks like this:

> alias sqlhist="git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print \"\"} /./'"
> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (HEAD, gamma, master)]
sql/deploy/add_user.sql

(Aside: I’ve created an alias, sqlhist, on the first line, so that all the Git and Awk magic doesn’t clutter the remaining examples.)

So, we’ve got the creation of the users table under the alpha tag, the addition of the widgets table and an accompanying add_widget() function under the beta tag, and the creation of the add_user() function under the gamma tag. So far so good. Now, let’s say that gamma has been deployed to production, and now we’re ready to add a feature for the next release.

Modify This

It turns out that our users really want a timestamp for the time a widget was created. So let’s add a new change script that adds a created_at column to the widgets table. First we add sql/deploy/widgets_created_at.sql with:

-- requires: widgets_table
ALTER TABLE widgets ADD created_at TIMESTAMPTZ;

And then the accompanying revert script, sql/revert/widgets_created_at.sql:

ALTER TABLE widgets DROP COLUMN IF EXISTS created_at;

Commit them and now our deployment configuration looks like this:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2 (HEAD, master)]
sql/deploy/widgets_created_at.sql

So far so good. We have a simple delta script that modifies the existing table, and there is no code duplication. Time to modify the add_widget() function to insert the timestamp. Recall that, in the first article in this series, I created a separate sql/deploy/add_widgets_v2.sql file, copied the existing function in its entirety into the new file, and modified it there. If we were to do that here, the resulting deployment configuration would look something like this:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
sql/deploy/widgets_created_at.sql

[dfba488cfd9145928a25d8d48de3231da84s4bd2 (HEAD, master)]
sql/deploy/add_widget_v2.sql

Which would be fine, except that if someone else wanted to see what had changed, here’s what git diff would output:

> git diff HEAD^ sql/deploy 
diff --git a/sql/deploy/add_widget_v2.sql b/sql/deploy/add_widget_v2.sql
new file mode 100644
index 0000000..9132195
--- /dev/null
+++ b/sql/deploy/add_widget_v2.sql
@@ -0,0 +1,8 @@
+-- requires widgets_created_at
+CREATE OR REPLACE FUNCTION add_widget(
+    username   TEXT,
+    widgetname TEXT
+) RETURNS VOID LANGUAGE SQL AS $$
+    INSERT INTO widgets (created_by, name, created_at)
+    VALUES ($1, $2, NOW());
+$$;

So, what changed in the add_widget() function between gamma and now? One cannot tell from this diff: it looks like a brand new function. And no web-based VCS interface will show you, either; it’s just not inherent in the commit. We have to actually know that it was just an update to an existing function, and what files to manually diff, like so:

 > diff -u sql/deploy/add_widget.sql sql/deploy/add_widget_v2.sql 
--- sql/deploy/add_widget.sql   2012-01-28 13:06:24.000000000 -0800
+++ sql/deploy/add_widget_v2.sql    2012-01-28 13:26:59.000000000 -0800
@@ -1,8 +1,8 @@
--- requires: widgets_table
-
+-- requires: widgets_created_at
 CREATE OR REPLACE FUNCTION add_widget(
     username   TEXT,
     widgetname TEXT
 ) RETURNS VOID LANGUAGE SQL AS $$
-    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
+    INSERT INTO widgets (created_by, name, created_at)
+    VALUES ($1, $2, NOW());
 $$;

Much better, but how annoying is that? It doesn’t allow us to really take advantage of the VCS, all because we need SQL changes to run in a very specific order.

But let’s ignore that for the moment. Let’s just throw out the commit with add_widgets_v2.sql and go ahead and change the add_widget change script directly. So the history now looks like this:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
sql/deploy/widgets_created_at.sql

[e4b970aa36f27451fe377791eab040a73c6eb47a (HEAD, epsilon, master)]
sql/deploy/add_widget.sql

Naturally, the add_widget script appears twice now, once under the beta tag and once under epsilon (which I’ve just tagged). What are the consequences for our migration? Well, if we were to build a new database from the beginning, running these migrations as listed here, we would get an error while applying the beta changes:

ERROR:  column "created_at" of relation "widgets" does not exist
LINE 5:     INSERT INTO widgets (created_by, name, created_at)

This is because the created_at column won’t exist until the widgets_created_at change is applied. That won’t do, will it? Fortunately, Git knows exactly what the add_widget deploy script looked like under the beta tag, and we can ask it:

> git show beta:sql/deploy/add_widget.sql
-- requires: widgets_table

CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
$$;

Boom, there it is, with no reference to created_at. Using this technique, our SQL deployment app can successfully apply all of our database changes by iterating over the list of changes and applying the contents of each script at the time of the appropriate commit or tag. In other words, it could apply the output from each of these commands:

git show alpha:sql/deploy/users_table.sql
git show beta:sql/deploy/widgets_table.sql
git show beta:sql/deploy/add_widget.sql
git show gamma:sql/deploy/add_user.sql
git show 44ba615b7813531f0acb6810cbf679791fe57bf2:sql/deploy/widget_created_at.sql
git show epsilon:sql/deploy/add_widget.sql

And everything will work exactly as it should: the original version of the add_widget change script will be for the beta tag, and the next version will be applied for the epsilon tag. Not bad, right? We get a nice, clean Git history and can exploit it to manage the changes.

Reversion to the Mean

But what about reversion? What if the deploy to epsilon failed, and we need to revert back to gamma? Recall that in the first article, I eliminated duplication by having the add_widget_v2 revert script simply call the add_widget deploy script. But such is not possible now that we’ve changed add_widget in place. What to do?

The key is for the change management script to know the difference between a new change script and a modified one. Fortunately, Git knows that, too, and we can get it to cough up that information with a simple change to the sqlhist alias: instead of passing --name-only, pass --name-status:

% alias sqlhist="git log -p --format='[%H%d]' --name-status --reverse sql/deploy \
| awk '/^\[/ {print \"\"} /./'"

Using this new alias, our history looks like:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
A   sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
A   sql/deploy/add_widget.sql
A   sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
A   sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
A   sql/deploy/widgets_created_at.sql

[e4b970aa36f27451fe377791eab040a73c6eb47a (HEAD, epsilon, master)]
M   sql/deploy/add_widget.sql

Now we have a letter defining the status of each file. An “A” means the file was added in that commit; an “M” means it was modified. But the upshot is that, to revert to gamma, our change management can see that add_widget was modified in epsilon, and, rather than apply a revert change script, it can just apply the version of the script as it existed under gamma:

> git show gamma:sql/deploy/add_widget.sql
-- requires: widgets_table

CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
$$;

And there we are, right back to where we should be. Of course, the remaining epsilon deploy script, widget_created_at, was added in its commit, so we just apply the revert script and we’re set, back to gamma.

Still Configurable

To get back to the original idea of a migration configuration file, I still think it’s entirely do-able. All we need to is to have the change management app generate it, just as before. When it comes to modified — rather than added — deploy scripts, it can automatically insert new scripts with the full copies of previous versions, much as before. The resulting configuration would look something like this:

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
sql/deploy/widgets_created_at.sql

[e4b970aa36f27451fe377791eab040a73c6eb47a (HEAD, epsilon, master)]
sql/deploy/add_widget_v2.sql

Note that last line, where we now have add_widget_v2. The change management script would simply generate this file, and create an additional revert script with the same name that just contains the contents of the deploy script as it was under the gamma tag.

Too Baroque?

Having written down these ideas that have plagued by brain for the last week, along with some examples using Git to confirm them, I’m convinced more than ever that this is entirely workable. But it also leads me to wonder if it’s too baroque. I intend these posts as a rough spec for how this thing should work, and I plan to implement it in the coming weeks. But I’m wondering how difficult it will be to explain it all to people?

So let me see if I can break it down to a few simple rules.

  • In general, you should create independent deploy and revert scripts for your SQL. Put a CREATE TABLE statement into its own script. If it requires some some other table, require declare the dependency. If you need to change it later, create a new script that uses ALTER TABLE.
  • In special cases where a simple change cannot be made without copying something wholesale, and where the deploy script is idempotent, you may simply modify the deploy script in-place.

That’s about it. The idempotence of the deploy script is important for ensuring consistency, and applies very well to features such as user-defined functions. For other objects, there are generally ALTER statements that allow changes to be made without wholesale copying of existing code.

So what am I missing? What have I overlooked? What mistakes in my logic have I made? Do you think this will be too tricky to implement, or to use? Is it hard to understand? Your comments would be greatly appreciated, because I am going to write an app to do this stuff, and want to get it right.

Thanks for sticking with me through all the thought experiments. For my next post on this topic, I expect to have an interface spec for the new app.

VCS-Enabled SQL Change Management

In my previous post, I outlined the basics of a configuration-file and dependency-tracking SQL deployment architecture, but left a couple of additional challenges unresolved. They were:

  1. I would rather not have to hand-edit a configuration file, as it it’s finicky and error-prone.

  2. There is still more duplication of code than I would like, in that a procedure defined in one change script would have to be copied whole to a new script for any changes, even single-line simple changes.

I believe I can solve both of these issues by simple use of a VCS. Since all of my current projects currently use Git, I will use it for the examples here.

Git it On

First, recall the structure of the configuration file, which was something like this:

[alpha]
users_table

[beta]
add_widget
widgets_table

[gamma]
add_user

[delta]
widgets_created_at
add_widget_v2

Basically, we have bracketed tags identifying changes that should be deployed. Now have a look at this:

> git log -p --format='[%H]' --name-only --reverse sql/deploy
[8920aaf7947a56f6777e69a21b70fd877c8fd6dc]

sql/deploy/users_table.sql
[f7da5fd4b7391747f75d85db6fa82de47b9e4c00]

sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql
[ea10b9e566934ef256debe8752504189436e162a]

sql/deploy/add_user.sql
[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4]

sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

Look familiar? Let’s use a bit of awk magic to neaten things a bit (Thanks helwig!):

> git log -p --format='[%H]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

Ah, that’s better. We have commit SHA1s for tags, followed by the appropriate lists of deployment scripts. But wait, we can decorate it, too:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql
[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (HEAD, delta, master)]

Look at that! Actual VCS tags built right in to the output. So, assuming our deployment app can parse this output, we can deploy or revert to any commit or tag. Better yet, we don’t have to maintain a configuration file, because the VCS is already tracking all that stuff for us! Our change management app can automatically detect if we’re in a Git repository (or Mercurial or CVS or Subversion or whatever) and fetch the necessary information for us. It’s all there in the history. We can name revision identifiers (SHA1s here) to deploy or revert to, or use tags (alpha, beta, gamma, delta, HEAD, or master in this example).

And with careful repository maintenance, this approach will work for branches, as well. For example, say you have developers working in two branches, feature_foo and feature_bar. In feature_foo, a foo_table change script gets added in one commit, and an add_foo script in a second commit. Merge it into master and the history now looks like this:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e (HEAD, master, feature_foo)]
sql/deploy/add_foo.sql

So far so good.

Meanwhile, development in the feature_bar branch has added a bar_table change script in one commit and add_bar in another. Because development in this branch was going on concurrently with the feature_foo branch, if we just merged it into master, we might get a history like this:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'
[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[d1882d7b4cfcf5c57030bd5a15f8571bfd7e48e2]
sql/deploy/bar_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e]
sql/deploy/add_foo.sql

[2330da1caae9a46ea84502bd028ead399ca3ca02 (feature_bar)]
sql/deploy/add_bar.sql

[73979ede2c8589cfe24c9213a9538f305e6f508f (HEAD, master, feature_foo)]

Note that bar_table comes before add_foo. In other words, the feature_foo and feature_bar commits are interleaved. If we were to deploy to HEAD, and then need to revert feature_bar, bar_table would not be reverted. This is, shall we say, less than desirable.

There are at least two ways to avoid this issue. One is to squash the merge into a single commit using git merge --squash feature_bar. This would be similar to accepting a single patch and applying it. The resulting history would look like this:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e]
sql/deploy/add_foo.sql

[91a048c05e0444682e2e4763e8a7999a869b4a77 (HEAD, master)]
sql/deploy/add_bar.sql
sql/deploy/bar_table.sql

Now both of the feature_bar change scripts come after the feature_foo changes. But it might be nice to keep the history. So a better solution (and the best practice, I believe), is to rebase the feature_bar branch before merging it into master, like so:

> git rebase master
First, rewinding head to replay your work on top of it...
Applying: Add bar.
Applying: Add add_bar().
> git checkout master
Switched to branch 'master'
> git merge feature_bar
Updating 7f89e23..0fab7a0
Fast-forward
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 sql/deploy/add_bar.sql
 create mode 100644 sql/deploy/bar_table.sql
 create mode 100644 sql/revert/add_bar.sql
 create mode 100644 sql/revert/bar_table.sql

And now we should have:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e]
sql/deploy/add_foo.sql

[0e53c29eb47c618d0a8818cc17bd5a0aab0acd6d]
sql/deploy/bar_table.sql

[0fab7a0ba928b34a46a9495d4efc1c73d9133d37 (HEAD, master, feature_bar)]
sql/deploy/add_bar.sql

Awesome, now everything is in the correct order. We did lose the feature_foo “tag,” though. That’s because it wasn’t a tag, and neither is feature_bar here. They are, rather, branch names, which we becomes obvious when using “full” decoration:

git log --format='%d' --decorate=full HEAD^..      
 (HEAD, refs/heads/master, refs/heads/feature_foo)

After the next commit, it will disappear from the history. So let’s just tag the relevant commits ourselves:

> git tag feature_foo 7f89e23c9f1e7fc298c69400f6869d701f76759e
> git tag feature_bar
> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e (feature_foo)]
sql/deploy/add_foo.sql

[0e53c29eb47c618d0a8818cc17bd5a0aab0acd6d]
sql/deploy/bar_table.sql

[0fab7a0ba928b34a46a9495d4efc1c73d9133d37 (HEAD, feature_bar, master, feature_bar)]
sql/deploy/add_bar.sql

Ah, there we go! After the next commit, one of those feature_bars will disappear, since the branch will have been left behind. But we’ll still have the tag.

Not Dead Yet

Clearly we can intelligently use Git to manage SQL change management. (Kind of stands to reason, doesn’t it?) Nevertheless, I believe that a configuration file still might have its uses. Not only because not every project is in a VCS (it ought to be!), but because oftentimes a project is not deployed to production as a git clone. It might be distributed as a source tarball or an RPM. In such a case, including a configuration file in the distribution would be very useful. But there is still no need to manage it by hand; our deployment app can generate it from the VCS history before packaging for release.

More to Come

I’d planned to cover the elimination of duplication, but I think this is enough for one post. Watch for that idea in my next post.

Simple SQL Change Management

I’ve been thinking a lot about SQL change management. I know I have written about this before. But I was never satisfied with that idea, mostly because it required managing database changes in two separate but interdependent ways. Blargh. So for my Perl projects the last couple of years, I have stuck to the very simple but ugly Rails-style migration model, as implemented in Module::Build::DB.

But it has been on my brain more lately because I’m writing more and more database applications at work, and managing changes over time is becoming increasingly annoying. I’ve been using a variation on Depesz’s Versioning package, mainly because its idea of specifying dependencies instead of ordered deployment scripts is so useful. However, its implementation in pure SQL, with accompanying shell and Perl scripts, is not entirely satisfying. Worse, one cannot easily include the contents of an earlier deployment script in a reversion script, because the dependency registration function embedded in a script will throw an error if it has been run before. The upshot is that if you make a one-line change to a database function, you still have to paste the entire thing into a new file and commit it to your source code repository. This makes tracking diffs annoying.

Oh, and did I mention that there is no simple built-in way to revert changes, and even if there were, because there are no named releases, it can be difficult to decide what to revert to? I don’t often need that capability, but when I need it, I need it.

Then, this week, Robert Haas described a deployment implementation he implemented. It was simple:

My last implementation worked by keeping a schema_versions table on the server with one column, a UUID. The deployment tarball contained a file with a list of UUIDs in it, each one associated to an SQL script. At install time, the install script ran through that file in order and ran any scripts whose UUID didn’t yet appear in the table, and then added the UUIDs of the run scripts to the table.

I like this simplicity, but there are some more things I think could be done, including dependency reslolution and reversion. And it seems silly to have a UUID stand for a script name; why not just list script names? Better yet, tag groups of changes for easy reference.

Yet Another SQL Deployment Strategy

So here’s my proposal. Following Robert, we create a configuration file, but instead of just listing changes, we fill it with tags and the names of the changes are associated with each. An example:

[alpha]
users_table

[beta]
add_widget
widgets_table

[gamma]
add_user

Our change management app will parse this file, finding the tag for each stage of the migration in brackets, and apply the associated changes, simply finding each of them in sql/deploy/$change.sql. If it’s reverting changes, it finds the reversion scripts named sql/revert/$change.sql. The tags can be anything you want; release tags might be useful. Easy so far, right?

Except notice that I have a minor ordering problem here. The add_widget change, which adds a function to insert a record into the widgets table, comes before the widgets_table script. If we run the add_widget change first, it will fail, because the widgets table does not yet exist.

Of course we can re-order the lines in the configuration file. But given that one might have many changes for a particular tag, with many cross-referenceing dependencies, I think it’s better to overcome this problem in the scripts themselves. So I suggest that the sql/deploy/add_widget.sql file look something like this:

-- requires: widgets_table

CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
$$;

Here I’m stealing Depesz’s dependency tracking idea. With a simple comment at the top of the script, we specify that this change requires that the widgets_table change be run first. So let’s look at sql/deploy/widgets_table.sql:

-- requires: users_table

CREATE TABLE widgets (
    created_by TEXT NOT NULL REFERENCES users(name),
    name       TEXT NOT NULL
);

Ah, now here we also require that the users_table change be deployed first. Of course, it likely would be, given that it appears under a tag earlier in the file, but it’s best to be safe and explicitly spell out dependencies. Someone might merge the two tags at some point before release, right?

The users_table change has no dependencies, but the later add_user change of course does; our sql/deploy/add_user.sql:

-- requires: users_table

CREATE OR REPLACE FUNCTION add_user(
    name TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO users (name) VALUES ($1);
$$;

Our deployment app can properly resolve these dependencies. Of course, we also need reversion scripts in the sql/revert directory. They might look something like:

-- sql/revert/users_table.sql
DROP TABLE IF EXISTS users;

-- sql/revert/add_widget.sql
DROP FUNCTION IF EXISTS add_widget(text, text);

-- sql/revert/widgets_table.sql
DROP TABLE IF EXISTS widgets;

-- sql/revert/add_user.sql
DROP FUNCTION IF EXISTS add_user(text);

So far so good, right? Our app can resolve dependencies in both directions, so that if we tell it to revert to beta, it can do so in the proper order.

Now, as the deployment app runs the scripts, deploying or reverting changes, it tracks them and their dependencies in its own metadata table in the database, not unlike Depesz’s Versioning package. But because dependencies are parsed from comments in the scripts, we are free to include the contents of one script in another. For example, say that we later need to revise the add_widget() function to log the time a widget is created. First we add a new script to add the necessary column:

-- requires: widgets_table
ALTER TABLE widgets ADD created_at TIMESTAMPTZ;

Call that script sql/deploy/widgets_created_at.sql. Next we add a script that changes add_widgets():

-- requires widgets_created_at
CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name, created_at)
    VALUES ($1, $2, NOW());
$$;

Call it sql/deploy/add_widget_v2.sql. Then update the deployment configuration file with a new tag and the associated changes:

[delta]
widgets_created_at
add_widget_v2

With me so far? Now, what about reversion? sql/revert/widgets_created_at.sql is simple, of course:

ALTER TABLE widgets DROP COLUMN IF EXISTS created_at;

But what should sql/revert/add_widget_v2.sql look like? Why, to go back to the first version of add_widget(), it would be identical to sql/deploy/add_widget.sql. But it would be silly to copy the whole file, wouldn’t it? Why duplicate when we can just include?

\i sql/deploy/add_widget.sql

Boom, we get the reversion script for free. No unnecessary duplication between deployment and reversion scripts, and all dependencies are nicely resolved. Plus, the tags in the configuration file make it easy to deploy and revert change sets as necessary, with dependencies properly followed.

There’s More!

To recap, I had two primary challenges with Depesz’s Versioning package to overcome: inability to easily revert to an earlier implementation; and the inability to easily include one script in another. Both of course are do-able with workarounds, but I think that the addition of a deployment configuration file with tagged sets of changes and the elimination of SQL-embedded dependency specification overcome these issues much more effectively and intuitively.

Still, there are two more challenges I would like to overcome:

  1. It would be nice not to need the configuration file at all. Maintaining such a thing can be finicky and error-prone.

  2. I still had to duplicate the entire add_widget() function in the add_widget_v2 script for a very simple change. This means no easy way to simply see the diff for this change in my VCS. It would be nice not to have to copy the entire function.

I think I have solutions for these issues, as well. More in my next post.

Today on the Perl Advent Calendar

Hey look everybody, I wrote today's Perl Advent Calendar post, Less Tedium, More Transactions. Go read it!

How to Integrate the TestFlight SDK into an iOS Project

I've started using TestFlight to release DesignScene betas to testers. The documentation is thin, so I had to futz a bit, but fortunately it's a pretty simple app, so once I figured out that I just needed to stick to one "Team", I was off and running. And let me tell you, TestFlight is a far easier way to distribute betas than the convoluted methods suggested by Apple. Much more beta user-friendly.

For us developers, the TestFlight SDK is particularly handy. Add it to your TestFlight-distributed project and get crash reports and remote logging, ask your testers for feedback, and other cool stuff. I've only just started using it, but the immediate diagnostic feedback has already proved invaluable.

Getting the TestFlight SDK to work is dead simple, but it's not supported in App Store distributions. So I wanted to set things up so that it would always be included in beta releases and never in production releases. Getting to that point took a couple of days of futzing, as it's not explicitly supposed by Xcode's UI. The solution I came up with, thanks to this StackOverflow post, is to:

  • Add a "Beta" configuration to complement the default "Release" and "Debug" configurations
  • Add a preprocessor macro to allow conditional use of the TestFlight SDK
  • Use the EXCLUDED_SOURCE_FILE_NAMES setting to exclude the TestFlight library from "Release" builds

That last step makes me a bit nervous, but EXCLUDED_SOURCE_FILE_NAMES, while undocumented, seems to be reasonably well known. At any rate, I could find no better way to tie the inclusion of a library to a specific configuration, so I'm going with it. Better solutions welcome.

At any rate, here's the step-by-step for Xcode 4.2:

  • Download the TestFlight SDK and unpack it.
  • Drag it into your project. Make sure that "Copy items into destination group's folder" is checked, as is "Create groups for any added folders". Include it in all relevant targets.
  • Create a "Beta" configuration:

    • Click on the app name in the navigator, then on the project name and then the info tab.
    • Under "Configurations", click the plus sign and select "Duplicate "Release" Configuration"
    • Type "Beta" to name the new configuration.

    You should end up with something like this:

    Configurations

  • Create configuration marcos:

    Configurations

    • Still in the project settings, go to the "Build Settings" tab.
    • Search for "preprocessor macros".
    • Double-click the value section next to the "Preprocessor Macros" label, hit the + button, and enter CONFIGURATION_$(CONFIGURATION).

    You should end up with a window like the above. Once you close it, you should see the macros names for each individual configuration, shown here:

    Configurations

  • Add the EXCLUDED_SOURCE_FILE_NAMES build setting.

    • Still in the "Build Settings" tab, click the "Add Build Setting" button in the lower-left corner and select "Add User-Defined Setting".
    • Input EXCLUDED_SOURCE_FILE_NAMES as the name of the setting.
    • Open the reveal triangle next to the setting name.
    • Double-click to the right of "Release".
    • Enter *libTestFlight.a as the value.

    You should end up with the value *libTestFlight.a only for the "Release" configuration, as shown here:

    Configurations

  • Go ahead and use the TestFlight SDK:

    • In your app delegate, add #include "testFlight.h"
    • In -application:didFinishLaunchingWithOptions:, just before returning, add these lines:

      #ifdef CONFIGURATION_Beta
          [TestFlight takeOff:@"Insert your Team Token here"];
      #endif

Now, when you build or archive with the "Beta" target, the TestFlight SDK will be included and log sessions. But when you build with the "Release" target, TestFlight will neither be bundled or referenced in the app. You can include it anywhere, though, and use any of its features, as long as you do so only within a #ifdef CONFIGURATION_Beta block. Check out the complete SDK docs for details. Then, get your beta on!

Missing the Point

Thanks for all the comments on my Disposable Computing post. Alas, I’m beginning to see why sites like Daring Fireball don’t allow comments. Not that anyone was rude; it’s just that everyone missed the point. Every last one of you. (Well, except commenter “John”, who pointed out an inaccuracy in my post.) Here’s what that post was not about:

  • How many iPhones I’ve had over the years
  • The economics of owning an iPad vs. owning a Kindle
  • How long any given Apple product lasted (yours or mine)
  • How durable previous generations of Kindles are
  • The inherent value of the iPhone 3GS or iPad 1
  • The difference in quality between E-ink and LCD displays

Here’s what the post was about:

  • Apple products tend to have great build quality and durability
  • Marco Arment’s $79 Kindle review used the terms “cheap” and “disposable” to describe the Kindle.
  • The last thing I want in my life is cheap and disposable objects
  • I’d rather have objects that are more durable and likely to last
  • So I’d rather have an iPad than a $79 Kindle.

Contigo thermal mug

That’s about it. I’ve never used a Kindle device, just the Kindle iOS app. I don’t know how crappy the new Kindle actually is. Maybe Marco’s wrong, and the $79 Kindle is actually incredibly well-built and durable and will last for years. I just know that if it appears to be cheap and non-durable, I don’t want it. And I think Marco is a pretty reliable source. So I don’t want a $79 Kindle.

Perhaps my analogy of the $79 Kindle to a Dixie cup was a wee bit overwrought. Sorry about that; it occurred to me as I was writing the piece and I felt that it captured what I wanted to day. Because, you know, I’d rather drink my coffee out of a Contigo thermal mug than out of a Starbucks paper cup. Maybe that’s not fair.

Starbucks plastic mug

So how about this? The $79 Kindle is a Starbucks plastic mug. Not quite Dixie-cup disposable, and the advertising helps keep the price down.

Disposable Computing

Minimalist Steve Jobs

Over the last year or so, we’ve been on a “clear out the junk” kick in our house. We’ve gotten rid of all of our CDs, most of our books, and a lot of clutter and crap in our house. It’s kind of funny, because as a former academic, I’ve always liked having stuff, especially lots of books. But no more. I mainly want to read books on my iPad or iPhone nowadays and have less clutter in the house. I’m at the point now that, when I saw Panic’s commemoration of Steve Jobs yesterday — featuring a recently-discovered photo of Steve sitting on the floor or his house with nothing but a lamp, a stereo, and a mug of tea — I thought to myself, “now that’s about right.” Nothing but the essentials.

I think that Steve’s philosophy of minimalism applies just as well to most Mac products. I don’t mean only not in the way most folks have discussed ad nauseum: the simple interfaces, few buttons, smooth edges. But also in the quality. Apple products are meant to be used — well used — for a long time. Macs famously last longer than PCs, and the build quality of the unibody MacBook Pros, Airs, iPhones, and iPads, is universally lauded. I’ve had a few iPhones since 2007, and have always managed to get a good price for them via Craig’s List, not only because people want them, but because they were still in good shape after extended use. These products are meant to last.

Kindle

Marco Arment’s $79 Kindle review catalyzed these thoughts for me. The words that really struck me were “cheap” and “disposable.” The new low-end Kindle is designed to be disposable. Money quote:

Knowing that this new Kindle costs less than the cover for my Kindle 2 is freeing: I can just carry it around uncased and unprotected in a (large) pocket, use it anywhere, and not worry about damaging an expensive electronic item, because it’s not. And it’s so inexpensive that I have no hesitation recommending it to pretty much anyone who ever reads books, because I know that if they end up disliking it or not using it much, it wasn’t a lot of money.

It also means that, if you have it for six or twelve months and it breaks, it’s not a big deal to buy another one. A year later, another. Pretty soon you have a bunch of these things in your house, collecting like corpses in a cemetery. Because they’re so cheap to begin with, once they break, they’re not worth anything at all. You won’t be able to sell them to anyone. They’re just junk at that point.

iPad

I’m over junk culture. I hate wasteful packaging, but even worse is wasteful products. I don’t want a cheap, crappy Kindle, because in a year it will just be more trash — either additional household clutter or landfill. We already generate way too much waste, especially in the US. I think it’s a much better investment to buy a product with good build quality, that’s built to last. Not only will my iPad last me for years, but it will still have value years from now. It’s far more likely that it will still be functional in two years than the $79 Kindle will be, or even two or three Kindles.

The iPad is a Contigo thermal mug; the Kindle is a Dixie cup. I’ll take the quality and durability every time. Longer term, it likely won’t cost me any more — and certainly costs the planet less.