Just a Theory

Posts about Version Control

Sqitch: Trust, But Verify

New today: Sqitch v0.950. There are a few bug fixes, but the most interesting new feature in this release is the verify command, as well as the complementary --verify option to the deploy command. The add command has created test scripts since the beginning; they were renamed verify in v0.940. In v0.950 these scripts are actually made useful.

The idea is simply to test that a deploy script did what it was supposed to do. Such a test should make no assumptions about data or state other than that affected by the deploy script, so that it can be run against a production database without doing any damage. If it finds that the deploy script failed, it should die.

This is easier than you might at first think. Got a Sqitch change that creates a table with two columns? Just SELECT from it:

SELECT user_id, name
  FROM user
 WHERE FALSE;

If the table does not exist, the query will die. Got a change that creates a function? Make sure it was created by checking a privilege:

SELECT has_function_privilege('insert_user(text, text)', 'execute');

PostgreSQL will throw an error if the function does not exist. Not running PostgreSQL? Well, you’re probably not using Sqitch yet, but if you were, you might force an error by dividing by zero. Here’s an example verifying that a schema exists:

SELECT 1/COUNT(*)
  FROM information_schema.schemata
 WHERE schema_name = 'myapp';

At this point, Sqitch doesn’t care at all what you put into your verify scripts. You just need to make sure that they indicate failure by throwing an error when passed to the database command-line client.
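One way to picture "indicate failure by throwing an error" is a small wrapper around psql. This is only a sketch, not how Sqitch itself invokes the client: the function names `psql_verify_command` and `run_verify` are hypothetical, and it assumes a PostgreSQL target. It sets psql's `ON_ERROR_STOP` variable because, without it, psql carries on past a failed statement in a script file and can still exit with status 0.

```python
import subprocess
from typing import List


def psql_verify_command(database: str, script: str) -> List[str]:
    """Build a psql invocation that exits non-zero when the script errors.

    Hypothetical helper: the option choices are an assumption, not
    something taken from Sqitch.
    """
    return [
        "psql",
        "--no-psqlrc",                # ignore the user's ~/.psqlrc
        "--quiet",
        "--set", "ON_ERROR_STOP=1",   # stop at the first SQL error
        "--dbname", database,
        "--file", script,
    ]


def run_verify(database: str, script: str) -> bool:
    """Return True if the verify script ran without raising an error."""
    result = subprocess.run(psql_verify_command(database, script))
    return result.returncode == 0
```

Calling `run_verify("flipr_test", "verify/lists.sql")` would then report failure for any verify script whose query dies, including the divide-by-zero trick.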

The best time to run a change’s verify script is right after deploying the change. The --verify option to the deploy command does just that. If a verify script fails, the deploy is considered to have failed. Here’s what failure looks like:

> sqitch deploy
Deploying changes to flipr_test
  + appschema ................. ok
  + users ..................... ok
  + insert_user ............... ok
  + change_pass @v1.0.0-dev1 .. ok
  + lists ..................... psql:verify/lists.sql:7: ERROR:  column "timestamp" does not exist
LINE 1: SELECT nickname, name, description, timestamp
                                            ^
Verify script "verify/lists.sql" failed.
not ok
Reverting all changes
  - change_pass @v1.0.0-dev1 .. ok
  - insert_user ............... ok
  - users ..................... ok
  - appschema ................. ok
Deploy failed
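The control flow behind that output can be sketched in a few lines. This is an illustration under assumptions, not Sqitch's code: `deploy_all` and the `deploy`, `verify`, and `revert` callables are hypothetical stand-ins for running the matching scripts, each expected to raise an exception when the database reports an error. As in the output above, only the changes that completed are reverted; cleaning up the failed change itself is left out.

```python
from typing import Callable, List


def deploy_all(
    changes: List[str],
    deploy: Callable[[str], None],
    verify: Callable[[str], None],
    revert: Callable[[str], None],
) -> bool:
    """Deploy and verify each change; revert everything on the first failure."""
    done: List[str] = []
    for change in changes:
        try:
            deploy(change)
            verify(change)  # a failed verify counts as a failed deploy
        except Exception:
            # Undo the changes that completed, newest first.
            for name in reversed(done):
                revert(name)
            return False
        done.append(change)
    return True
```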

Good, right? In addition, you can always verify the state of a database using the verify command. It runs the verify scripts for all deployed changes. It also ensures that all the deployed changes were deployed in the same order as they’re listed in the plan, and that no changes are missing. The output is similar to that for deploy:

> sqitch verify
Verifying flipr_test
  * appschema ................. ok
  * users ..................... ok
  * insert_user ............... ok
  * change_pass @v1.0.0-dev1 .. ok
  * lists ..................... ok
  * insert_list ............... ok
  * delete_list ............... ok
  * flips ..................... ok
  * insert_flip ............... ok
  * delete_flip @v1.0.0-dev2 .. ok
  * pgcrypto .................. ok
  * insert_user ............... ok
  * change_pass ............... ok
Verify successful
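The ordering check can be thought of as a positional comparison between the plan and the record of deployed changes. A minimal sketch, assuming both are available as plain lists of change names (the function name `check_deployed_order` is made up for illustration, and Sqitch's real bookkeeping is richer than this):

```python
from typing import List


def check_deployed_order(plan: List[str], deployed: List[str]) -> List[str]:
    """Compare deployed changes to the plan, position by position.

    Returns human-readable problems; an empty list means the deployed
    changes are a prefix of the plan, in the same order.
    """
    problems = []
    for index, name in enumerate(deployed):
        if index >= len(plan):
            problems.append(f"{name}: not in the plan")
        elif plan[index] != name:
            problems.append(
                f"{name}: expected {plan[index]} at position {index + 1}"
            )
    return problems
```

Comparing by position rather than by name matters here because the same change name can legitimately appear more than once, as insert_user and change_pass do in the output above.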

Don’t want verification tests/scripts? Use --no-verify when you call sqitch add and none will be created. Or tell it never to create verify scripts by turning off the add.with_verify option:

sqitch config --bool add.with_verify no

If you somehow run deploy --verify or verify anyway, Sqitch will emit a warning for any changes without verify scripts, but won’t consider them failures.

Up Front Dependency Checking

The other significant change in v0.950 is that the deploy and revert commands (and, by extension, the rebase command) now check that all dependencies are satisfied before deploying or reverting anything. Previously, Sqitch checked the dependencies for each change just before deploying it, but it makes much more sense to check them for all changes to be deployed before doing anything at all. This reduces the chances of unexpected reversions.
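To make the difference concrete, here is a rough sketch of an up-front check for the deploy direction. The data shapes are assumptions for illustration: `requires` maps a change name to the names it depends on, and `missing_dependencies` is a hypothetical function, not part of Sqitch.

```python
from typing import Dict, List, Set


def missing_dependencies(
    to_deploy: List[str],
    requires: Dict[str, List[str]],
    already_deployed: Set[str],
) -> Dict[str, List[str]]:
    """Check every pending change before anything is deployed.

    Returns a mapping of change name to the dependencies that would not
    be satisfied by the time that change is reached.
    """
    available = set(already_deployed)
    missing: Dict[str, List[str]] = {}
    for change in to_deploy:
        unmet = [dep for dep in requires.get(change, []) if dep not in available]
        if unmet:
            missing[change] = unmet
        # Later changes in the same run may depend on this one.
        available.add(change)
    return missing
```

If the result is non-empty, nothing gets deployed at all, so there is nothing to revert.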

Still hacking on Sqitch, of course, though nearly all the commands I initially envisioned are done. Next up, I plan to finally implement support for SQLite, add a few more commands to simplify plan file modification, and create a new site, since the current site is woefully out-of-date. Until then, though, check out this presentation and, of course, the tutorial.

SQL Change Management Sans Duplication

In the previous episode in this series, I had one more issue with regard to SQL change management that I wanted to resolve:

  1. There is still more duplication of code than I would like, in that a procedure defined in one change script would have to be copied whole to a new script for any changes, even simple single-line changes.

So let’s see what we can do about that. Loading it into Git, our first example looks like this:

> alias sqlhist="git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print \"\"} /./'"
> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (HEAD, gamma, master)]
sql/deploy/add_user.sql

(Aside: I’ve created an alias, sqlhist, on the first line, so that all the Git and Awk magic doesn’t clutter the remaining examples.)

So, we’ve got the creation of the users table under the alpha tag, the addition of the widgets table and an accompanying add_widget() function under the beta tag, and the creation of the add_user() function under the gamma tag. So far so good. Now, let’s say that gamma has been deployed to production, and now we’re ready to add a feature for the next release.

Modify This

It turns out that our users really want a timestamp for the time a widget was created. So let’s add a new change script that adds a created_at column to the widgets table. First we add sql/deploy/widgets_created_at.sql with:

-- requires: widgets_table
ALTER TABLE widgets ADD created_at TIMESTAMPTZ;

And then the accompanying revert script, sql/revert/widgets_created_at.sql:

ALTER TABLE widgets DROP COLUMN IF EXISTS created_at;

Commit them and now our deployment configuration looks like this:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2 (HEAD, master)]
sql/deploy/widgets_created_at.sql

So far so good. We have a simple delta script that modifies the existing table, and there is no code duplication. Time to modify the add_widget() function to insert the timestamp. Recall that, in the first article in this series, I created a separate sql/deploy/add_widget_v2.sql file, copied the existing function in its entirety into the new file, and modified it there. If we were to do that here, the resulting deployment configuration would look something like this:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
sql/deploy/widgets_created_at.sql

[dfba488cfd9145928a25d8d48de3231da84s4bd2 (HEAD, master)]
sql/deploy/add_widget_v2.sql

Which would be fine, except that if someone else wanted to see what had changed, here’s what git diff would output:

diff --git a/sql/deploy/add_widget_v2.sql b/sql/deploy/add_widget_v2.sql
new file mode 100644
index 0000000..9132195
--- /dev/null
+++ b/sql/deploy/add_widget_v2.sql
@@ -0,0 +1,8 @@
+-- requires: widgets_created_at
+CREATE OR REPLACE FUNCTION add_widget(
+    username   TEXT,
+    widgetname TEXT
+) RETURNS VOID LANGUAGE SQL AS $$
+    INSERT INTO widgets (created_by, name, created_at)
+    VALUES ($1, $2, NOW());
+$$;

So, what changed in the add_widget() function between gamma and now? One cannot tell from this diff: it looks like a brand new function. And no web-based VCS interface will show you, either; it’s just not inherent in the commit. We have to actually know that it was just an update to an existing function, and what files to manually diff, like so:

> diff -u sql/deploy/add_widget.sql sql/deploy/add_widget_v2.sql 
--- sql/deploy/add_widget.sql   2012-01-28 13:06:24.000000000 -0800
+++ sql/deploy/add_widget_v2.sql    2012-01-28 13:26:59.000000000 -0800
@@ -1,8 +1,8 @@
--- requires: widgets_table
-
+-- requires: widgets_created_at
 CREATE OR REPLACE FUNCTION add_widget(
     username   TEXT,
     widgetname TEXT
 ) RETURNS VOID LANGUAGE SQL AS $$
-    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
+    INSERT INTO widgets (created_by, name, created_at)
+    VALUES ($1, $2, NOW());
 $$;

Much better, but how annoying is that? It doesn’t allow us to really take advantage of the VCS, all because we need SQL changes to run in a very specific order.

But let’s ignore that for the moment. Let’s just throw out the commit with add_widget_v2.sql and go ahead and change the add_widget change script directly. So the history now looks like this:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
sql/deploy/widgets_created_at.sql

[e4b970aa36f27451fe377791eab040a73c6eb47a (HEAD, epsilon, master)]
sql/deploy/add_widget.sql

Naturally, the add_widget script appears twice now, once under the beta tag and once under epsilon (which I’ve just tagged). What are the consequences for our migration? Well, if we were to build a new database from the beginning, running these migrations as listed here, we would get an error while applying the beta changes:

ERROR:  column "created_at" of relation "widgets" does not exist
LINE 5:     INSERT INTO widgets (created_by, name, created_at)

This is because the created_at column won’t exist until the widgets_created_at change is applied. That won’t do, will it? Fortunately, Git knows exactly what the add_widget deploy script looked like under the beta tag, and we can ask it:

> git show beta:sql/deploy/add_widget.sql
-- requires: widgets_table

CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
$$;

Boom, there it is, with no reference to created_at. Using this technique, our SQL deployment app can successfully apply all of our database changes by iterating over the list of changes and applying the contents of each script at the time of the appropriate commit or tag. In other words, it could apply the output from each of these commands:

git show alpha:sql/deploy/users_table.sql
git show beta:sql/deploy/widgets_table.sql
git show beta:sql/deploy/add_widget.sql
git show gamma:sql/deploy/add_user.sql
git show 44ba615b7813531f0acb6810cbf679791fe57bf2:sql/deploy/widgets_created_at.sql
git show epsilon:sql/deploy/add_widget.sql

And everything will work exactly as it should: the original version of the add_widget change script will be applied for the beta tag, and the next version will be applied for the epsilon tag. Not bad, right? We get a nice, clean Git history and can exploit it to manage the changes.
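Generating that list of commands from the history is mechanical. A minimal sketch, assuming the history has already been reduced to (ref, paths) pairs; `git_show_commands` is a hypothetical name. Note that it keeps the paths in whatever order it is given, so ordering scripts within a commit by their "requires" comments (widgets_table before add_widget under beta) would have to happen beforehand.

```python
from typing import List, Tuple


def git_show_commands(history: List[Tuple[str, List[str]]]) -> List[str]:
    """Build one `git show <ref>:<path>` command per script, in history order.

    `history` is a list of (ref, paths) pairs, oldest first, where ref is
    a tag name or a commit SHA.
    """
    return [f"git show {ref}:{path}" for ref, paths in history for path in paths]
```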

Reversion to the Mean

But what about reversion? What if the deploy to epsilon failed, and we need to revert back to gamma? Recall that in the first article, I eliminated duplication by having the add_widget_v2 revert script simply call the add_widget deploy script. But such is not possible now that we’ve changed add_widget in place. What to do?

The key is for the change management script to know the difference between a new change script and a modified one. Fortunately, Git knows that, too, and we can get it to cough up that information with a simple change to the sqlhist alias: instead of passing --name-only, pass --name-status:

> alias sqlhist="git log -p --format='[%H%d]' --name-status --reverse sql/deploy \
| awk '/^\[/ {print \"\"} /./'"

Using this new alias, our history looks like:

> sqlhist

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
A   sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
A   sql/deploy/add_widget.sql
A   sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
A   sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
A   sql/deploy/widgets_created_at.sql

[e4b970aa36f27451fe377791eab040a73c6eb47a (HEAD, epsilon, master)]
M   sql/deploy/add_widget.sql

Now we have a letter defining the status of each file. An “A” means the file was added in that commit; an “M” means it was modified. The upshot is that, to revert to gamma, our change management app can see that add_widget was modified in epsilon and, rather than apply a revert change script, can just apply the version of the script as it existed under gamma:

> git show gamma:sql/deploy/add_widget.sql
-- requires: widgets_table

CREATE OR REPLACE FUNCTION add_widget(
    username   TEXT,
    widgetname TEXT
) RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO widgets (created_by, name) VALUES ($1, $2);
$$;

And there we are, right back to where we should be. Of course, the remaining epsilon deploy script, widgets_created_at, was added in its commit, so we just apply the revert script and we’re set, back to gamma.
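The revert decision described above boils down to a small rule per status letter. Here is a sketch under assumptions: `revert_actions` is a hypothetical function, the entries are the name-status lines for the commits after the target (oldest first), and revert scripts live under sql/revert with the same file name as their deploy script.

```python
from typing import List, Tuple


def revert_actions(entries: List[Tuple[str, str]], target_ref: str) -> List[str]:
    """Plan how to undo (status, deploy_path) entries made after target_ref.

    Added scripts are undone by running their revert script. Modified
    scripts are undone by re-applying the deploy script as it existed at
    target_ref. A script both added and modified after target_ref only
    needs its revert script, so its "M" entries are skipped.
    """
    added = {path for status, path in entries if status == "A"}
    actions = []
    for status, path in reversed(entries):  # newest change first
        if status == "A":
            actions.append("run " + path.replace("/deploy/", "/revert/", 1))
        elif status == "M" and path not in added:
            actions.append(f"git show {target_ref}:{path}")
    return actions
```

For the epsilon-to-gamma case, this yields the gamma version of add_widget first, followed by the widgets_created_at revert script.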

Still Configurable

To get back to the original idea of a migration configuration file, I still think it’s entirely doable. All we need is to have the change management app generate it, just as before. When it comes to modified (rather than added) deploy scripts, it can automatically insert new scripts with the full copies of previous versions, much as before. The resulting configuration would look something like this:

[3852b378aa029cc610a03806e8268ed452dce8a6 (alpha)]
sql/deploy/users_table.sql

[32883d5a08691351b07928fa4e4fb7e68c500973 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[b8b9f5c152675305c6b2d3e105d55a25019e0828 (gamma)]
sql/deploy/add_user.sql

[44ba615b7813531f0acb6810cbf679791fe57bf2]
sql/deploy/widgets_created_at.sql

[e4b970aa36f27451fe377791eab040a73c6eb47a (HEAD, epsilon, master)]
sql/deploy/add_widget_v2.sql

Note that last line, where we now have add_widget_v2. The change management script would simply generate this file, and create an additional revert script with the same name that just contains the contents of the deploy script as it was under the gamma tag.

Too Baroque?

Having written down these ideas that have plagued my brain for the last week, along with some examples using Git to confirm them, I’m convinced more than ever that this is entirely workable. But it also leads me to wonder if it’s too baroque. I intend these posts as a rough spec for how this thing should work, and I plan to implement it in the coming weeks. But I’m wondering how difficult it will be to explain it all to people.

So let me see if I can break it down to a few simple rules.

  • In general, you should create independent deploy and revert scripts for your SQL. Put a CREATE TABLE statement into its own script. If it requires some other table, declare the dependency. If you need to change it later, create a new script that uses ALTER TABLE.
  • In special cases where a simple change cannot be made without copying something wholesale, and where the deploy script is idempotent, you may simply modify the deploy script in-place.

That’s about it. The idempotence of the deploy script is important for ensuring consistency, and applies very well to features such as user-defined functions. For other objects, there are generally ALTER statements that allow changes to be made without wholesale copying of existing code.

So what am I missing? What have I overlooked? What mistakes in my logic have I made? Do you think this will be too tricky to implement, or to use? Is it hard to understand? Your comments would be greatly appreciated, because I am going to write an app to do this stuff, and want to get it right.

Thanks for sticking with me through all the thought experiments. For my next post on this topic, I expect to have an interface spec for the new app.

VCS-Enabled SQL Change Management

In my previous post, I outlined the basics of a configuration-file and dependency-tracking SQL deployment architecture, but left a couple of additional challenges unresolved. They were:

  1. I would rather not have to hand-edit a configuration file, as it’s finicky and error-prone.

  2. There is still more duplication of code than I would like, in that a procedure defined in one change script would have to be copied whole to a new script for any changes, even single-line simple changes.

I believe I can solve both of these issues by simple use of a VCS. Since all of my current projects use Git, I will use it for the examples here.

Git it On

First, recall the structure of the configuration file, which was something like this:

[alpha]
users_table

[beta]
add_widget
widgets_table

[gamma]
add_user

[delta]
widgets_created_at
add_widget_v2

Basically, we have bracketed tags identifying changes that should be deployed. Now have a look at this:

> git log -p --format='[%H]' --name-only --reverse sql/deploy
[8920aaf7947a56f6777e69a21b70fd877c8fd6dc]

sql/deploy/users_table.sql
[f7da5fd4b7391747f75d85db6fa82de47b9e4c00]

sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql
[ea10b9e566934ef256debe8752504189436e162a]

sql/deploy/add_user.sql
[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4]

sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

Look familiar? Let’s use a bit of awk magic to neaten things a bit (Thanks helwig!):

> git log -p --format='[%H]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

Ah, that’s better. We have commit SHA1s for tags, followed by the appropriate lists of deployment scripts. But wait, we can decorate it, too:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (HEAD, delta, master)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

Look at that! Actual VCS tags built right in to the output. So, assuming our deployment app can parse this output, we can deploy or revert to any commit or tag. Better yet, we don’t have to maintain a configuration file, because the VCS is already tracking all that stuff for us! Our change management app can automatically detect if we’re in a Git repository (or Mercurial or CVS or Subversion or whatever) and fetch the necessary information for us. It’s all there in the history. We can name revision identifiers (SHA1s here) to deploy or revert to, or use tags (alpha, beta, gamma, delta, HEAD, or master in this example).
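Parsing that output is not much work. A minimal sketch, assuming the exact bracketed format produced by the command above (a SHA, optionally followed by a parenthesized, comma-separated list of ref names); `parse_history` and the dictionary keys are names chosen for illustration:

```python
import re
from typing import Dict, List

# Matches "[<sha>]" or "[<sha> (ref1, ref2)]".
HEADER = re.compile(r"^\[([0-9a-f]+)(?: \((.*)\))?\]$")


def parse_history(text: str) -> List[Dict]:
    """Turn the decorated git log output into a list of entries.

    Each entry holds the commit SHA, its ref names (tags and branches,
    undistinguished), and the deploy scripts listed under it.
    """
    entries: List[Dict] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            refs = match.group(2).split(", ") if match.group(2) else []
            entries.append({"commit": match.group(1), "refs": refs, "scripts": []})
        elif entries:
            entries[-1]["scripts"].append(line)
    return entries
```

Because it skips blank lines, the same function copes with the output whether or not it has been through the awk cleanup.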

And with careful repository maintenance, this approach will work for branches, as well. For example, say you have developers working in two branches, feature_foo and feature_bar. In feature_foo, a foo_table change script gets added in one commit, and an add_foo script in a second commit. Merge it into master and the history now looks like this:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e (HEAD, master, feature_foo)]
sql/deploy/add_foo.sql

So far so good.

Meanwhile, development in the feature_bar branch has added a bar_table change script in one commit and add_bar in another. Because development in this branch was going on concurrently with the feature_foo branch, if we just merged it into master, we might get a history like this:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[d1882d7b4cfcf5c57030bd5a15f8571bfd7e48e2]
sql/deploy/bar_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e]
sql/deploy/add_foo.sql

[2330da1caae9a46ea84502bd028ead399ca3ca02 (feature_bar)]
sql/deploy/add_bar.sql

[73979ede2c8589cfe24c9213a9538f305e6f508f (HEAD, master, feature_foo)]

Note that bar_table comes before add_foo. In other words, the feature_foo and feature_bar commits are interleaved. If we were to deploy to HEAD, and then need to revert feature_bar, bar_table would not be reverted. This is, shall we say, less than desirable.

There are at least two ways to avoid this issue. One is to squash the merge into a single commit using git merge --squash feature_bar. This would be similar to accepting a single patch and applying it. The resulting history would look like this:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e]
sql/deploy/add_foo.sql

[91a048c05e0444682e2e4763e8a7999a869b4a77 (HEAD, master)]
sql/deploy/add_bar.sql
sql/deploy/bar_table.sql

Now both of the feature_bar change scripts come after the feature_foo changes. But it might be nice to keep the history. So a better solution (and the best practice, I believe), is to rebase the feature_bar branch before merging it into master, like so:

> git rebase master
First, rewinding head to replay your work on top of it...
Applying: Add bar.
Applying: Add add_bar().
> git checkout master
Switched to branch 'master'
> git merge feature_bar
Updating 7f89e23..0fab7a0
Fast-forward
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 sql/deploy/add_bar.sql
 create mode 100644 sql/deploy/bar_table.sql
 create mode 100644 sql/revert/add_bar.sql
 create mode 100644 sql/revert/bar_table.sql

And now we should have:

> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e]
sql/deploy/add_foo.sql

[0e53c29eb47c618d0a8818cc17bd5a0aab0acd6d]
sql/deploy/bar_table.sql

[0fab7a0ba928b34a46a9495d4efc1c73d9133d37 (HEAD, master, feature_bar)]
sql/deploy/add_bar.sql

Awesome, now everything is in the correct order. We did lose the feature_foo “tag,” though. That’s because it wasn’t a tag, and neither is feature_bar here. They are, rather, branch names, which becomes obvious when using “full” decoration:

> git log --format='%d' --decorate=full HEAD^..
 (HEAD, refs/heads/master, refs/heads/feature_foo)

After the next commit, it will disappear from the history. So let’s just tag the relevant commits ourselves:

> git tag feature_foo 7f89e23c9f1e7fc298c69400f6869d701f76759e
> git tag feature_bar
> git log -p --format='[%H%d]' --name-only --reverse sql/deploy \
| awk '/^\[/ {print ""} /./'

[8920aaf7947a56f6777e69a21b70fd877c8fd6dc (alpha)]
sql/deploy/users_table.sql

[f7da5fd4b7391747f75d85db6fa82de47b9e4c00 (beta)]
sql/deploy/add_widget.sql
sql/deploy/widgets_table.sql

[ea10b9e566934ef256debe8752504189436e162a (gamma)]
sql/deploy/add_user.sql

[89e85f98d891a2984ad4e3c42d8ca8cf31f3b2b4 (delta)]
sql/deploy/add_widget_v2.sql
sql/deploy/widgets_created_at.sql

[cbb48144065dd345c5248e5f1e42c1c7391a88ed]
sql/deploy/foo_table.sql

[7f89e23c9f1e7fc298c69400f6869d701f76759e (feature_foo)]
sql/deploy/add_foo.sql

[0e53c29eb47c618d0a8818cc17bd5a0aab0acd6d]
sql/deploy/bar_table.sql

[0fab7a0ba928b34a46a9495d4efc1c73d9133d37 (HEAD, feature_bar, master, feature_bar)]
sql/deploy/add_bar.sql

Ah, there we go! After the next commit, one of those feature_bars will disappear, since the branch will have been left behind. But we’ll still have the tag.

Not Dead Yet

Clearly we can use Git intelligently to drive SQL change management. (Kind of stands to reason, doesn’t it?) Nevertheless, I believe that a configuration file still might have its uses, not only because not every project is in a VCS (it ought to be!), but because oftentimes a project is not deployed to production as a git clone. It might be distributed as a source tarball or an RPM. In such a case, including a configuration file in the distribution would be very useful. But there is still no need to manage it by hand; our deployment app can generate it from the VCS history before packaging for release.
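Generating the file could look something like the sketch below. It assumes the history has already been parsed into entries with a commit SHA, a list of tags (with branch names already filtered out, for example by using full decoration), and the deploy script paths. The function name `render_config` and the choice to fall back to the SHA for untagged commits are assumptions, not a settled format.

```python
import posixpath
from typing import Dict, List


def render_config(history: List[Dict]) -> str:
    """Render parsed history in the bracketed configuration-file format.

    Each section is labeled with the commit's first tag, or with its SHA
    when it has no tag, followed by the change names (script file names
    without directory or extension).
    """
    sections = []
    for entry in history:
        label = entry["tags"][0] if entry["tags"] else entry["commit"]
        names = [
            posixpath.splitext(posixpath.basename(path))[0]
            for path in entry["scripts"]
        ]
        sections.append("\n".join([f"[{label}]"] + names))
    return "\n\n".join(sections) + "\n"
```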

More to Come

I’d planned to cover the elimination of duplication, but I think this is enough for one post. Watch for that idea in my next post.
