Sqitch on Oracle

I found myself with a little unexpected time at work recently, and since we use Oracle (for a few more months), I decided to port Sqitch. Last night, I released v0.970 with full support for Oracle. I did the development against an 11.2 VirtualBox VM, though I think it should work on 10g, as well.

Sqitch is available from the usual locations. For Oracle support, you’ll need the Instant Client, including SQL*Plus. Make sure you have $ORACLE_HOM set and you’ll be ready to install. Via CPAN, it’s

cpan install App::Sqitch DBD::Oracle

Via Homebrew:

brew tap theory/sqitch
brew install sqitch-oracle

Via ActiveState PPM, install ActivePerl, then run:

ppm install App-Sqitch DBD-Oracle
PGCon 2013

There are a few other minor tweaks and fixed in this release; check the release notes for details.

Want more? I will be giving a half-day tutorial, entitled “Agile Database Development,” on database development with Git, Sqitch, and pgTAP at on May 22 PGCon 2013 in Ottawa, Ontario. Come on up!

Sqitch: Now with SQLite Support

This week I released Sqitch v0.961. There are a number of great new features v0.95x, including the beginning of two features I’ve had in mind since the beginning: VCS integration and support for multiple databases.

First the VCS integration. This comes in the form of the new checkout command, which automatically makes database changes for you when you change VCS branches. Say you have two branches, “widgets” and “big-fix”, and that their Sqitch plans diverge. If you’re in the “widgets” branch and want to switch to “big-fix”, just run

sqitch checkout big-fix

Sqitch will look at the “big-fix” plan, figure out the last change in common with “widgets”, and revert to it. Then it checks out “big-fix” and deploys. That’s it. Yes, you could do this yourself, but do you really remember the last common change between the two branches? Do you want to take the time to look for it, then revert, check out the new branch, and deploy? This is exactly the sort of common developer task that Sqitch aims to take the pain out of, and I’m thrilled to provide it.

You know what’s awesome, though? This feature never occurred to me. I didn’t come up with it, and didn’t implement it. No, it was dreamt up and submitted in a pull request by Ronan Dunklau. I have wanted VCS integration since the beginning, but had yet to get ‘round to it. Now Ronan has jumpstarted it. A million thanks!

One downside: it’s currently Git-only. I plan to add infrastructure for supporting multiple VCSes, probably with Git and Subversion support to begin with. Watch for that in v0.970 in the next couple months.

The other big change is the addition of SQLite support alongside the existing PostgreSQL support. Fortunately, I was able to re-use nearly all the code, so the SQLite adapter is just a couple hundred lines long. For the most part, Sqitch on SQLite works just like on PostgreSQL. The main difference is that Sqitch stores its metadata in a separate SQLite database file. This allows one to use a single metadata file to maintain multiple databases, which can be important if you use multiple databases as schemas pulled into a single connection via ATTACH DATABASE.

Curious to try it out? Install Sqitch from CPAN or via the Homebrew Tap and then follow the new Sqitch SQLite tutorial.

Of the multitude of other Changes, one other bears mentioning: the new plan command. This command is just like log, except that it shows what is in the plan file, rather than what changes have been made to the database. This can be useful for quickly listing what’s in a plan, for example when you need to remember the names of changes required by a change you’re about to add. The --oneline option is especially useful for this functionality. An example from the tutorial’s plan:

> sqitch plan --oneline
In sqitch.plan
6238d8 deploy change_pass
d82139 deploy insert_user
7e6e8b deploy pgcrypto
87952d deploy delete_flip @v1.0.0-dev2
b0a951 deploy insert_flip
834e6a deploy flips
d0acfa deploy delete_list
77fd99 deploy insert_list
1a4b9a deploy lists
0acf77 deploy change_pass @v1.0.0-dev1
ec2dca deploy insert_user
bbb98e deploy users
ae1263 deploy appschema

I personally will be using this a lot, Yep, scratching my own itch here. What itch do you have to scratch with Sqitch?

In related news, I’ll be giving a tutorial at PGCon next month, entitled “Agile Database Development”. We’ll be developing a database for a web application using Git for source code management, Sqitch for database change management, and pgTAP for unit testing. This is the stuff I do all day long at work, so you can also think of it as “Theory’s Pragmatic approach to Database Development.” See you there?

Sqitch on Windows (and Linux, Solaris, and OS X)

Thanks to the hard-working hamsters at the ActiveState PPM Index, Sqitch is available for installation on Windows. According to the Sqitch PPM Build Status, the latest version is now available for installation. All you have to do is:

  1. Download and install ActivePerl
  2. Open the Command Prompt
  3. Type ppm install App-Sqitch

As of this writing, only PostgreSQL is supported, so you will need to install PostgreSQL.

But otherwise, that’s it. In fact, this incantation works for any OS that ActivePerl supports. Here’s where you can find the sqitch executable on each:

  • Windows: C:\perl\site\bin\sqitch.bat
  • Mac OS X: ~/Library/ActivePerl-5.16/site/bin/sqitch (Or /usr/local/ActivePerl-5.16/site/bin if you run sudo ppm)
  • Linux: /opt/ActivePerl-5.16/site/bin/sqitch
  • Solaris/SPARC (Business edition-only): /opt/ActivePerl-5.16/site/bin/sqitch

This makes it easy to get started with Sqitch on any of those platforms without having to become a Perl expert. So go for it, and then get started with the tutorial!

Sqitch Homebrew Tap

If Sqitch is to succeed, it needs to get into the hands of as many people as possible. That means making it easy to install for people who are not Perl hackers and don’t want to deal with CPAN. The Sqitch Homebrew Tap is my first public stab at that. It provides a series of “Formulas” for Homebrew users to easily download, build, and install Sqitch and all of its dependencies.

If you are one of these lucky people, here’s how to configure the Sqitch tap:

brew tap theory/sqitch

Now you can install the core Sqitch application:

brew install sqitch

That’s it. Make sure it works:

> sqitch --version
sqitch (App::Sqitch) 0.953

It won’t do you much good without support for your database, though. Currently, there is a build for PostgreSQL. Note that this requires the Homebrew core PostgreSQL server:

brew install sqitch_pg

Sqitch hasn’t been ported to other database engines yet, but once it is, expect other formulas to follow. But if you use PostgreSQL (or just want to experiment with it), you’re ready to rock! I suggest following along the tutorial or taking in the latest iteration of the introductory presentation (video of an older version here).

My thanks to IRC user “mistym” for the help and suggestions in getting this going. My Ruby is pretty much rusted through, soI could not have done it without the incredibly responsive help!

Bootstrapping Bucardo Master/Master Replication

Let’s say you have a production database up and running and you want to set up a second database with Bucardo-powered replication between them. Getting a new master up and running without downtime for an existing master, and without losing any data, is a bit fiddly and under-documented. Having just figured out one way to do it with the forthcoming Bucardo 5 code base, I wanted to blog it as much for my own reference as for yours.

First, let’s set up some environment variables to simplify things a bit. I’m assuming that the database names and usernames are the same, and only the host names are different:

export PGDATABASE=widgets
export PGHOST=here.example.com
export PGHOST2=there.example.com
export PGSUPERUSER=postgres

And here are some environment variables we’ll use for Bucardo configuration stuff:

export BUCARDOUSER=bucardo
export BUCARDOPASS=*****
export HERE=here
export THERE=there

First, let’s create the new database as a schema-only copy of the existing database:

createdb -U $PGSUPERUSER -h $PGHOST2 $PGDATABASE
pg_dump -U $PGSUPERUSER -h $PGHOST --schema-only $PGDATABASE \
| psql -U $PGSUPERUSER -h $PGHOST2 -d $PGDATABASE

You might also have to copy over roles; use pg_dumpall --globals-only to do that.

Next, we configure Bucardo. Start by telling it about the databases:

bucardo add db $HERE$PGDATABASE dbname=$PGDATABASE host=$PGHOST user=$BUCARDOUSER pass=$BUCARDOPASS
bucardo add db $THERE$PGDATABASE dbname=$PGDATABASE host=$PGHOST2 user=$BUCARDOUSER pass=$BUCARDOPASS

Tell it about all the tables we want to replicate:

bucardo add table public.foo public.bar relgroup=myrels db=$HERE$PGDATABASE 

Create a multi-master database group for the two databases:

bucardo add dbgroup mydbs $HERE$PGDATABASE:source $THERE$PGDATABASE:source  

And create the sync:

bucardo add sync mysync relgroup=myrels dbs=mydbs autokick=0

Note autokick=0. This ensures that, while deltas are logged, they will not be copied anywhere until we tell Bucardo to do so.

And now that we know that any changes from here on in will be queued for replication, we can go ahead and copy over the data. The only caveat is that we need to disable the Bucardo triggers on the target system, so that our copying does not try to queue up. We do that by setting the session_replication_role GUC to “replica” while doing the copy:

pg_dump -U $PGSUPERUSER -h $PGHOST --data-only -N bucardo $PGDATABASE \
| PGOPTIONS='-c session_replication_role=replica' \
psql -U $PGSUPERUSER -h $PGHOST2 -d $PGDATABASE

Great, now all the data is copied over, we can have Bucardo copy any changes that have been made in the interim, as well as any going forward:

bucardo update sync mysync autokick=1
bucardo reload config

Bucardo will fire up the necessary syncs and copy over any interim deltas. And any changes you make to either system in the future will be copied to the other.

Dist::Zilla::LocaleTextDomain for Translators

Here’s a followup on my post about localizing Perl modules with Locale::TextDomain. Dist::Zilla::LocaleTextDomain was great for developers, less so for translators. A Sqitch translator asked how to test the translation file he was working on. My only reply was to compile the whole module, then install it and test it. Ugh.

Today, I released Dist::Zilla::LocaleTextDomain v0.85 with a new command, msg-compile. This command allows translators to easily compile just the file they’re working on and nothing else. For pure Perl modules in particular, it’s pretty easy to test then. By default, the compiled catalog goes into ./LocaleData, where convincing the module to find it is simple. For example, I updated the test sqitch app to take advantage of this. Now, to test, say, the French translation file, all the translator has to do is:

> dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 155 translated messages, 24 fuzzy translations, 16 untranslated messages.

> LANGUAGE=fr ./t/sqitch foo
"foo" n'est pas une commande valide

I hope this simplifies things for translators. See the notes for translators for a few more words on the subject.

Sqitch: Trust, But Verify

New today: Sqitch v0.950. There are a few bug fixes, but the most interesting new feature in this release is the verify command, as well as the complementary --verify option to the deploy command. The add command has created test scripts since the beginning; they were renamed verify in v0.940. In v0.950 these scripts are actually made useful.

The idea is simply to test that a deploy script did what it was supposed to do. Such a test should make no assumptions about data or state other than that affected by the deploy script, so that it can be run against a production database without doing any damage. If it finds that the deploy script failed, it should die.

This is easier than you might at first think. Got a Sqitch change that creates a table with two columns? Just SELECT from it:

SELECT user_id, name
  FROM user
 WHERE FALSE;

If the table does not exist, the query will die. Got a change that creates a function? Make sure it was created by checking a privilege:

SELECT has_function_privilege('insert_user(text, text)', 'execute');

PostgreSQL will throw an error if the function does not exist. Not running PostgreSQL? Well, you’re probably not using Sqitch yet, but if you were, you might force an error by dividing by zero. Here’s an example verifying that a schema exists:

SELECT 1/COUNT(*)
  FROM information_schema.schemata
 WHERE schema_name = 'myapp';

At this point, Sqitch doesn’t care at all what you put into your verify scripts. You just need to make sure that they indicate failure by throwing an error when passed to the database command-line client.

The best time to run a change verify script is right after deploying the change. The --verify option to the deploy command does just that. If a verify script fails, the deploy is considered to have failed. Here’s what failure looks like:

> sqitch deploy
Deploying changes to flipr_test
  + appschema ................. ok
  + users ..................... ok
  + insert_user ............... ok
  + change_pass @v1.0.0-dev1 .. ok
  + lists ..................... psql:verify/lists.sql:7: ERROR:  column "timestamp" does not exist
LINE 1: SELECT nickname, name, description, timestamp
                                            ^
Verify script "verify/lists.sql" failed.
not ok
Reverting all changes
  - change_pass @v1.0.0-dev1 .. ok
  - insert_user ............... ok
  - users ..................... ok
  - appschema ................. ok
Deploy failed

Good, right? In addition, you can always verify the state of a database using the verify command. It runs the verify scripts for all deployed changes. It also ensures that all the deployed changes were deployed in the same order as they’re listed in the plan, and that no changes are missing. The output is similar to that for deploy:

> sqitch verify
Verifying flipr_test
  * appschema ................. ok
  * users ..................... ok
  * insert_user ............... ok
  * change_pass @v1.0.0-dev1 .. ok
  * lists ..................... ok
  * insert_list ............... ok
  * delete_list ............... ok
  * flips ..................... ok
  * insert_flip ............... ok
  * delete_flip @v1.0.0-dev2 .. ok
  * pgcrypto .................. ok
  * insert_user ............... ok
  * change_pass ............... ok
Verify successful

Don’t want verification tests/scripts? Use --no-verify when you call sqitch add and none will be created. Or tell it never to create verify scripts by setting the turning off the add.with_verify option:

sqitch config --bool add.with_verify no

If you somehow run deploy --verify or verify anyway, Sqitch will emit a warning for any changes without verify scripts, but won’t consider them failures.

Up Front Dependency Checking

The other significant change in v0.950 is that the deploy and revert commands (and, by extension the rebase command) now verify that dependencies have been checked before deploying or reverting anything. Previously, Sqitch checked the dependencies for each change before deploying it, but it makes much more sense to check them for all changes to be deployed before doing anything at all. This reduces the chances of unexpected reversions.

Still hacking on Sqitch, of course, though nearly all the commands I initially envisioned are done. Next up, I plan to finally implement support for SQLite, add a few more commands to simplify plan file modification, and to create a new site, since the current site is woefully out-of-date. Until then, though, check out this presentation and, of course, the tutorial.

Sqitch Update: All Your Rebase Are…Never Mind

I’m pleased to announce the release of Sqitch v0.940. The focus of this release? Sanity.

I’ve been doing a lot of Sqitch-based database development at work. Overall it has worked quite well. Except for one thing: often the order in which changes would be arranged would change from one run to the next. Oy.

Out of Order

The reason? The plan parser would perform a topological sort of all the changes between tags based on their dependencies. I’ve been careful, for the most part, to keep my changes in the proper order in our plan files, but the topological sort would often pick a different order. Still valid in terms of dependency ordering, but different from the plan file.

Given the same inputs, the sort always produced the same order. However, whenever I added a new changes (and I do that all the time while developing), there would then be a new input, which could result in a completely different order. The downside is that I would add a change, run sqitch deploy, and it would die because it thought something needed to be deployed that had already been deployed, simply because it sorted it to come after an undeployed change. So annoying.. It also caused problems in for production deployments, because different machines with different Perls would sort the plans in different ways.

So I re-wrote the sorting part of the the plan parser so that it no longer sorts. The list of changes is now always identical to the order in the plan file. It still checks dependencies, of course, only now it throws an exception if it finds an ordering problem, rather than re-ordering for you. I’ve made an effort to tell the user how to move things around in the plan file to fix ordering issues, so hopefully everything will be less mysterious.

Of course, many will never use dependencies, in which case this change has effect. But it was important to me, as I like to specify dependencies as much as I can, for my own sanity.

See? There’s that theme!

Everyone has a Mom

Speaking of ordering, as we have been starting to do production deployments, I realized that my previous notion to allow developers to reorder changes in the plan file without rebuilding databases was a mistake. It was too easy for someone to deploy to an existing database and miss changes because there was nothing to notice that changes had not been deployed. This was especially a problem before I addressed the ordering issue.

Even with ordering fixed, I thought about how git push works, and realized that it was much more important to make sure things really were consistent than it was to make things slightly more convenient for developers.

So I changed the way change IDs are generated. The text hashed for IDs now includes the ID of the parent change (if there is one), the change dependencies, and the change note. If any of these things change, the ID of the change will change. So they might change a lot during development, while one moves things arounds, changes dependencies, and tweaks the description. But the advantage is for production, where things have to be deployed exactly right, with no modifications, or else the deploy will fail. This is sort of like requiring all Git merges to be fast-forwarded, and philosophically in line with the Git practice of never changing commits after they’re pushed to a remote repository accessible to others.

Curious what text is hashed for the IDs? Check out the new show command!

Rebase

As a database hacker, I still need things to be relatively convenient for iterative development. So I’ve also added the rebase command. It’s simple, really: It just does a revert and a deploy a single command. I’m doing this all day long, so I’m happy to save myself a few steps. It’s also nice that I can do sqitch rebase @HEAD^ to revert and re-apply the latest change over and over again without fear that it will fail because of an ordering problem. But I already mentioned that, didn’t I?

Order Up

Well, mostly. Another ordering issue I addressed was for the revert --to option. It used to be that it would find the change to revert to in the plan, and revert based on the plan order. (And did I mention that said order might have changed since the last deploy?) v0.940 now searches the database for the revert target. Not only that, the full list of changes to deploy to revert to the target is also returned from the database. In fact, the revert no longer consults the plan file at all. This is great if you’ve re-ordered things, because the revert will always be the reverse order of the previous deploy. Even if IDs have changed, revert will find the changes to revert by name. It will only fail if you’ve removed the revert script for a change.

So simple, conceptually: revert reverts in the proper order based on what was deployed before. deploy deploys based on the order in the plan.

Not @FIRST, Not @LAST

As a result of the improved intelligence of revert, I have also deprecated the @FIRST and @LAST symbolic tags. These tags forced a search of the database, but were mainly used for revert. Now that revert always searches the database, there’s nothing to force. They’re still around for backward compatibility, but no longer documented. Use @ROOT and @HEAD, instead.

Not Over

So lots of big changes, including some compatibility changes. But I’ve tried hard to make them as transparent as possible (old IDs will automatically be updated by deploy). So take it for a spin!

Meanwhile, I still have quite a few other improvements I need to make. On my short list are:

New in PostgreSQL 9.2: format()

There’s a new feature in PostgreSQL 9.2 that I don’t recall seeing blogged about elsewhere: the format() function. From the docs:

Format a string. This function is similar to the C function sprintf; but only the following conversion specifications are recognized: %s interpolates the corresponding argument as a string; %I escapes its argument as an SQL identifier; %L escapes its argument as an SQL literal; %% outputs a literal %. A conversion can reference an explicit parameter position by preceding the conversion specifier with n$, where n is the argument position.

If you do a lot of dynamic query building in PL/pgSQL functions, you’ll immediately see the value in format(). Consider this function:

CREATE OR REPLACE FUNCTION make_month_partition(
    base_table   TEXT,
    schema_name  TEXT,
    month        TIMESTAMP
) RETURNS VOID LANGUAGE plpgsql AS $_$
DECLARE
    partition TEXT := quote_ident(base_table || '_' || to_char(month, '"y"YYYY"m"MM'));
    month_start TIMESTAMP := date_trunc('month', month);
BEGIN
    EXECUTE '
        CREATE TABLE ' || quote_ident(schema_name) || '.' || partition || ' (CHECK (
               created_at >= ' || quote_literal(month_start) || '
           AND created_at < '  || quote_literal(month_start + '1 month'::interval) || '
        )) INHERITS (' || quote_ident(schema_name) || '.' || base_table || ')
    ';
    EXECUTE 'GRANT SELECT ON ' || quote_ident(schema_name) || '.' || partition || '  TO dude;';
END;
$_$;

Lots of concatenation and use of quote_ident() to get things just right. I don’t know about you, but I always found this sort of thing quite difficult to read. But format() allows use to eliminate most of the operators and function calls. Check it:

CREATE OR REPLACE FUNCTION make_month_partition(
    base_table   TEXT,
    schema_name  TEXT,
    month        TIMESTAMP
) RETURNS VOID LANGUAGE plpgsql AS $_$
DECLARE
    partition TEXT := base_table || '_' || to_char(month, '"y"YYYY"m"MM');
    month_start TIMESTAMP := date_trunc('month', month);
BEGIN
    EXECUTE format(
        'CREATE TABLE %I.%I (
            CHECK (created_at >= %L AND created_at < %L)
        ) INHERITS (%I.%I)',
        schema_name, partition,
        month_start, month_start + '1 month'::interval,
        schema_name, base_table
    );
    EXECUTE format('GRANT SELECT ON %I.%I TO dude', schema_name, partition);
END;
$_$;

I don’t know about you, but I find that a lot easier to read. which means it’ll be easier to maintain. So if you do much dynamic query generation inside the database, give format() a try, I think you’ll find it a winner.

Update 2012-11-16: Okay, so I somehow failed to notice that format() was actually introduced in 9.1 and covered by depesz. D’oh! Well, hopefully my little post will help to get the word out more, at least. Thanks to my commenters.

Thinking about Changing Sqitch Change IDs

When Sqitch, (the database change management app I’ve been working on for the last several months) parses a deployment plan, it creates a unique ID for each change in the plan. This ID is a SHA1 hash generated from information about the change, which is a string that looks something like this:

project flipr
change add_users_table
planner Marge N. O’Vera <marge@example.com>
date 2012-11-14T01:10:13Z

The nice thing about the ID is that it’s unique: it’s unlikely that the same user with the same email address will add a change with the same name to a project with the same name within a single second. If the plan includes a URI, that’s included, too, for additional uniqueness.

Note, however, that it does not include information about any other changes. Git, from which I modeled the generation of these IDS, always includes the parent commit SHA1 in its uniquely-identifying info. An example:

> git cat-file commit 744c01bfa3798360c1792a8caf784b650e52d89e               
tree d3a64897cca4538ff5c0c41db3f82ab033a09bec
parent 482a79ae2cda5085eed731be2e70739ab37997ee
author David E. Wheeler <david@justatheory.com> 1337355746 -0400
committer David E. Wheeler <david@justatheory.com> 1337355746 -0400

Timestamp v0.30.

The reason Git does this is so that a commit is not just uniquely identified globally, but so that it can only follow an existing commit. Mark Jason Dominus calls this Linus Torvalds' greatest invention. Why? This is now Git knows it can fast-forward changes.

Why doesn’t Sqitch do something similar? My original thinking had been to make it easier for a database developer to do iterative development. And one of the requirements for that, in my experience, is the ability to freely reorder changes in the plan. Including the SHA1 of the preceding change would make that trickier. But it also means that, when you deploy to a production database, you lose that extra layer of security that ensures that, yes, the next change really should be deployed. That is, it would be much harder to deploy with changes missing or changed from what was previously expected. And I think that’s only sane for a production environment.

Given that, I’ve started to rethink my decision to omit the previous change SHA1 from the identifier of a change. Yes, it could be a bit more of hassle for a developer, but not, I think, that much of a hassle. The main thing would be to allow reverts to look up their scripts just by change name or even file name, rather than ID. We want deploys to always be correct, but I’m thinking that reverts should always just try very hard to remove changes. Even in production.

I am further thinking that the ID should even include the list of prerequisite changes for even stronger identification. After all, one might change just the dependencies and nothing else, but it would still be a different change. And maybe it should include the note, too? The end result would be a hash of something like this:

project flipr
change add_users_table
parent 7cd96745746cd6baa5da352de782354b21838b25
requires [schemas roles common:utils]
planner Marge N. O’Vera <marge@example.com>
date 2012-11-14T01:10:13Z

Adds the users table to the database.

This will break existing installations, so I’d need to add a way to update them, but otherwise, I think it might be a win overall. Thoughts?

Mocking Serialization Failures

I’ve been hacking on the forthcoming Bucardo 5 code base the last couple weeks, as we’re going to start using it pretty extensively at work, and it needed a little love to get it closer to release. The biggest issue I fixed was the handling of serialization failures.

When copying deltas from one database to another, Bucardo sets the transaction isolation to “Serializable”. As of PostgreSQL 9.1, this is true serializable isolation. However, there were no tests for it in Bucardo. And since pervious versions of PostgreSQL had poorer isolation (retained in 9.1 as “Repeatable Read”), I don’t think anyone really noticed it much. As I’m doing all my testing against 9.2, I was getting the serialization failures about half the time I ran the test suite. It took me a good week to chase down the issue. Once I did, I posted to the Bucardo mail list pointing out that Bucardo was not attempting to run a transaction again after failure, and at any rate, the model for how it thought to do so was a little wonky: it let the replicating process die, on the assumption that a new process would pick up where it left off. It did not.

Bucardo maintainer Greg Sabino Mullane proposed that we let the replicating process try again on its own. So I went and made it do that. And then the tests started passing every time. Yay!

Returning to the point of this post, I felt that there ought to be tests for serialization failures in the Bucardo test suite, so that we can ensure that this continues to work. My first thought was to use PL/pgSQL in 8.4 and higher to mock a serialization failure. Observe:

david=# \set VERBOSITY verbose
david=# DO $$BEGIN RAISE EXCEPTION 'Serialization error'
       USING ERRCODE = 'serialization_failure'; END $$;
ERROR:  40001: Serialization error
LOCATION:  exec_stmt_raise, pl_exec.c:2840

Cool, right? Well, the trick is to get this to run on the replication target, but only once. When Bucardo retries, we want it to succeed, thus properly demonstrating the COPY/SERIALIZATION FAIL/ROLLBACK/COPY/SUCCESS pattern. Furthermore, when it copies deltas to a target, Bucardo disables all triggers and rules. So how to get something trigger-like to run on a target table and throw the serialization error?

Studying the Bucardo source code, I discovered that Bucardo itself does not disable triggers and rules. Rather, it sets the session_replica_role GUC to “replica”. This causes PostgreSQL to disable the triggers and rules — except for those that have been set to ENABLE REPLICA. The PostgreSQL ALTER TABLE docs:

The trigger firing mechanism is also affected by the configuration variable session_replication_role. Simply enabled triggers will fire when the replication role is “origin” (the default) or “local”. Triggers configured as ENABLE REPLICA will only fire if the session is in “replica” mode, and triggers configured as ENABLE ALWAYS will fire regardless of the current replication mode.

Well how cool is that? So all I needed to do was plug in a replica trigger and have it throw an exception once but not twice. Via email, Kevin Grittner pointed out that a sequence might work, and indeed it does. Because sequence values are non-transactional, sequences return different values every time they’re access.

Here’s what I came up with:

CREATE SEQUENCE serial_seq;

CREATE OR REPLACE FUNCTION mock_serial_fail(
) RETURNS trigger LANGUAGE plpgsql AS $_$
BEGIN
    IF nextval('serial_seq') % 2 = 0 THEN RETURN NEW; END IF;
    RAISE EXCEPTION 'Serialization error'
          USING ERRCODE = 'serialization_failure';
END;
$_$;

CREATE TRIGGER mock_serial_fail AFTER INSERT ON bucardo_test2
    FOR EACH ROW EXECUTE PROCEDURE mock_serial_fail();
ALTER TABLE bucardo_test2 ENABLE REPLICA TRIGGER mock_serial_fail;

The first INSERT (or, in Bucardo’s case, COPY) to bucardo_test2 will die with the serialization error. The second INSERT (or COPY) succeeds. This worked great, and I was able to write test in a few hours and get them committed. And now we can be reasonably sure that Bucardo will always properly handle serialization failures.

Localize Your Perl modules with Locale::TextDomain and Dist::Zilla

I've just released Dist::Zilla::LocaleTextDomain v0.80 to the CPAN. This module adds support for managing Locale::TextDomain-based localization and internationalization in your CPAN libraries. I wanted to make it as simple as possible for CPAN developers to do localization and to support translators in their projects, and Dist::Zilla seemed like the perfect place to do it, since it has hooks to generate the necessary binary files for distribution.

Starting out with Locale::TextDomain was decidedly non-intuitive for me, as a Perl hacker, likely because of its gettext underpinnings. Now that I've got a grip on it and created the Dist::Zilla support, I think it's pretty straight-forward. To demonstrate, I wrote the following brief tutorial, which constitutes the main documentation for the Dist::Zilla::LocaleTextDomain distribution. I hope it makes it easier for you to get started localizing your Perl libraries.

Localize Your Perl modules with Locale::TextDomain and Dist::Zilla

Locale::TextDomain provides a nice interface for localizing your Perl applications. The tools for managing translations, however, is a bit arcane. Fortunately, you can just use this plugin and get all the tools you need to scan your Perl libraries for localizable strings, create a language template, and initialize translation files and keep them up-to-date. All this is assuming that your system has the gettext utilities installed.

The Details

I put off learning how to use Locale::TextDomain for quite a while because, while the gettext tools are great for translators, the tools for the developer were a little more opaque, especially for Perlers used to Locale::Maketext. But I put in the effort while hacking Sqitch. As I had hoped, using it in my code was easy. Using it for my distribution was harder, so I decided to write Dist::Zilla::LocaleTextDomain to make life simpler for developers who manage their distributions with Dist::Zilla.

What follows is a quick tutorial on using Locale::TextDomain in your code and managing it with Dist::Zilla::LocaleTextDomain.

This is my domain

First thing to do is to start using Locale::TextDomain in your code. Load it into each module with the name of your distribution, as set by the name attribute in your dist.ini file. For example, if your dist.ini looks something like this:

name    = My-GreatApp
author  = Homer Simpson <homer@example.com>
license = Perl_5

Then, in you Perl libraries, load Locale::TextDomain like this:

use Locale::TextDomain qw(My-GreatApp);

Locale::TextDomain uses this value to find localization catalogs, so naturally Dist::Zilla::LocaleTextDomain will use it to put those catalogs in the right place.

Okay, so it's loaded, how do you use it? The documentation of the Locale::TextDomain exported functions is quite comprehensive, and I think you'll find it pretty simple once you get used to it. For example, simple strings are denoted with __:

say __ 'Hello';

If you need to specify variables, use __x:

say __x(
    'You selected the color {color}',
    color => $color
);

Need to deal with plurals? Use __n

say __n(
    'One file has been deleted',
    'All files have been deleted',
    $num_files,
);

And then you can mix variables with plurals with __nx:

say __nx(
    'One file has been deleted.',
    '{count} files have been deleted.'",
    $num_files,
    count => $num_files,
);

Pretty simple, right? Get to know these functions, and just make it a habit to use them in user-visible messages in your code. Even if you never expect to translate those messages, just by doing this you make it easier for someone else to come along and start translating for you.

The setup

Now you're localizing your code. Great! What's next? Officially, nothing. If you never do anything else, your code will always emit the messages as written. You can ship it and things will work just as if you had never done any localization.

But what's the fun in that? Let's set things up so that translation catalogs will be built and distributed once they're written. Add these lines to your dist.ini:

[ShareDir]
[LocaleTextDomain]

There are actually quite a few attributes you can set here to tell the plugin where to find language files and where to put them. For example, if you used a domain different from your distribution name, e.g.,

use Locale::TextDomain 'com.example.My-GreatApp';

Then you would need to set the textdomain attribute so that the LocaleTextDomain does the right thing with the language files:

[LocaleTextDomain]
textdomain = com.example.My-GreatApp

Consult the LocaleTextDomain configuration docs for details on all available attributes.

(Special note until this Locale::TextDomain patch is merged: set the share_dir attribute to lib instead of the default value, share. If you use Module::Build, you will also need a subclass to do the right thing with the catalog files; see "Installation" in Dist::Zilla::Plugin::LocaleTextDomain for details.)

What does this do including the plugin do? Mostly nothing. You might see this line from dzil build, though:

[LocaleTextDomain] Skipping language compilation: directory po does not exist

Now at least you know it was looking for something to compile for distribution. Let's give it something to find.

Initialize languages

To add translation files, use the msg-init command:

> dzil msg-init de
Created po/de.po.

At this point, the gettext utilities will need to be installed and visible in your path, or else you'll get errors.

This command scans all of the Perl modules gathered by Dist::Zilla and initializes a German translation file, named po/de.po. This file is now ready for your German-speaking public to start translating. Check it into your source code repository so they can find it. Create as many language files as you like:

> dzil msg-init fr ja.JIS en_US.UTF-8
Created po/fr.po.
Created po/ja.po.
Created po/en_US.po.

As you can see, each language results in the generation of the appropriate file in the po directory, sans encoding (i.e., no .UTF-8 in the en_US file name).

Now let your translators go wild with all the languages they speak, as well as the regional dialects. (Don't forget to colour your code with en_UK translations!)

Once you have translations and they're committed to your repository, when you build your distribution, the language files will automatically be compiled into binary catalogs. You'll see this line output from dzil build:

[LocaleTextDomain] Compiling language files in po
po/fr.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/ja.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_US.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.

You'll then find the catalogs in the shared directory of your distribution:

> find My-GreatApp-0.01/share -type f
My-GreatApp-0.01/share/LocaleData/de/LC_MESSAGES/App-Sqitch.mo
My-GreatApp-0.01/share/LocaleData/en_US/LC_MESSAGES/App-Sqitch.mo
My-GreatApp-0.01/share/LocaleData/ja/LC_MESSAGES/App-Sqitch.mo

These binary catalogs will be installed as part of the distribution just where Locale::TextDomain can find them.

Here's an optional tweak: add this line to your MANIFEST.SKIP:

^po/

This prevents the po directory and its contents from being included in the distribution. Sure, you can include them if you like, but they're not required for the running of your app; the generated binary catalog files are all you need. Might as well leave out the translation files.

Mergers and acquisitions

You've got translation files and helpful translators given them a workover. What happens when you change your code, add new messages, or modify existing ones? The translation files need to periodically be updated with those changes, so that your translators can deal with them. We got you covered with the msg-merge command:

> dzil msg-merge
extracting gettext strings
Merging gettext strings into po/de.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

This will scan your module files again and update all of the translation files with any changes. Old messages will be commented-out and new ones added. Just commit the changes to your repository and notify the translation army that they've got more work to do.

If for some reason you need to update only a subset of language files, you can simply list them on the command-line:

> dzil msg-merge po/de.po po/en_US.po
Merging gettext strings into po/de.po
Merging gettext strings into po/en_US.po
What's the scan, man

Both the msg-init and msg-merge commands depend on a translation template file to create and merge language files. Thus far, this has been invisible: they will create a temporary template file to do their work, and then delete it when they're done.

However, it's common to also store the template file in your repository and to manage it directly, rather than implicitly. If you'd like to do this, the msg-scan command will scan the Perl module files gathered by Dist::Zilla and make it for you:

> dzil msg-scan
gettext strings into po/My-GreatApp.pot

The resulting .pot file will then be used by msg-init and msg-merge rather than scanning your code all over again. This actually then makes msg-merge a two-step process: You need to update the template before merging. Updating the template is done by exactly the same command, msg-scan:

> dzil msg-scan
extracting gettext strings into po/My-GreatApp.pot
> dzil msg-merge
Merging gettext strings into po/de.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po
Ship It!

And that's all there is to it. Go forth and localize and internationalize your Perl apps!

Acknowledgements

My thanks to Ricardo Signes for invaluable help plugging in to Dist::Zilla, to Guido Flohr for providing feedback on this tutorial and being open to my pull requests, to David Golden for I/O capturing help, and to Jérôme Quelin for his patience as I wrote code to do the same thing as Dist::Zilla::Plugin::LocaleMsgfmt without ever noticing that it already existed.

Sqitch Symbolism

It has been a while since I last blogged about Sqitch. The silence is in part due to the fact that I’ve moved from full-time Sqitch development to actually putting it to use building databases at work. This is exciting, because it needs the real-world experience to grow up.

That’s not to say that nothing has happened with Sqitch. I’ve just released v0.931 which includes a bunch of improvement since I wrote about v0.90. First a couple of the minor things:

  • Sqitch now checks dependencies before reverting, and dies if they would be broken by the revert. This change, introduced in v0.91, required that the dependencies be moved to their own table, so if you’ve been messing with an earlier version of Sqitch, you’ll have to rebuild the database. Sorry about that.
  • I fixed a bunch of Windows-related issues, including finding the current user’s full name, correctly setting the locale for displaying dates and times, executing shell commands, and passing tests. The awesome ActiveState PPM Index has been invaluable in tracking these issues down.
  • Added the bundle command. All it does is copy your project configuration file, plan, and deploy, revert, and test scripts to a directory you identify. The purpose is to be able to export the project into a directory structure suitable for distribution in a tarball, RPM, or whatever. That my not sound incredibly useful, since copying files is no big deal. However, the long-term plan is to add VCS support to Sqitch, which would entail fetching scripts from various places in VCS history. At that point, it will be essential to use bundle to do the export, so that scripts are properly exported from the VCS history. That said, I’m actually using it already to build RPMs. Useful already!

Symbolic References

And now the more immediately useful changes. First, I added new symbolic tags, @FIRST and @LAST. These represent the first and last changes currently deployed to a database, respectively. These complement the existing @ROOT and @HEAD symbolic tags, which represent the first and last changes listed in the plan. The distinction is important: The change plan vs actual deployments to a database.

The addition of @FIRST and @LAST may not sounds like much, but there’s more.

I also added forward and reverse change reference modifiers ^ and ~. The basic idea was stolen from Git Revisions, though the semantics vary. For Sqitch changes, ^ appended to a name or tag means “the change before this change,” while ~ means “the change after this change”. I find ^ most useful when doing development, where I’m constantly deploying and reverting a change as I work. Here’s how I do that revert:

sqitch revert --to @LAST^

That means “revert to the change before the last change”, or simply “revert the last change”. If I want to revert two changes, I use two ^s:

sqitch revert --to @LAST^^

To go back any further, I need to use an integer with the ^. Here’s how to revert the last four changes deployed to the database:

sqitch revert --to @LAST^4

The cool thing about this is that I don’t have to remember the name of the change to revert, as was previously required. And of course, if I just wanted to deploy two changes since the last deployment, I would use ~~:

sqitch deploy --to @LAST~~

Nice, right? One thing to bear in mind, as I was reminded while giving a Sqitch presentation to PDXPUG: Changes are deployed in a sequence. You can think of them as a linked list. So this command:

sqitch revert @LAST^^

Does not mean to revert the second-to-last change, leaving the two after it. It will revert the last change and the penultimate change. This is why I actually encourage the use of the --to option, to emphasize that you’re deploying or reverting all changes to the named point, rather than deploying or reverting the named point in isolation. Sqitch simply doesn’t do that.

Internationalize Me

One more change. With today’s release of v0.931, there is now proper internationalization support in Sqitch. The code has been localized for a long time, but there was no infrastructure for internationalizing. Now there is, and I’ve stubbed out files for translating Sqitch messages into French and German. Adding others is easy.

If you’re interested in tranlating Sqitch’s messages (only 163 of them, should be quick!), just fork Sqitch, juice up your favorite gettext editor, and start editing. Let me know if you need a language file generated; I’ve built the tools to do it easily with dzil, but haven’t released them yet. Look for a post about that later in the week.

Presentation

Oh, and that PDXPUG presentation? Here are the slides. Enjoy!

Sane SQL Change Management with Sqitch from David E. Wheeler

Medium in the Large

I’ve been reading novels the last few months, taking advantage of our time in Europe to just veg out on epic fantasy when not working or doing family stuff. A such, I had less time to clear out my Instapaper queue or to read The New Yorker. But now I’ve finished the series and turned back to The New Yorker, downloading the latest issue to my iPad the other night and greedily reading the back story on the Clinton-Obama reconciliation. I was immediately reminded of all the reasons why The The New Yorker app sucks. It has not improved at all. This naturally got me thinking again about the future of publishing and looking around for new ideas.

Medium, the new publishing platform from Obvious Corporation (whose founders were the creative force behind Blogger and Twitter), made a bit of a splash last month with its soft launch. In his post introducing the service, Ev Williams writes:

So, we’re re-imagining publishing in an attempt to make an evolutionary leap, based on everything we’ve learned in the last 13 years and the needs of today’s world.

An evolutionary approach is perhaps for the best, given that recent attempts at revolution have not proved to be a panacea. (But then maybe there are other reasons why Newsstand has not “saved the publishing industry.”) There is little to see on Medium as yet, but the idea seems simple enough: users post to “collections” of content, which are defined by topical and visual themes. Some collections are open to contributions from anyone, while others are managed by individual users. Ev writes, “Collections give people context and structure to publish their own stories, photos, and ideas.”

Collections Agency

The collections idea sounds nice, with its emphasis on quality writing and editing. I have no inside information on Medium whatsoever, but reading the tea leaves a bit, I suspect that collections are the key idea. While traditional blogging services such as Blogger and Tumblr empower the individual writer to post whatever she wants whenever she wants, Medium’s collections aim to empower editors.

Since the publishing industry is having a hell of a time trying to make the internet look like a magazine, it’s natural that some of the folks behind online publishing would be thinking about how to bring the best qualities of periodical publishing to the internet. Two of the most important of those qualities are editorial control (not coincidentally emphasized Dustin Curtis’s Svbtle Network) and topicality. Think about it. If you’re interested in American politics, you might read National Review or The Nation. For long-form journalism and in-depth reporting, there’s The New Yorker and The Atlantic. Want vegetarian-friendly recipes? Vegegarian Times or Cooking Light. Each of these magazines present themselves as well-edited authorities on particular topics, and most have an identifiable design. My guess is that Medium’s collections attempt to emulate these qualities.

Let’s say you’re interested in writing about baseball. You get a Medium account and start publishing pieces in an open sports collection. You’re doing a good job, calling out the subtleties of the latest game that others have missed, and readers start hitting the “this is good” badge on your post and writing their own posts in response. Soon, someone who edits a respected, moderated baseball collection invites you to contribute there. You get even more exposure, because the collection’s editorial oversight ensures consistent quality. Readers pay more attention to that quality, as well as its timely delivery throughout the week or month.

After a while, you start helping out with the editing, finding other folks to to contribute and providing editorial feedback. Eventually you might create your own collection, perhaps expanding into other sports, or covering the vicissitudes of the sports apparel industry. Today you are your own Roger Angell, and tomorrow, perhaps, you’re David Remnick.

Medium provides a growth path for writers to develop their craft, to collaborate, to build editorial credibility within particular topics, and perhaps create a brand. This is more than just individual publishing, where you’re just some guy with a baseball blog. In the collaborative community of Medium, you can work with others to build something greater than any of you could on your own.

I'm just speculating here. But it might work. Think about precedents. If Twitter is what happens when IRC meets RSS, then Medium is what happens when you bring RSS to Usenet. But unlike Usenet (or Newsstand for that matter), this would not be a sea of distributed content with archipelagos of collecting meaning. It’s not federated or distributed by a protocol. No, the whole damned thing will be completely owned and controlled by Medium.

Imagine that Medium succeeds, that it displaces the magazine publishing industry with its thematic collections. It also distributes official apps for all the major platforms, gratis, so readers can follow their favorite collections from anywhere. If my admittedly wild-assed conjecture is in any way the case, I don’t think it would be too much to say that Medium aims to be the primary medium of topical publishing. The name embodies the ambition.

Medium of Exchange

Which brings me to the obvious question (pun acknowledged): What’s the business model? How will Medium make money? Perhaps, as with Twitter (and Tumblr?), the idea is to get lots of users first, and then sell those users to advertisers at some later date. Imagine you’re a collection editor, who has spent two years building up a readership and a stable of writers on Medium, keeping your collection looking sharp and clean, and then Medium decides to dump ads into your collection. Maybe they would revenue share with you. Or maybe you would pack up and go somewhere else where your editorial integrity was better respected.

Alternately, Medium could sell “professional accounts” to collection editors. This might have some appeal to the next generation of publishers, as it would be much cheaper than starting a paper magazine. But how many of these folks would there be, really? A tech startup needs to show something like a 10x return on investment, and I find it hard to believe that a $50/year (or even $500/year) professional account for a few thousand editors would bring in near enough revenue. And besides, don’t editors want to get paid for their time and effort, not pay for the privilege of contributing to Medium’s collections?

And the writers, too! To attract the best writers—the folks who know how to do research, think though a topic, and write thoughtful posts on the matter—you need cash. So, better than charging publishers would be some way to make great quality content pay. Money for the folks who develop popular and widely-read collections. Money for the writers who contribute to them. Again, advertising might work, but at the expense, perhaps, of editorial integrity. Maybe that doesn’t matter, but it would be a non-starter if it didn’t happen soon. You want to get good people? Show them the money, right up front.

I have a better idea how Medium could generate revenue leading to profits while empowering its writers and editors to potentially make a living producing great content. It’s simple: bring the App Store model to publishing.

Here’s how it works. If you buy into the Medium ecosystem, agree to its terms and guidelines and play by its rules, then you, as publisher, would have an array of choices as to how to distribute your collection. You can have ads in your collections, provided by Medium’s ad network, and both you and Medium take a cut of the revenues. Alternatively, you can set a price for subscriptions to your collections, and Medium will collect payments, taking a cut. You could also decide to just distribute your collections freely, with no ads. Maybe there would be some way to create cross-promotions to your paid collections.

The point is, just as App Store developers can choose to sell their apps, release them with ads or in-app purchase options, or make them free, perhaps Medium editors could choose whether their collections include ads, require subscription payments, or are free. It’s the App Store Publishing model.

Again, this is purely speculation, but I'd love it if something like this was in the works, because I can imagine no other viable model that doesn’t sell users to advertisers and turn editors into digital sharecroppers1. And god knows the traditional topical publishers are in deep trouble. If Medium empowers the people who edit great collections to get revenue the way they want, then everyone wins.

And if Medium does not plan to do this, well then someone should.


  1. Thanks to Duncan Davidson for this term, which perfectly encapsulates the problem in a single word. ↩

Sqitch: Depend On It!

Sqitch v0.90 dropped last week (updated to v0.902 today). The focus of this release of the “sane database change management” app was cross-project dependencies. Jim Nasby first put the idea for this feature into my head, and then I discovered that our first Sqitch-using project at work needs it, so blame them.

Depend On It

Earlier versions of Sqitch allow you to declare dependencies on other changes within a project. For example, if your project has a change named users_table, you can create a new change that requires it like so:

sqitch add --requires users_table widgets_table

As of v0.90, you can also require a change from different Sqitch project. Say that you have a project that installs a bunch of utility functions, and that you want to require it in your current Sqitch project. To do so, just prepend the project name to the name of the change you want to require:

sqitch add --requires utils:uuidgen widgets_table

When you go to deploy your project, Sqitch will not deploy the widgets_table change if the uuidgen change from the utils project is not already present.

Sqitch discriminates projects simply by name, as required since v0.80. When you initialize a new Sqitch project, you have to declare its name, too:

siqtch init --name utils

I’ve wondered a bit as to whether that was sufficient. Within a small organization, it’s probably no big deal, as there is unlikely to be much namespace overlap. But thinking longer term, I could foresee folks developing and distributing interdependent open-source Sqitch projects. And without a central name registry, conflicts are likely to pop up. To a certain degree, the risks can be minimized by comparing project URIs, but that works only for project registration, not dependency specification. But perhaps it’s enough. Thoughts?

It’s All Relative

Next up I plan to implement the SQLite support and the bundle command. But first, I want to support relative change specifications. Changes have an order, both in the plan and as deployed to the database. I want to be able to specify relative changes, kind of like you can specify relative commits in Git. So, if you want to revert just one change, you could say something like this:

sqitch revert HEAD^

And that would revert one change. I also think the ability to specify later changes might be useful. So if you wanted to deploy to the change after change foo, you could say something like:

sqitch deploy foo+

You can use ^ or + any number of times, or specify numbers for them. These would both revert three changes:

sqitch revert HEAD^^^
sqitch revert HEAD^3

I like ^ because of its use in Git, although perhaps ~ is more appropriate (Sqitch does not have concepts of branching or multiple parent changes). But + is not a great complement. Maybe - and + would be better, if a bit less familiar? Or maybe there is a better complement to ^ or ~ I haven’t thought of? (I don’t want to use characters that have special meaning in the shell, like <>, if I can avoid it.) Suggestions greatly appreciated.

Oops

A discovered a bug late in the development of v0.90. Well, not so much a bug as an oversight: Sqitch does not validate dependencies in the revert command. That means it’s possible to revert a change without error when some other change depends on it. Oops. Alas, fixing this issue is not exactly trivial, but it’s something I will have to give attention to soon. While I’m at it, I will probably make dependency failures fail earlier. Watch for those fixes soon.

And You?

Still would love help getting a dzil plugin to build Local::TextDomain l01n files. I suspect this would take a knowlegeable Dist::Zilla user a couple of hours to do. (And thanks to @notbenh and @RJBS for getting Sqitch on Dist::Zilla!) And if anyone really wanted to dig into Sqitch, Implementing a bundle command would be a great place to start.

Or just give it a try! You can install it from CPAN with cpan App::Sqitch. Read the tutorial for an overview of what Sqitch is and how it works. Thanks!