Just a Theory

By David E. Wheeler

Posts about Perl

Less Tedium, More Transactions

A frequent pattern when writing database-backed applications with the DBI is to connect to the database and cache the database handle somewhere. A simplified example:

package MyApp::DB;
use DBI;
use strict;

my $DBH = DBI->connect('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

sub dbh { $DBH }

Just load MyApp::DB anywhere in your app and, whenever you want to use the database, grab the handle from MyApp::DB->dbh.
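
For example, hypothetical query code elsewhere in the app might look like this (widgets being a made-up table):

use MyApp::DB;

my $widgets = MyApp::DB->dbh->selectall_arrayref(
    'SELECT id, name FROM widgets',
    { Slice => {} }, # fetch rows as hash references
);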

This pattern is common enough that Apache::DBI was created to magically do it for you on mod_perl, and the DBI added connect_cached() so that it could cache connections itself. However, each of these solutions has some issues:

  • What happens when your program forks? Apache::DBI handles this condition, but neither the home-grown solution nor connect_cached() does, and identifying a forked database handle as the source of a crash is notoriously unintuitive.

  • What happens when your program spawns a new thread? Sure, some DBI drivers might still work, but others might not. Best to treat new threads the same as new processes and reconnect to the database. Neither Apache::DBI nor connect_cached() deals with threading issues, and of course neither does the custom solution.

  • Apache::DBI is magical and mysterious, and the magic comes with serious side-effects. Apache::DBI plugs itself right into the DBI, replacing its connection methods (which is why load ordering is so important to using it properly). Knowledge of Apache::DBI is actually built right into the DBI itself, meaning that the magic runs deep and in both directions. These are pretty serious violations of encapsulation on both sides.

  • connect_cached() has a bit of its own unfortunate magic. Every call to connect_cached() resets the connection attributes. So if you have code in one place that starts a transaction, and code elsewhere but executed in the same scope that also fetches a connect_cached() handle, the transaction will be committed then and there, even though the code that started it might not be done with it. One can work around this issue via callbacks, but it’s a bit of a hack.

Using a custom caching solution avoids the magic, but getting fork- and thread-safety right is surprisingly non-trivial, in the same way that doing your own exception-handling is surprisingly non-trivial.
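
To get a sense of what’s involved, here is a minimal sketch of fork detection alone, leaving out thread detection and proper handle cleanup. This is illustrative only, not what DBIx::Connector actually does:

my ($DBH, $PID);

sub dbh {
    # Reconnect on first use or when the process has forked.
    if (!$DBH || $PID != $$) {
        # Keep a forked child from closing the parent's connection.
        $DBH->{InactiveDestroy} = 1 if $DBH;
        $DBH = DBI->connect('DBI:SQLite:dbname=myapp.db', '', '', {
            PrintError => 0,
            RaiseError => 1,
            AutoCommit => 1,
        });
        $PID = $$;
    }
    return $DBH;
}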

Enter DBIx::Connector, a module that efficiently manages your database connections in a thread- and fork-safe manner so that you don’t have to. If you already have a custom solution, switching to DBIx::Connector is easy. Here’s a revision of MyApp::DB that uses it:

package MyApp::DB;
use DBIx::Connector;
use strict;

my $CONN = DBIx::Connector->new('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

sub conn { $CONN }
sub dbh  { $CONN->dbh }

Simple, right? You pass exactly the same parameters to DBIx::Connector->new that you passed to DBI->connect. The DBIx::Connector object simply proxies the DBI. If you want the database handle itself, just call dbh() and proceed as usual, confident that if your app forks or spawns new threads, your database handle will be safe. Why? Because DBIx::Connector detects such changes and re-connects to the database, being sure to properly dispose of the original connection. But really, you don’t have to worry about that, because DBIx::Connector does the worrying for you.
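
To illustrate, here’s a sketch in which the same dbh() call works on both sides of a fork (jobs being a hypothetical table):

use MyApp::DB;

my $pid = fork;
die "Cannot fork: $!" unless defined $pid;
if ($pid) {
    # Parent: keep using the handle as before.
    MyApp::DB->dbh->do('UPDATE jobs SET status = ? WHERE id = ?', undef, 'queued', 1);
    waitpid $pid, 0;
} else {
    # Child: dbh() sees the new PID and returns a fresh connection.
    MyApp::DB->dbh->do('UPDATE jobs SET status = ? WHERE id = ?', undef, 'running', 2);
    exit;
}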

Execution Methods

DBIx::Connector is very good at eliminating the technical friction of process and thread management. But that’s not all there is to it.

Although you can just fetch the DBI handle from your DBIx::Connector object and go, a better approach is to use its execution methods. These methods scope execution to a code block. Here’s an example using run():

$conn->run(sub {
    shift->do($query);
});

That may not seem so useful, and is more to type, but the real power comes from the txn() method. txn() executes the code block within the scope of a transaction. So where you normally would write something like this:

use Try::Tiny;
use MyApp::DB;

my $dbh = MyApp::DB->dbh;
try {
    $dbh->begin_work;
    # do stuff...
    $dbh->commit;
} catch {
    $dbh->rollback;
    die $_;
};

The txn() method scopes the transaction for you, so that you can focus on the work to be done rather than on transaction management:

use Try::Tiny;
use MyApp::DB;

try {
    MyApp::DB->conn->txn(sub {
        # do stuff...
    });
} catch {
    die $_;
};

There’s no need to call begin_work, commit, or rollback, as txn() does all that for you. Furthermore, it improves the maintainability of your code, as the scope of the transaction is much more clearly defined as the scope of the code block. Additional calls to txn() or run() within that block are harmless, and just become part of the same transaction:

my $conn = MyApp::DB->conn;
$conn->txn(sub {
    my $dbh = shift;
    $dbh->do($_) for @queries;
    $conn->run(sub {
        shift->do($expensive_query);
        $conn->txn(sub {
            shift->do($another_expensive_query);
        });
    });
});

Even cooler is the svp() method, which scopes execution of a code block to a savepoint, or subtransaction, if your database supports it (all of the drivers currently supported by DBIx::Connector do). For example, this transaction will commit the insertion of values 1 and 3, but not 2:

my $conn = MyApp::DB->conn;
$conn->txn(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (1)');
    try {
        $conn->svp(sub {
            shift->do('INSERT INTO table1 VALUES (2)');
            die 'OMGWTF?';
        });
    } catch {
        warn "Savepoint failed: $_\n";
    };
    $dbh->do('INSERT INTO table1 VALUES (3)');
});

Connection Management

The recommended pattern for using a cached DBI handle is to call ping() when you fetch it from the cache, and reconnect if it returns false. Apache::DBI and connect_cached() do this for you, and so does DBIx::Connector. However, in a busy application ping() can get called a lot. We recently did some query analysis for a client, and found that 1% of the database execution time was taken up with ping() calls. That may not sound like a lot, but looking at the numbers, it amounted to 100K pings per hour. For something that just returns true well over 99.9% of the time, it seems a bit silly.

Enter DBIx::Connector connection modes. The default mode is “ping”, as that’s what most installations are accustomed to. A second mode is “no_ping”, which simply disables pings. I don’t recommend that.

A better solution is to use “fixup” mode. This mode doesn’t normally call ping() either. However, if a code block passed to run() or txn() throws an exception, then DBIx::Connector will call ping(). If it returns false, DBIx::Connector reconnects to the database and executes the code block again. This configuration should handle some common situations, such as idle timeouts, without bothering you about it.

Specify “fixup” mode whenever you call an execution method, like so:

$conn->txn(fixup => sub { ... });

You can also specify that your connection always use “fixup” via the mode() accessor. Modify the caching library like so (the mode() call is new):

my $CONN = DBIx::Connector->new('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

$CONN->mode('fixup'); # ⬅ ⬅ ⬅  enter fixup mode!

sub conn { $CONN }
sub dbh  { $CONN->dbh }

However, you must be more careful with fixup mode than with ping mode, because a code block can be executed twice. So you must be sure to write it such that there are no side effects to multiple executions. Don’t do this, for example:

my $count = 0;
$conn->txn(fixup => sub {
    shift->do('INSERT INTO foo (count) VALUES(?)', undef, ++$count);
});
say $count; # may be 1 or 2

Will it insert a value of 1 or 2? It’s much safer to remove non-transactional code from the block, like so:

my $count = 0;
++$count;
$conn->txn(fixup => sub {
    shift->do('INSERT INTO foo (count) VALUES(?)', undef, $count);
});
say $count; # can only be 1

An even trickier pattern to watch out for is something like this:

my $user = 'rjbs';
$conn->run(fixup => sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO users (nick) VALUES (?)', undef, $user);

    # Do some other stuff...

    $dbh->do('INSERT INTO log (msg) VALUES (?)', undef, 'Created user');
});

If the database disconnects between the first and second calls to do, and DBIx::Connector manages to re-connect and run the block again, you might get a unique key violation on the first call to do. This is because we’ve used the run() method. In the first execution of the block, user “rjbs” was inserted and autocommitted. On the second execution, user “rjbs” is already there, and since nicks must be unique, we get a unique key violation.

The rule of thumb here is to use run() only for database reads, and to use txn() (and svp()) for writes. txn() will ensure that the transaction is rolled back, so the second execution of the code block will be side-effect-free.
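
For example, the user-creation block above becomes safe from this problem when run under txn() instead: if the connection drops partway through, the whole transaction is rolled back before fixup mode re-executes the block:

my $user = 'rjbs';
$conn->txn(fixup => sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO users (nick) VALUES (?)', undef, $user);

    # Do some other stuff...

    $dbh->do('INSERT INTO log (msg) VALUES (?)', undef, 'Created user');
});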

Pedigree

DBIx::Connector is derived from patterns originally implemented for DBIx::Class, though it’s nearly all original code. The upside for those of us who don’t use ORMs is that we get this independent piece of ORM-like behavior without its ORMishness. So if you’re a database geek like me, DBIx::Connector is a great way to reduce technical friction without buying into the whole idea of an ORM.

As it turns out, DBIx::Connector is good not just for straight-to-database users, but also for ORMs. Both DBIx::Class and Rose::DB have plans to replace their own caching and transaction-handling implementations with DBIx::Connector under the hood. That will be great for everyone, as the problems will all be solved in this one place.

This post originally appeared on the Perl Advent Calendar 2011.

DBIx::Connector and Serializable Snapshot Isolation

I was at Postgres Open week before last. This was a great conference, with a very welcoming atmosphere and lots of great talks. One of the more significant, for me, was the session on serializable transactions by Kevin Grittner, who developed SSI for PostgreSQL 9.1. I hadn’t paid much attention to this feature before now, but it became clear to me during the talk that it’s time.

So what is SSI? Well, serializable transactions are almost certainly how you think of transactions already. Here’s how Kevin describes them:

True serializable transactions can simplify software development. Because any transaction which will do the right thing if it is the only transaction running will also do the right thing in any mix of serializable transactions, the programmer need not understand and guard against all possible conflicts. If this feature is used consistently, there is no need to ever take an explicit lock or SELECT FOR UPDATE/SHARE.

This is, in fact, generally how I’ve thought about transactions. But I’ve certainly run into cases where it wasn’t true. Back in 2006, I wrote an article on managing many-to-many relationships with PL/pgSQL which demonstrated a race condition one might commonly find when using an ORM. The solution I offered was to always use a PL/pgSQL function that does the work, and that function executes a SELECT...FOR UPDATE statement to overcome the race condition. This creates a lock that forces conflicting transactions to be performed serially.

Naturally, this is something one would rather not have to think about. Hence SSI. When you identify a transaction as serializable, it will be executed in a truly serializable fashion. So I could actually do away with the SELECT...FOR UPDATE workaround — not to mention any other race conditions I might have missed — simply by telling PostgreSQL to enforce transaction isolation. This essentially eliminates the possibility of unexpected side-effects.
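
In SQL terms, and assuming PostgreSQL 9.1, that’s just a matter of starting the transaction at the serializable isolation level:

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Race-prone statements go here, with no explicit locking.
COMMIT;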

This comes at a cost, however. Not in terms of performance so much, since the SSI implementation uses some fancy, recently-developed algorithms to keep things efficient. (Kevin tells me via IRC: “Usually the rollback and retry work is the bulk of the additional cost in an SSI load, in my testing so far. A synthetic load to really stress the LW locking, with a fully-cached database doing short read-only transactions will have no serialization failures, but can run up some CPU time in LW lock contention.”) No, the cost is actually in increased chance of transaction rollback. Because SSI will catch more transaction conflicts than the traditional “read committed” isolation level, frameworks that expect to work with SSI need to be prepared to handle more transaction failures. From the fine manual:

The Serializable isolation level provides the strictest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures.

And that brings me to DBIx::Connector, my Perl module for safe connection and transaction management. It currently has no such retry smarts built into it. The feature closest to that is the “fixup” connection mode, wherein if the execution of a code block fails due to a connection failure, DBIx::Connector will re-connect to the database and execute the code reference again.

I think I should extend DBIx::Connector to take isolation failures and deadlocks into account. That is, fixup mode would retry a code block not only on connection failure but also on serialization failure (SQLSTATE 40001) and deadlocks (SQLSTATE 40P01). I would also add a new attribute, retries, to specify the number of times to retry such execution, with a default of three (which likely will cover the vast majority of cases). This has actually been an oft-requested feature, and I’m glad to have a new reason to add it.
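
To sketch the idea (a hypothetical API; none of this exists yet), usage might look something like this:

my $conn = DBIx::Connector->new($dsn, $user, $pass, { RaiseError => 1 });
$conn->mode('fixup');
$conn->retries(3); # hypothetical accessor; would default to 3

$conn->txn(sub {
    # Would be re-run on connection failure, serialization failure
    # (SQLSTATE 40001), or deadlock (SQLSTATE 40P01), up to three times.
    shift->do('UPDATE accounts SET balance = balance - 100.00 WHERE id = 1');
});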

There are a few design issues to overcome, however:

  • Fixup mode is supported not just by txn(), which scopes the execution of a code reference to a single transaction, but also run(), which does no transaction handling. Should the new retry support be added there, too? I could see it either way (a single SQL statement executed in run() is implicitly transaction-scoped).
  • Fixup mode is also supported by svp(), which scopes the execution of a code reference to a savepoint (a.k.a. a subtransaction). Should the rollback and retry be supported there, too, or would the whole transaction have to be retried? I’m thinking the latter, since that’s currently the behavior for connection failures.
  • Given these issues, will it make more sense to perhaps create a new mode? Maybe it would be supported only by txn().

This is do-able, but it will likely take some experimentation to figure out and settle on the appropriate API. I’ll need to find the tuits for that soon.

In the meantime, given the currently in-progress changes, I’ve just released a new version of DBIx::Connector with a single change: all uses of the deprecated catch syntax now throw warnings. The previous version threw warnings only the first time the syntax was used in a particular context, to keep error logs from getting clogged up. Hopefully most folks have changed their code in the two months since the previous release and switched to Try::Tiny or some other model for exception handling. The catch syntax will be completely removed in the next release of DBIx::Connector, likely around the end of the year. Hopefully the new SSI-aware retry functionality will have been integrated by then, too.

In a future post I’ll likely chew over whether or not to add an API to set the transaction isolation level within a call to txn() and friends.


Up for Adoption: SVN::Notify

I’ve kept my various Perl modules in a Subversion server run by my Bricolage support company, Kineticode, for many years. However, I’m having to shut down the server I’ve used for all my services, including Subversion, so I’ve moved them all to GitHub. As such, I no longer use Subversion in my day-to-day work.

It no longer seems appropriate that I maintain SVN::Notify. This has probably been my most popular module, and I know that it’s used a lot. It’s also relatively stable, with few bug reports or complaints. Nevertheless, there certainly could be some things that folks want to add, like TLS support, I18N, and inline CSS.

Therefore, SVN::Notify is formally up for adoption. If you’re a Subversion user, it’s a great tool. Just look at this sample output. If you’d like to take over maintenance and make it even better, please get in touch. Leave a comment on this post, or @theory me on Twitter, or send an email.

PS: Would love it if someone also could take over activitymail, the CVS notification script from which SVN::Notify was derived — and which I have even less right to maintain, given that I haven’t used CVS in years.


DBIx::Connector Exception Handling Design

In response to a bug report, I removed the documentation suggesting that one use the catch function exported by Try::Tiny to specify an exception-handling function to the DBIx::Connector execution methods. When I wrote those docs, Try::Tiny’s catch function just returned the subroutine. It was later changed to return an object, and that didn’t work very well. It seemed a much better idea not to depend on an external function that could change its behavior when there is no direct dependency on Try::Tiny in DBIx::Connector. I removed that documentation in 0.43. So instead of this:

$conn->run(fixup => sub {
    ...
}, catch {
    ...
});

It now recommends this:

$conn->run(fixup => sub {
    ...
}, catch => sub {
    ...
});

Which frankly is better balanced anyway.

But in discussion with Mark Lawrence in the ticket, it has become clear that there’s a bit of a design problem with this approach. And that problem is that there is no try keyword, only catch. The fixup in the above example does not try, but the inclusion of the catch implicitly makes it behave like try. That also means that if you use the default mode (which can be set via the mode method), then there will usually be no leading keyword, either. So we get something like this:

$conn->run(sub {
    ...
}, catch => sub {
    ...
});

So it starts with a sub {} and no fixup keyword, but there is a catch keyword, which implicitly wraps that first sub {} in a try-like context. And aesthetically, it’s unbalanced.

So I’m trying to decide what to do about these facts:

  • The catch implicitly makes the first sub be wrapped in a try-type context but without a try-like keyword.
  • If one specifies no mode for the first sub but has a catch, then it looks unbalanced.

At one level, I’m beginning to think that it was a mistake to add the exception-handling code at all. Really, that should be the domain of another module like Try::Tiny or, better, the language. In that case, the example would become:

use Try::Tiny;
try {
    $conn->run(sub {
        ...
    });
} catch {
    ....
}

And maybe that really should be the recommended approach. It seems silly to have replicated most of Try::Tiny inside DBIx::Connector just to cut down on the number of anonymous subs and indentation levels. The latter can be got round with some semi-hinky nesting:

try { $conn->run(sub {
    ...
}) } catch {
    ...
}

Kind of ugly, though. The whole reason the catch stuff was added to DBIx::Connector was to make it all nice and integrated (as discussed here). But perhaps it was not a valid tradeoff. I’m not sure.

So I’m trying to decide how to solve these problems. The options as I see them are:

  1. Add another keyword to use before the first sub that means “the default mode”. I’m not keen on the word “default”, but it would look something like this:

    $conn->run(default => sub {
        ...
    }, catch => sub {
        ...
    });

    This would provide the needed balance, but the catch would still implicitly execute the first sub in a try context. Which isn’t a great idea.

  2. Add a try keyword. So then one could do this:

    $conn->run(try => sub {
        ...
    }, catch => sub {
        ...
    });

    This makes it explicit that the first sub executes in a try context. I’d also have to add companion try_fixup, try_ping, and try_no_ping keywords. Which are ugly. And furthermore, if there was no try keyword, would a catch be ignored? That’s what would be expected, but it changes the current behavior.

  3. Deprecate the try/catch stuff in DBIx::Connector and eventually remove it. This would simplify the code and leave the responsibility for exception handling to other modules where it’s more appropriate. But it would also be at the expense of formatting; it’s just a little less aesthetically pleasing to have the try/catch stuff outside the method calls. But maybe it’s just more appropriate.

I’m leaning toward #3, but might do #1 anyway, as it’d be nice to be more explicit, and one would get the benefit of the balance with catch blocks for as long as they’re retained. But I’m not sure yet. I want your feedback on this. How do you want to use exception-handling with DBIx::Connector? Leave me a comment here or on the ticket.


Encoding is a Headache

I have to spend way too much of my programming time worrying about character encodings. Take my latest module, Text::Markup, for example. The purpose of the module is very simple: give it the name of a file, and it will figure out the markup it uses (HTML, Markdown, Textile, whatever) and return a string containing the HTML generated from the file. Simple, right?

But, hang on. Should the HTML it returns be decoded to Perl’s internal form? I’m thinking not, because the HTML itself might declare the encoding, either in an XML declaration or via something like

<meta http-equiv="Content-type" content="text/html;charset=Big5" />

And as you can see, it’s not UTF-8. So if the string were decoded, that declaration would be lying. So it should stay encoded, right? Parsers like XML::LibXML::Parser are smart enough to see such declarations and decode as appropriate.

But wait a minute! Some markup languages, like Markdown, don’t have XML declarations or headers. They’re HTML fragments. So there’s no way to tell the encoding of the resulting HTML unless it’s decoded. So maybe it should be decoded. Or perhaps it should be decoded, given an XML declaration that declares the encoding as UTF-8, and then encoded as UTF-8 before being returned.

But, hold the phone! When reading in a markup file, should it be decoded before it’s passed to the parser? Does Text::Markdown know or care about encodings? And if it should be decoded, what encoding should one assume the source file uses? Unless it uses a BOM, how do you know what its encoding is?

Text::Markup is a dead simple idea, but virtually all of my time is going into thinking about this stuff. It drives me nuts. When will the world cease to be this way?

Oh, and if you have answers to any of these questions, please do feel free to leave a comment. I hate having to spend so much time on this, but I’d much rather do so and get things right (or close to right) than wrong.


Serious Exception-Handling Bug Fixed in DBIx::Connector 0.42

I’ve just released DBIx::Connector 0.42 to CPAN. This release fixes a serious bug with catch blocks. In short, if you threw an exception from inside a catch block, it would not be detectable from outside. For example, given this code:

eval {
    $conn->run(sub { die 'WTF' }, catch => sub { die 'OMG!' });
};
if (my $err = $@) {
    say "We got an error: $@\n";
}

With DBIx::Connector 0.41 and lower, the if block would never be called, because even though the catch block threw an exception, $@ was not set. In other words, the exception would not be propagated to its caller. This could be terribly annoying, as you can imagine. I was being a bit too clever about localizing $@, with the scope much too broad. 0.42 uses a much tighter scope to localize $@, so now it should propagate properly everywhere.

So if you’re using DBIx::Connector catch blocks, please upgrade ASAP. Sorry for the hassle.


Adding a Git SHA1 to your Xcode Project

I found a decent Perl script for adding a Git SHA1 to an Xcode project last week. However, since by day I’m actually a Perl hacker, I couldn’t help finessing it a bit. Here are the steps to add it to your project:

  1. Open your project settings (Project → Edit Project Settings) and in the “Build” tab, make sure that “Info.plist Output Encoding” is set to “xml”. I lost about an hour of fiddling before I realized that my plist file was binary, which of course couldn’t really be parsed by the simple Perl script.

  2. Edit your app’s Info.plist file. If you want the version to be the SHA1, set “Bundle Version” to “0x000”. I personally prefer to have a separate key for the SHA1, so I created the “LTGitSHA1” key and set its value to “0x000”. This is the placeholder value that will be replaced when your app runs.

  3. Right-click your target app and select Add → New Build Phase → New Run Script Build Phase. For the shell command, enter:

    /usr/bin/env perl -wpi -0777
    
  4. Paste this into the “Script” field:

    #!/usr/bin/env perl -wpi -0777
    # Xcode auto-versioning script for Subversion by Axel Andersson
    # Updated for git by Marcus S. Zarra and Matt Long
    # Refactored by David E. Wheeler.
    
    BEGIN {
        @ARGV = ("$ENV{BUILT_PRODUCTS_DIR}/$ENV{INFOPLIST_PATH}");
        ($sha1 = `git rev-parse --short HEAD`) =~ s/\s+$//;
    }
    
    s{0x000}{$sha1};
  5. Fetch the value in your code from your bundle, something like this:

    NSLog(
        @"SHA1: %@",
        [[[NSBundle mainBundle]infoDictionary]objectForKey:@"LTGitSHA1"]
    );

That’s it. If you want to use a placeholder other than “0x000”, just change it on the last line of the script.


Handling Multiple Exceptions

I ran into an issue with DBIx::Connector tonight: SQLite started throwing an exception from within a call to rollback(): “DBD::SQLite::db rollback failed: cannot rollback transaction – SQL statements in progress”. This is rather annoying, as it ate the underlying exception that led to the rollback.

So I’ve added a test to DBIx::Connector that looks like this:

my $dmock = Test::MockModule->new($conn->driver);
$dmock->mock(rollback => sub { die 'Rollback WTF' });

eval { $conn->txn(sub {
    my $sth = shift->prepare("select * from t");
    die 'Transaction WTF';
}) };

ok my $err = $@, 'We should have died';
like $err, qr/Transaction WTF/, 'Should have the transaction error';

It fails as expected: the error is “Rollback WTF”. So far so good. Now the question is, how should I go about fixing it? Ideally I’d be able to access both exceptions in whatever exception handling I do. How to go about that?

I see three options. The first is that taken by Bricolage and DBIx::Class: create a new exception that combines both the transaction exception and the rollback exception into one. DBIx::Class does it like this:

$self->throw_exception(
    "Transaction aborted: ${exception}. "
    . "Rollback failed: ${rollback_exception}"
);

That’s okay as far as it goes. But what if $exception is an Exception::Class::DBI object, or some other exception object? It would get stringified and the exception handler would lose the advantages of the object. But maybe that doesn’t matter so much, since the rollback exception is kind of important to address first?

The second option is to throw a new exception object with the original exceptions as attributes. Something like (pseudo-code):

DBIx::Connector::RollbackException->new(
    txn_exception      => $exception,
    rollback_exception => $rollback_exception,
);

This has the advantage of keeping the original exception as an object, although the exception handler would have to expect this exception and go digging for it. So far in DBIx::Connector, I’ve left DBI exception construction up to the DBI and to the consumer, so I’m hesitant to add a one-off special-case exception object like this.
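
For what it’s worth, handling such an exception would look something like this (hypothetical code, building on the pseudo-class above):

use Try::Tiny;

try {
    $conn->txn(sub { ... });
} catch {
    if (eval { $_->isa('DBIx::Connector::RollbackException') }) {
        # Both original exceptions survive as objects.
        warn 'Rollback failed: ', $_->rollback_exception;
        die $_->txn_exception;
    }
    die $_;
};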

The third option is to use a special variable, @@, and put both exceptions into it. Something like:

@@ = ($exception, $rollback_exception);
die $rollback_exception;

This approach doesn’t require a dependency like the previous approach, but the user would still have to know to dig into @@ if they caught the rollback exception. But then I might as well have thrown a custom exception object that’s easier to interrogate than an exception string. Oh, and is it appropriate to use @@? I seem to recall seeing some discussion of this variable on the perl5-porters mail list, but it’s not documented or supported. Or something. Right?

What would you do?


Fuck Typing LWP

I’m working on a project that fetches various files from the Internet via LWP. I wanted to make sure that I was a polite user, such that my app would pay attention to Last-Modified/If-Modified-Since and ETag/If-None-Match headers. And in most contexts I also want to respect the robots.txt file on the hosts to which I’m sending requests. So I was very interested to read chromatic’s hack for this very issue. I happily implemented two classes for my app, MyApp::UA, which inherits from LWP::UserAgent::WithCache, and MyApp::UA::Robot, which inherits from MyApp::UA but changes LWP::UserAgent::WithCache to inherit from LWP::RobotUA:

@LWP::UserAgent::WithCache::ISA = ('LWP::RobotUA');

So far so good, right? Well, no. What I didn’t think about, stupidly, is that by changing LWP::UserAgent::WithCache’s base class, I was doing so globally. So now both MyApp::UA and MyApp::UA::Robot were getting the LWP::RobotUA behavior. Urk.

So my workaround is to use a little fuck typing to ensure that MyApp::UA::Robot has the robot behavior but MyApp::UA does not. Here’s what it looks like (BEWARE: black magic ahead!):

package MyApp::UA::Robot;

use 5.12.0;
use utf8;
use parent 'MyApp::UA';
use LWP::RobotUA;

do {
    # Import the RobotUA interface. This way we get its behavior without
    # having to change LWP::UserAgent::WithCache's inheritance.
    no strict 'refs';
    while ( my ($k, $v) = each %{'LWP::RobotUA::'} ) {
        *{$k} = *{$v}{CODE} if *{$v}{CODE} && $k ne 'new';
    }
};

sub new {
    my ($class, $app) = (shift, shift);
    # Force RobotUA configuration.
    local @LWP::UserAgent::WithCache::ISA = ('LWP::RobotUA');
    return $class->SUPER::new(
        $app,
        delay => 1, # be very nice -- max one hit per minute.
    );
}

The do block is where I do the fuck typing. It iterates over all the symbols in LWP::RobotUA and inserts references to its subroutines into the current package, except for new, which I implement myself. This is so that I can keep my inheritance from MyApp::UA intact. But in order for it to properly configure the LWP::RobotUA interface, new must temporarily fool Perl into thinking that LWP::UserAgent::WithCache inherits from LWP::RobotUA.

Pure evil, right? Wait, it gets worse. I’ve also overridden LWP::RobotUA’s host_wait method, because if it’s the second request to a given host, I don’t want it to sleep (the first request is for the robots.txt, and I see no reason to sleep after that). So I had to modify the do block to skip both new and host_wait:

    while ( my ($k, $v) = each %{'LWP::RobotUA::'} ) {
        *{$k} = *{$v}{CODE} if *{$v}{CODE} && $k !~ /^(?:new|host_wait)$/;
    }

If I “override” any other LWP::RobotUA methods, I’ll need to remember to add them to that regex. Of course, since I’m not actually inheriting from LWP::RobotUA, in order to dispatch to its host_wait method, I can’t use SUPER, but must dispatch directly:

sub host_wait {
    my ($self, $netloc) = @_;
    # First visit is for robots.txt, so let it be free.
    return if !$netloc || $self->no_visits($netloc) < 2;
    $self->LWP::RobotUA::host_wait($netloc);
}

Ugly, right? Yes, I am an evil bastard. “Fuck typing” is right, yo! At least it’s all encapsulated.

This just reinforces chromatic’s message in my mind. I’d sure love to see LWP reworked to use roles!


Defend Against Programmer Mistakes?

I get email:

Hey David,

I ran into an issue earlier today in production that, while it is an error in my code, DBIx::Connector could easily handle better. Here’s the use case:

package Con;
use Moose;
sub txn {
    my ($self, $code) = @_;
    my @ret;
    warn "BEGIN EVAL\n";
    eval{ @ret = $code->() };
    warn "END EVAL\n";
    die "DIE: $@" if $@;
    return @ret;
}
package main;
my $c = Con->new();
foreach (1..2) {
    $c->txn(sub{ next; });
}

The result of this is:

BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
Exiting eval via next at test.pl line 16.
Exiting subroutine via next at test.pl line 16.
BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
Exiting eval via next at test.pl line 16.
Exiting subroutine via next at test.pl line 16.

This means that any code after the eval block is not executed. And, in the case of DBIx::Connector, it means the transaction is not committed or rolled back, and the next call to txn() is mysteriously combined with the previous txn() call. A quick fix for this is to just add a curly brace into the eval:

eval{ { @ret = $code->() } };

Then the results are more what we’d expect:

BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
END EVAL
BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
END EVAL

I’ve fixed my code to use return; instead of next;, but I think this would be a useful fix for DBIx::Connector so that it doesn’t act in such an unexpected fashion when the developer accidentally calls next.

The fix here is pretty simple, but I’m not sure I want to get into the business of defending against programmer mistakes like this in DBIx::Connector or any module.

What do you think?


Execute SQL Code on Connect

I’ve been writing a fair bit of PL/Perl for a client, and one of the things I’ve been doing is eliminating a ton of duplicate code by creating utility functions in the %_SHARED hash. This is great, as long as the code that creates those functions gets executed at the beginning of every database connection. So I put the utility generation code into a single function, called prepare_perl_utils(). It looks something like this:

CREATE OR REPLACE FUNCTION prepare_perl_utils(
) RETURNS bool LANGUAGE plperl IMMUTABLE AS $$
    # Don't bother if we've already loaded.
    return 1 if $_SHARED{escape_literal};

    $_SHARED{escape_literal} = sub {
        $_[0] =~ s/'/''/g; $_[0] =~ s/\\/\\\\/g; $_[0];
    };

    # Create other code refs in %_SHARED…
$$;
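
Once that function has run, any other PL/Perl function in the same session can grab the utilities from %_SHARED. A hypothetical wrapper, for example:

CREATE OR REPLACE FUNCTION perl_escape_literal(
    text
) RETURNS text LANGUAGE plperl AS $$
    # Assumes prepare_perl_utils() has already run in this session.
    my $text = shift;
    return $_SHARED{escape_literal}->($text);
$$;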

So now all I have to do is make sure that all the client’s apps execute this function as soon as they connect, so that the utilities will all be loaded up and ready to go. Here’s how I did it.

First, for the Perl app, I just took advantage of the DBI’s callbacks to execute the SQL I need when the DBI connects to the database. That link might not work just yet, as the DBI’s callbacks have only just been documented and that documentation appears only in dev releases so far. Once 1.611 drops, the link should work. At any rate, the use of callbacks I’m exploiting here has been in the DBI since 1.49, which was released in November 2005.

The approach is the same as I’ve described before: Just specify the Callbacks parameter to DBI->connect, like so:

my $dbh = DBI->connect_cached($dsn, $user, $pass, {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    Callbacks      => {
        connected => sub { shift->do('SELECT prepare_perl_utils()') },
    },
});

That’s it. The connected method is a no-op in the DBI that gets called to alert subclasses that they can do any post-connection initialization. Even without a subclass, we can take advantage of it to do our own initialization.
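
Since DBIx::Connector passes its attributes through to the DBI, the same trick should work there, too, along these lines:

my $conn = DBIx::Connector->new($dsn, $user, $pass, {
    PrintError => 0,
    RaiseError => 1,
    AutoCommit => 1,
    Callbacks  => {
        connected => sub { shift->do('SELECT prepare_perl_utils()') },
    },
});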

It was a bit trickier to make the same thing happen for the client’s Rails app. Rails, alas, provides no on-connection callbacks. So we instead have to monkey-patch Rails to do what we want. With some help from “dfr|mac” on #rubyonrails (I haven’t touched Rails in 3 years!), I got it down to this:

class ActiveRecord::ConnectionAdapters::PostgreSQLAdapter
  def initialize_with_perl_utils(*args)
    returning(initialize_without_perl_utils(*args)) do
      execute('SELECT prepare_perl_utils()')
    end
  end
  alias_method_chain :initialize, :perl_utils
end

Basically, we replace the PostgreSQL adapter’s initialize method with initialize_with_perl_utils, which calls the original and then executes prepare_perl_utils() before returning. It’s a neat trick; if you’re going to practice fuck typing, alias_method_chain makes it about as clean as can be, albeit a little too magical for my tastes.

Anyway, recorded here for posterity (my blog is my other brain!).


Pod: Now with Sane Web Links

A couple months ago, RJBS and I collaborated on adding a new feature to Pod: sane URL links. For, well, ever, the case has been that to link to URLs or any other scheme: links in Pod, you had to do something like this:

For more information, consult the pgTAP documentation:
L<http://pgtap.projects.postgresql.org/documentation.html>

The reasons why you couldn’t include text to serve as the link text have never been really well spelled out. Sean Burke, the most recent author of the Pod spec, had only said that the support wasn’t there “for various reasons.”

Meanwhile, I accidentally discovered that Pod::Simple has in fact supported such formats for a long time. At some point Sean added it, but didn’t update the spec. Maybe he thought it was fragile. I have no idea. But since the support was already there, and most of the other Pod tools already support it or want to, it was a simple change to make to the spec, and it was released in Perl 5.11.3 and Pod::Simple 3.11. It’s now officially a part of the spec. The above Pod can now be written as:

For more information, consult the
L<pgTAP documentation|http://pgtap.projects.postgresql.org/documentation.html>.

So much better! And to show it off, I’ve just updated all the links in SVN::Notify and released a new version. Check it out on CPAN Search. See how links such as those to “HookStart.exe” and “Windows Subversion + Apache + TortoiseSVN + SVN::Notify HOWTO” are nice links? They no longer use the URL for the link text. Contrast with the previous version.

And as of yesterday, the last piece to allow this went into place. Andy gave me maintenance of Test::Pod, and I immediately released a new version to allow the new syntax. So update your t/pod.t file to require Test::Pod 1.41, update your links, and celebrate the arrival of sane links in Pod documentation.
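
If you use the common boilerplate, that means a t/pod.t along these lines:

use strict;
use warnings;
use Test::More;
eval 'use Test::Pod 1.41';
plan skip_all => 'Test::Pod 1.41 required for testing POD' if $@;
all_pod_files_ok();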


RFC: PostgreSQL Add-on Network

I’ve posted a plan to implement PGAN, a CPAN for PostgreSQL extensions. I’ve tried to closely follow the CPAN philosophy to come up with a plan that requires a minimum-work implementation that builds on the existing PostgreSQL tools and the examples of the CPAN and JSAN. My hope is that it’s full of JFDI! I would be very grateful for feedback and suggestions.


Quest for PostgreSQL Project Hosting

The pgTAP project is currently hosted by pgFoundry. This is an old version of GForge, and from what I understand, highly modified for the PostgreSQL project. That’s fine, except that it apparently makes it impossible for anyone to find the tuits to upgrade it to newer versions.

And it needs upgrading. One annoying thing I noticed is that the URLs for release files include an integer in them. For example, the URL to download pgTAP 0.23 is http://pgfoundry.org/frs/download.php/2511/pgtap-0.23.tar.bz2. See the “2511” there? It appears to be a primary key value or something, but is completely irrelevant for a release URL. I would much prefer that the URL be something like http://pgfoundry.org/frs/download.php/pgtap-0.23.tar.bz2 or, even better, http://pgfoundry.org/projects/pgtap/frs/pgtap-0.23.tar.bz2. But such is not the case now.

Another issue is hosting. I’ve registered pgtap.org to use for hosting the pgTAP Web site, but there is no support for pointing a hostname at a pgFoundry/GForge site.

These issues could of course be worked out if someone had the tuits to take them on, but apparently there is no one. So I’m looking to move.

The question is, where to? I could get a paid GitHub account (the pgTAP source is already on GitHub) and be able to have a pgTAP site on pgtap.org from there, so that’s a plus. And I can do file releases, too, in which case the URL format would be something like http://cloud.github.com/downloads/theory/pgtap/pgtap-0.23.tar.bz2, which isn’t ideal, but is a hell of a lot better than a URL with a sequence number in it. I could put them on the hosted site, too, in which case they’d have whatever URL I wanted them to have.

There are only two downsides I can think of to moving to GitHub:

  1. No mail list support. The pgTAP mail list has next to no traffic so far, so I’m not sure this is a big deal. I could also set up a list elsewhere, like Librelist, if I really needed one. I’d prefer to have @pgtap.org mail lists, but it’s not a big deal.

  2. I would lose whatever community presence I gain from hosting on pgFoundry. I know that when I release a Perl module to CPAN that it will be visible to lots of people in the Perl community, and automatically searchable via search.cpan.org and other tools. A CPAN release is a release to the Perl community.

    There is nothing like this for PostgreSQL. pgFoundry is the closest thing, and, frankly, nowhere near as good (pgFoundry’s search rankings have always stunk). So if I were to remove my projects from pgFoundry, how could I make them visible to the community? Is there any other central repository of or searchable list of third-party PostgreSQL offerings?

So I’m looking for advice. Does having an email list matter? If I can get pgTAP announcements included in the PostgreSQL Weekly News, is that enough community visibility? Do you know of a nice project hosting site that offers hosting, mail lists, download mirroring and custom domain handling?

I’ll follow up with a summary of what I’ve found in a later post.


Make the Pragmas Stop!

I’ve been following the development of a few things in the Perl community lately, and it’s leaving me very frustrated. For years now, I’ve written modules that start with the same incantation:

package My::Module;

use strict;
our $VERSION = '0.01';

Pretty simple: declare the module name and version, and turn on strictures to make sure I’m not doing anything stupid. More recently I’ve added use warnings; as a best practice. And even more recently, I’ve started adding use utf8;, too, because I like to write my code in UTF-8. And I like to turn on all of the Perl 5.10 features. It’s mildly annoying to have the same incantation at the start of every module, but I could deal with it:

package My::Module;

use strict;
use warnings;
use feature ':5.10';
use utf8;

our $VERSION = '0.01';

Until now, that is. Last year, chromatic started something with his Modern::Perl module. It was a decent idea for newbies, helping them get started with Perl with only one declaration at the top of their modules:

package My::Module;

use Modern::Perl;
our $VERSION = '0.01';

Alas, it wasn’t really designed for me, but for more casual users of Perl, so that they don’t have to think about the pragmas they need to use. The fact that it doesn’t include the utf8 pragma also made it a non-starter for me. Or did it? Someone recently suggested that the utf8 pragma has problems (I can’t find the Perl Monks thread at the moment). Others report that the encoding pragma has issues, too. So what’s the right thing to do with regard to assuming everything is UTF8 in my program and its inputs (unless I say otherwise)? I’m not at all sure.

Not only that, but Modern::Perl has led to an explosion of other pragma-like modules on CPAN that promise best pragma practices. There’s common::sense, which loads utf8 but only some of the features of strict, warnings, and feature. uni::perl looks almost exactly the same. There’s also Damian Conway’s Toolkit, which allows you to write your own pragma-like loader module. There’s even Acme::Very::Modern::Perl, which is meant to be a joke, but is it really?

If I want to simplify the incantation at the top of every file, what do I use?

And now it’s getting worse. In addition to feature, Perl 5.11 introduces the legacy pragma, which allows one to get back behaviors from older Perls. For example, to get back the old Unicode semantics, you’d use legacy 'unicode8bit';. I mean, WTF?

I’ve had it. Please make the pragma explosion stop! Make it so that the best practices known at the time of the release of any given version of Perl can automatically be imported if I just write:

package My::Module '0.01';
use 5.12;

That’s it. Nothing more. Whatever has been deemed the best practice at the time 5.12 is released will simply be used. If the best practices change in 5.14, I can switch to use 5.14; and get them, or just leave it at use 5.12 and keep what were the best practices in 5.12 (yay future-proofing!).

What should the best practices be? My list would include:

  • strict
  • warnings
  • features — all of them
  • UTF-8 — all input and output to the scope, as well as the source code

Maybe you disagree with that list. Maybe I’d disagree with what the Perl 5 Porters settle on. But then you and I can read what’s included and just add or remove pragmas as necessary. But surely there’s a core list of likely candidates that should be included the vast majority of the time, including for all novices.

In personal communication, chromatic tells me, with regard to Modern::Perl, “Experienced Perl programmers know the right incantations to get the behavior they want. Novices don’t, and I think we can provide them much better defaults without several lines of incantations.” I’m fine with the second assertion, but disagree with the first. I’ve been hacking Perl for almost 15 years, and I no longer have any fucking idea what incantation is best to use in my modules. Do help the novices, and make the power tools available to experienced hackers, but please make life easier for the experienced hackers, too.

I think that declaring the semantics of a particular version of Perl is where the Perl 5 Porters are headed. I just hope that includes handling all of the likely pragmas too, so that I don’t have to.


Test Everything with TAP Source Handlers

I’ve just arrived in Japan with my family. We’re going to be spending several days in Tokyo, during which time I’ll be at the JPUG 10th Anniversary PostgreSQL Conference for a couple of days (giving the usual talk), but mainly I’ll be on vacation. We’ll be visiting Kyoto, too. We’re really excited about this trip; it’ll be a great experience for Anna. I’ll be back in the saddle in December, so for those of you anxiously awaiting the next installment of my Catalyst tutorial, I’m afraid you’ll have to wait a bit longer.

In the meantime, I wanted to write about a little something that’s been cooking for a while. Over the last several months, Steve Purkis has been working on a new feature for TAP::Parser: source handlers. The idea is to make it easier for developers to add support for TAP emitters other than Perl. The existing implementation did a decent job of handling Perl test scripts, of course, and executable files (useful for compiled tests in C using libtap, for example), but anything else was difficult.

As the author of pgTAP, I was of course greatly interested in this work, because I had to bend over backwards to get pg_prove to work nicely. It’s even uglier to get a Module::Build-based distribution to run pgTAP and Perl tests all at once during ./Build test: You had to subclass Module::Build to do it.

Steve wanted to solve this problem, and he did. Then he was kind enough to listen to my bitching and moaning and rewrite his fix so that it was simpler for third parties (read: me) to add new source handlers. What’s a source handler, you ask? Check out the latest dev release of Test::Harness and you’ll see it: TAP::Parser::SourceHandler. As soon as Steve committed it, I jumped in and implemented a new handler for pgTAP. The cool thing is that it took me only three hours to do, including tests. And here’s how you use it in a Build.PL, so that you can have pgTAP tests named *.pg run at the same time as your *.t Perl tests:

Module::Build->new(
    module_name        => 'MyApp',
    test_file_exts     => [qw(.t .pg)],
    use_tap_harness    => 1,
    tap_harness_args   => {
        sources => {
            Perl  => undef,
            pgTAP => {
                dbname   => 'try',
                username => 'postgres',
                suffix   => '.pg',
            },
        }
    },
    build_requires     => {
        'Module::Build'                      => '0.30',
        'TAP::Parser::SourceHandler::pgTAP' => '3.19',
    },
)->create_build_script;

To summarize, you just have to:

  • Tell Module::Build the extensions of your test scripts (that’s qw(.t .pg) here)
  • Specify the Perl source with its defaults (that’s what the undef does)
  • Specify the pgTAP options (database name, username, suffix, and lots of other potential settings)

And that’s it. You’re done! Run your tests with the usual incantation:

perl Build.PL
./Build test

You can use pgTAP and its options with prove, too, via the --source and --pgtap-option options:

prove --source pgTAP --pgtap-option dbname=try \
                     --pgtap-option username=postgres \
                     --pgtap-option suffix=.pg \
                     t/sometest.pg

It’s great that it’s now so much easier to support pgTAP tests, but what if you want to have Ruby tests? Or PHP? Well, it’s a simple process to write your own source handler. Here’s how:

  • Subclass TAP::Parser::SourceHandler. The final part of the package name is the name of the source. Thus if you wrote TAP::Parser::SourceHandler::Ruby, the name of your source would be “Ruby”.

  • Load the necessary modules and register your source handler. For a Ruby source handler, it might look like this:

    package TAP::Parser::SourceHandler::Ruby;
    use strict;
    use warnings;
    
    use TAP::Parser::IteratorFactory   ();
    use TAP::Parser::Iterator::Process ();
    TAP::Parser::IteratorFactory->register_handler(__PACKAGE__);
  • Implement the can_handle() method. The task of this method is to return a score between 0 and 1 for how likely it is that your source handler can handle a given source. A bunch of information is passed in a hash to the method, so you can check it all out. For example, if you wanted to run Ruby tests ending in .rb, you might write something like this:

    sub can_handle {
        my ( $class, $source ) = @_;
        my $meta = $source->meta;
    
        # If it's not a file (test script), we're not interested.
        return 0 unless $meta->{is_file};
    
        # Get the file suffix, if any.
        my $suf = $meta->{file}{lc_ext};
    
        # If the config specifies a suffix, it's required.
        if ( my $config = $source->config_for('Ruby') ) {
            if ( defined $config->{suffix} ) {
                # Return 1 for a perfect score.
                return $suf eq $config->{suffix} ? 1 : 0;
            }
        }
    
        # Otherwise, return a score for our supported suffix.
        return $suf eq '.rb' ? 0.8 : 0;
    }

    The last line is the most important: it returns 0.8 if the suffix is .rb, saying that it’s likely that this handler can handle the test. But the middle bit is interesting, too. The $source->config_for('Ruby') call is seeing if the user specified a suffix, either via the command-line or in the options. So in a Build.PL, that might be:

        tap_harness_args => {
            sources => {
                Perl => undef,
                Ruby => { suffix => '.rub' },
            }
        },

    Meaning that the user wanted to run tests ending in .rub as Ruby tests. It can also be done on the command-line with prove:

    prove --source Ruby --ruby-option suffix=.rub
    

    Cool, eh? We have a reasonable default for Ruby tests, .rb, but the user can override however she likes.

  • And finally, implement the make_iterator() method. The job of this method is simply to create a TAP::Parser::Iterator object to actually run the test. It might look something like this:

    sub make_iterator {
        my ( $class, $source ) = @_;
        my $config = $source->config_for('Ruby');
    
        my $fn = ref $source->raw ? ${ $source->raw } : $source->raw;
        $class->_croak(
            'No such file or directory: ' . (defined $fn ? $fn : '')
        ) unless $fn && -e $fn;
    
        return TAP::Parser::Iterator::Process->new({
            command => [$config->{ruby} || 'ruby', $fn ],
            merge   => $source->merge
        });
    }

    Simple, right? Just make sure we have a valid file to execute, then instantiate and return a TAP::Parser::Iterator::Process object to actually run the test.

That’s it. Just two methods and you’re ready to go. I’ve even added support for a suffix option and a ruby option (so that you can point to the ruby executable in case it’s not in your path). Using it is easy. I wrote a quick TAP-emitting Ruby script like so:

puts 'ok 1 - This is a test'
puts 'ok 2 - This is another test'
puts 'not ok 3 - This is a failed test'

And to run this test (assuming that TAP::Parser::SourceHandler::Ruby has been installed somewhere where Perl can find it), it’s just:

% prove --source Ruby ~/try.rb --verbose
/Users/david/try.rb .. 
ok 1 - This is a test
ok 2 - This is another test
not ok 3 - This is a failed test
Failed 1/3 subtests 

Test Summary Report
-------------------
/Users/david/try.rb (Wstat: 0 Tests: 3 Failed: 1)
  Failed test:  3
  Parse errors: No plan found in TAP output
Files=1, Tests=3,  0 wallclock secs ( 0.02 usr +  0.01 sys =  0.03 CPU)
Result: FAIL

It’s so easy to create new source handlers now, especially if all you have to do is support a new dynamic language. I’ve put the simple Ruby example over here; feel free to take it and run with it!
