Keep DBI's connect_cached From Horking Transactions

I've been on a bit of a Perl hacking tear lately. In addition to knocking out Test::XPath last week, I've been experimenting with TAP::Harness sources, Template::Declare, Catalyst views, a new Module::Build subclass for building database-backed applications, and, last but not least, an IRC logging bot. Oh, and that application I'm working on for PGX with Quinn Weaver. So much is crowding my mind these days that I'm having trouble sleeping. Tonight I'm up late hacking to try to get some of this stuff out of my head.

But before I crash, I wanted to share a useful DBI hack. I use connect_cached in a lot of my applications, because it's a nice way to reuse database handles without having to figure out my own caching algorithm. The code I have to use it looks like this:

sub dbh {
    my $self = shift;
    DBI->connect_cached( @{ $self->_dbi }{qw(dsn username password)}, {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
    });
}

Very simple. I just call the dbh() method whenever I need to talk to the database, and I'm set. Except for one problem: transactions.

Say I have a method that grabs the handle, starts a transaction with the begin_work method, and then inserts a row. Then another method grabs the handle from dbh() on the assumption that it's in the same transaction, and does its own work. Only, it's not the same transaction, because, unfortunately, DBI sets the attributes passed to connect_cached every single time it's called!. So even though that second method may think it's in the middle of a transaction, it's really not, because when connect_cached sets AutoCommit back to 1, the transaction gets committed.

Oops.

This really fucks with my tests, where I'm often fetching the database handle to start a transaction, running some tests, and then wanting to rollback the transaction when I'm done. It's irritating as all hell to discover that data has been inserted into the database. And the DBI, alas, just gives me this warning:

rollback ineffective with AutoCommit enabled at t/botinst.t line 67.

I'm likely not to notice until I get a duplicate key error the next time I run the tests.

As I was dealing with this today, my memory started poking at me, telling me that I've dealt with this before. And sure enough, a quick Google shows that Tim Bunce and I had an extensive conversation on this very topic -- over four years ago. If you're patient enough to dig through that thread, you'll note that this issue is due to some architectural difficulties in the DBI, to be worked out in DBI 2.

Over the last four years, I've implemented a couple of solutions to this problem, all involving my code tracking the transaction state and modifying the AutoCommit attribute in the appropriate places. It's all rather fragile. But as I dug through the thread, I discovered a much cleaner fix, using a little-known and so-far undocumented feature of the DBI: callbacks. This is actually I feature I half-way implemented in the DBI years ago, getting it just far enough that Tim was willing to finish it. And it's just sitting there, waiting to be used.

So here's the trick: Specify a callback for the connect_cached() method that's used only when an existing file handle is retrieved from the cache. A bunch of stuff is passed to the callback, but the important one is the fifth argument, the attributes. All it has to do is delete the AutoCommit attribute. Since this callback is called before the DBI looks at the attributes to set them on the handle, the callback effectively prevents the DBI from horking up your transactions.

Here's the modified code:

my $cb = {
    'connect_cached.reused' => sub { delete $_[4]->{AutoCommit} },
};

sub dbh {
    my $self = shift;
    DBI->connect_cached( @{ $self->_dbi }{qw(dsn username password)}, {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
        Callbacks      => $cb,
    });
}

Callbacks are passed as a hash reference, with the keys being the names of the DBI methods that should trigger the callbacks, such as ping, data_sources, or connect_cached. The values are, of course, code references. When the DBI calls the code callbacks, it passes in stuff relevant to the method.

In the case of connect_cached, there are two additional special-case callbacks, connect_cached.new and connect_cached.reused, so that you can have different callbacks execute depending on whether connect_cached used a cached database handle or had to create a new one. Here, I've used connect_cached.reused, of course, and all I do is kill off the AutoCommit attribute before the DBI gets its greedy hands on it. Et voilà, problem solved!

And before you ask, no, you can't simply omit AutoCommit from the attributes passed to connect_cached, because the DBI helpfully adds it for you.

So now this is here so I can find it again when it bites me next year, and I hope it helps you, too. Meanwhile, perhaps someone could take it upon themselves to document DBI's callbacks? At this point, the closest thing to documentation is in the tests. (Hrm. I think I might have had a hand in writing them.) Check 'em out.

Backtalk

Quinn Weaver wrote:

So why connect_cached instead of Apache::DBI?

Just curious.

Theory wrote:

This post from Matt Sergeant firmed it up for me. Bricolage started out doing its own thing, before I'd heard of Apache::DBI. See this post in the same thread for when I was able to solve whatever problem I had using connect_cached().

In short: connect_cached() is a better per-process caching algorithm than Apache::DBI.

—Theory