Just a Theory

By David E. Wheeler

Posts about DBIx::Connector

Today on the Perl Advent Calendar

Hey look everybody, I wrote today’s Perl Advent Calendar post, Less Tedium, More Transactions. Go read it! (Update: moved here.)

Less Tedium, More Transactions

A frequent pattern when writing database-backed applications with the DBI is to connect to the database and cache the database handle somewhere. A simplified example:

package MyApp::DB;
use DBI;
use strict;

my $DBH = DBI->connect('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

sub dbh { $DBH }

Just load MyApp::DB anywhere in your app and, whenever you want to use the database, grab the handle from MyApp::DB->dbh.

This pattern is common enough that Apache::DBI was created to magically do it for you on mod_perl, and the DBI added connect_cached() so that it could cache connections itself. However, each of these solutions has some issues:

  • What happens when your program forks? Apache::DBI handles this condition, but neither the home-grown solution nor connect_cached() does, and identifying a forked database handle as the source of a crash is notoriously unintuitive.

  • What happens when your program spawns a new thread? Sure, some DBI drivers might still work, but others might not. Best to treat new threads the same as new processes and reconnect to the database. Neither Apache::DBI nor connect_cached() deal with threading issues, and of course neither does the custom solution.

  • Apache::DBI is magical and mysterious, and the magic comes with serious side-effects. Apache::DBI plugs itself right into the DBI, replacing its connection methods (which is why load ordering is so important to use it properly). Knowledge of Apache::DBI is also built right into the DBI itself, meaning that the magic runs deep and in both directions. These are pretty serious violations of encapsulation on both sides.

  • connect_cached() has a bit of its own unfortunate magic. Every call to connect_cached() resets the connection attributes. So if you have code in one place that starts a transaction, and code elsewhere but executed in the same scope that also fetches a connect_cached() handle, the transaction will be committed then and there, even though the code that started it might not be done with it. One can work around this issue via callbacks, but it’s a bit of a hack.

Using a custom caching solution avoids the magic, but getting fork- and thread-safety right is surprisingly non-trivial, in the same way that doing your own exception-handling is surprisingly non-trivial.
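To see why it’s non-trivial, consider the minimum a fork- and thread-safe cache has to do: remember which process (and thread) created the handle, and refuse to hand it back anywhere else. Here’s a rough sketch of that check; My::ConnCache and its connector coderef are hypothetical stand-ins, not DBIx::Connector’s actual code, which also handles proper disposal and driver quirks:

```perl
package My::ConnCache;
use strict;
use warnings;

# Hypothetical sketch of fork/thread detection. The connector argument
# is a coderef that returns a fresh database handle.
sub new {
    my ($class, $connector) = @_;
    return bless { connector => $connector }, $class;
}

sub handle {
    my $self = shift;
    my $tid = $INC{'threads.pm'} ? threads->tid : 0;
    if (   !$self->{handle}
        || $self->{pid} != $$      # forked: the handle belongs to the parent
        || $self->{tid} != $tid    # new thread: same deal
    ) {
        $self->{handle} = $self->{connector}->();
        @{$self}{qw(pid tid)} = ($$, $tid);
    }
    return $self->{handle};
}

1;
```

Getting this right in every corner case (and cleanly discarding the parent’s handle without disconnecting it out from under the parent) is exactly the fiddly work DBIx::Connector does for you.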

Enter DBIx::Connector, a module that efficiently manages your database connections in a thread- and fork-safe manner so that you don’t have to. If you already have a custom solution, switching to DBIx::Connector is easy. Here’s a revision of MyApp::DB that uses it:

package MyApp::DB;
use DBIx::Connector;
use strict;

my $CONN = DBIx::Connector->new('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

sub conn { $CONN }
sub dbh  { $CONN->dbh }

Simple, right? You pass exactly the same parameters to DBIx::Connector->new that you passed to DBI->connect. The DBIx::Connector object simply proxies the DBI. When you want the database handle itself, just call dbh() and proceed as usual, confident that if your app forks or spawns new threads, your database handle will be safe. Why? Because DBIx::Connector detects such changes and re-connects to the database, being sure to properly dispose of the original connection. But really, you don’t have to worry about that, because DBIx::Connector does the worrying for you.

Execution Methods

DBIx::Connector is very good at eliminating the technical friction of process and thread management. But that’s not all there is to it.

Although you can just fetch the DBI handle from your DBIx::Connector object and go, a better approach is to use its execution methods. These methods scope execution to a code block. Here’s an example using run():

$conn->run(sub {
    shift->do($query);
});

That may not seem so useful, and is more to type, but the real power comes from the txn() method. txn() executes the code block within the scope of a transaction. So where you normally would write something like this:

use Try::Tiny;
use MyApp::DBH;

my $dbh = MyApp::DBH->dbh;  
try {
    $dbh->begin_work;
    # do stuff...
    $dbh->commit;
} catch {
    $dbh->rollback;
    die $_;
};

The txn() method scopes the transaction for you, so that you can focus on the work to be done rather than on transaction management:

use Try::Tiny;
use MyApp::DBH;

try {
    MyApp::DBH->conn->txn(sub {
        # do stuff...
    });
} catch {
    die $_;
};

There’s no need to call begin_work, commit, or rollback, as txn() does all that for you. Furthermore, it improves the maintainability of your code, as the scope of the transaction is much more clearly defined as the scope of the code block. Additional calls to txn() or run() within that block are harmless, and just become part of the same transaction:

MyApp::DBH->conn->txn(sub {
    my $dbh = shift;
    $dbh->do($_) for @queries;
    MyApp::DBH->conn->run(sub {
        shift->do($expensive_query);
        MyApp::DBH->conn->txn(sub {
            shift->do($another_expensive_query);
        });
    });
});

Even cooler is the svp() method, which scopes execution of a code block to a savepoint, or subtransaction, if your database supports it (all of the drivers currently supported by DBIx::Connector do). For example, this transaction will commit the insertion of values 1 and 3, but not 2:

MyApp::DBH->conn->txn(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (1)');
    try {
        MyApp::DBH->conn->svp(sub {
            shift->do('INSERT INTO table1 VALUES (2)');
            die 'OMGWTF?';
        });
    } catch {
        warn "Savepoint failed: $_\n";
    };
    $dbh->do('INSERT INTO table1 VALUES (3)');
});

Connection Management

The recommended pattern for using a cached DBI handle is to call ping() when you fetch it from the cache, and reconnect if it returns false. Apache::DBI and connect_cached() do this for you, and so does DBIx::Connector. However, in a busy application ping() can get called a lot. We recently did some query analysis for a client, and found that 1% of the database execution time was taken up with ping() calls. That may not sound like a lot, but it amounted to 100K pings per hour. For something that returns true well over 99.9% of the time, that seems a bit silly.

Enter DBIx::Connector connection modes. The default mode is “ping”, as that’s what most installations are accustomed to. A second mode is “no_ping”, which simply disables pings. I don’t recommend that.

A better solution is to use “fixup” mode. This mode doesn’t normally call ping() either. However, if a code block passed to run() or txn() throws an exception, then DBIx::Connector will call ping(). If it returns false, DBIx::Connector reconnects to the database and executes the code block again. This configuration should handle some common situations, such as idle timeouts, without bothering you about it.
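In outline, the control flow of fixup mode looks something like this. This is a simplified sketch, not DBIx::Connector’s actual code; the ping() call and the stored connect coderef stand in for the real connection machinery:

```perl
use strict;
use warnings;

# Simplified sketch of fixup mode: run the block optimistically, and only
# on failure check the connection, reconnecting and retrying once if it
# turns out to have gone away.
sub fixup_run {
    my ($conn, $block) = @_;
    my @ret = eval { $block->($conn->{dbh}) };
    if (my $err = $@) {
        die $err if $conn->{dbh}->ping;       # handle is fine: a real error
        $conn->{dbh} = $conn->{connect}->();  # stale handle: reconnect...
        @ret = $block->($conn->{dbh});        # ...and run the block once more
    }
    return @ret;
}
```

Note that ping() is called only on the failure path, which is why fixup mode avoids the per-fetch ping cost described above.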

Specify “fixup” mode whenever you call an execution method, like so:

$conn->txn(fixup => sub { ... });

You can also specify that your connection always use “fixup” via the mode() accessor. Modify the caching library like so (the mode() call is new):

my $CONN = DBIx::Connector->new('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

$CONN->mode('fixup'); # ⬅ ⬅ ⬅  enter fixup mode!

sub conn { $CONN }
sub dbh  { $CONN->dbh }

However, you must be more careful with fixup mode than with ping mode, because a code block can be executed twice. So you must be sure to write it such that there are no side effects to multiple executions. Don’t do this, for example:

my $count = 0;
$conn->txn(fixup => sub {
    shift->do('INSERT INTO foo (count) VALUES(?)', undef, ++$count);
});
say $count; # may be 1 or 2

Will it insert a value of 1 or 2? It’s much safer to remove non-transactional code from the block, like so:

my $count = 0;
++$count;
$conn->txn(fixup => sub {
    shift->do('INSERT INTO foo (count) VALUES(?)', undef, $count);
});
say $count; # can only be 1

An even trickier pattern to watch out for is something like this:

my $user = 'rjbs';
$conn->run(fixup => sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO users (nick) VALUES (?)', undef, $user);

    # Do some other stuff...

    $dbh->do('INSERT INTO log (msg) VALUES (?)', undef, 'Created user');
});

If the database disconnects between the first and second calls to do, and DBIx::Connector manages to re-connect and run the block again, you might get a unique key violation on the first call to do. This is because we’ve used the run() method. In the first execution of the block, user “rjbs” was inserted and autocommitted. On the second execution, user “rjbs” is already there, and because the nick is unique, we get a unique key violation.

The rule of thumb here is to use run() only for database reads, and to use txn() (and svp()) for writes. txn() will ensure that the transaction is rolled back, so the second execution of the code block will be side-effect-free.

Pedigree

DBIx::Connector is derived from patterns originally implemented for DBIx::Class, though it’s nearly all original code. The upside for those of us who don’t use ORMs is that we get this independent piece of ORM-like behavior without its ORMishness. So if you’re a database geek like me, DBIx::Connector is a great way to reduce technical friction without buying into the whole idea of an ORM.

As it turns out, DBIx::Connector is good not just for straight-to-database users, but also for ORMs. Both DBIx::Class and Rose::DB have plans to replace their own caching and transaction-handling implementations with DBIx::Connector under the hood. That will be great for everyone, as the problems will all be solved in this one place.

This post originally appeared on the Perl Advent Calendar 2011.

DBIx::Connector and Serializable Snapshot Isolation

I was at Postgres Open week before last. This was a great conference, very welcoming atmosphere and lots of great talks. One of the more significant, for me, was the session on serializable transactions by Kevin Grittner, who developed SSI for PostgreSQL 9.1. I hadn’t paid much attention to this feature before now, but it became clear to me, during the talk, that it’s time.

So what is SSI? Well, serializable transactions are almost certainly how you think of transactions already. Here’s how Kevin describes them:

True serializable transactions can simplify software development. Because any transaction which will do the right thing if it is the only transaction running will also do the right thing in any mix of serializable transactions, the programmer need not understand and guard against all possible conflicts. If this feature is used consistently, there is no need to ever take an explicit lock or SELECT FOR UPDATE/SHARE.

This is, in fact, generally how I’ve thought about transactions. But I’ve certainly run into cases where it wasn’t true. Back in 2006, I wrote an article on managing many-to-many relationships with PL/pgSQL which demonstrated a race condition one might commonly find when using an ORM. The solution I offered was to always use a PL/pgSQL function that does the work, and that function executes a SELECT...FOR UPDATE statement to overcome the race condition. This creates a lock that forces conflicting transactions to be performed serially.

Naturally, this is something one would rather not have to think about. Hence SSI. When you identify a transaction as serializable, it will be executed in a truly serializable fashion. So I could actually do away with the SELECT...FOR UPDATE workaround — not to mention any other race conditions I might have missed — simply by telling PostgreSQL to enforce transaction isolation. This essentially eliminates the possibility of unexpected side-effects.
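In PostgreSQL 9.1, opting in is as simple as setting the isolation level when you start the transaction (the table and statement here are hypothetical, just to show the shape):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Statements here run with true serializable isolation; a conflicting
-- concurrent transaction will fail with SQLSTATE 40001 and must be retried.
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```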

This comes at a cost, however. Not in terms of performance so much, since the SSI implementation uses some fancy, recently-developed algorithms to keep things efficient. (Kevin tells me via IRC: “Usually the rollback and retry work is the bulk of the additional cost in an SSI load, in my testing so far. A synthetic load to really stress the LW locking, with a fully-cached database doing short read-only transactions will have no serialization failures, but can run up some CPU time in LW lock contention.”) No, the cost is actually in increased chance of transaction rollback. Because SSI will catch more transaction conflicts than the traditional “read committed” isolation level, frameworks that expect to work with SSI need to be prepared to handle more transaction failures. From the fine manual:

The Serializable isolation level provides the strictest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures.

And that brings me to DBIx::Connector, my Perl module for safe connection and transaction management. It currently has no such retry smarts built into it. The feature closest to that is the “fixup” connection mode, wherein if the execution of a code block fails due to a connection failure, DBIx::Connector will re-connect to the database and execute the code reference again.

I think I should extend DBIx::Connector to take isolation failures and deadlocks into account. That is, fixup mode would retry a code block not only on connection failure but also on serialization failure (SQLSTATE 40001) and deadlocks (SQLSTATE 40P01). I would also add a new attribute, retries, to specify the number of times to retry such execution, with a default of three (which likely will cover the vast majority of cases). This has actually been an oft-requested feature, and I’m glad to have a new reason to add it.

There are a few design issues to overcome, however:

  • Fixup mode is supported not just by txn(), which scopes the execution of a code reference to a single transaction, but also run(), which does no transaction handling. Should the new retry support be added there, too? I could see it either way (a single SQL statement executed in run() is implicitly transaction-scoped).
  • Fixup mode is also supported by svp(), which scopes the execution of a code reference to a savepoint (a.k.a. a subtransaction). Should the rollback and retry be supported there, too, or would the whole transaction have to be retried? I’m thinking the latter, since that’s currently the behavior for connection failures.
  • Given these issues, will it make more sense to perhaps create a new mode? Maybe it would be supported only by txn().

This is doable; it will likely just take some experimentation to figure out and settle on the appropriate API. I’ll need to find the tuits for that soon.

In the meantime, given currently in-progress changes, I’ve just released a new version of DBIx::Connector with a single change: All uses of the deprecated catch syntax now throw warnings. The previous version threw warnings only the first time the syntax was used in a particular context, to keep error logs from getting clogged up. Hopefully most folks have changed their code in the two months since the previous release and switched to Try::Tiny or some other model for exception handling. The catch syntax will be completely removed in the next release of DBIx::Connector, likely around the end of the year. Hopefully the new SSI-aware retry functionality will have been integrated by then, too.

In a future post I’ll likely chew over whether or not to add an API to set the transaction isolation level within a call to txn() and friends.

DBIx::Connector Exception Handling Design

In response to a bug report, I removed the documentation suggesting that one use the catch function exported by Try::Tiny to specify an exception-handling function to the DBIx::Connector execution methods. When I wrote those docs, Try::Tiny’s catch method just returned the subroutine. It was later changed to return an object, and that didn’t work very well. It seemed a much better idea not to depend on an external function that could change its behavior when there is no direct dependency on Try::Tiny in DBIx::Connector. I removed that documentation in 0.43. So instead of this:

$conn->run(fixup => sub {
    ...
}, catch {
    ...
});

It now recommends this:

$conn->run(fixup => sub {
    ...
}, catch => sub {
    ...
});

Which frankly is better balanced anyway.

But in discussion with Mark Lawrence in the ticket, it has become clear that there’s a bit of a design problem with this approach. And that problem is that there is no try keyword, only catch. The fixup in the above example does not try, but the inclusion of the catch implicitly makes it behave like try. That also means that if you use the default mode (which can be set via the mode method), then there will usually be no leading keyword, either. So we get something like this:

$conn->run(sub {
    ...
}, catch => sub {
    ...
});

So it starts with a sub {} and no fixup keyword, but there is a catch keyword, which implicitly wraps that first sub {} in a try-like context. And aesthetically, it’s unbalanced.

So I’m trying to decide what to do about these facts:

  • The catch implicitly makes the first sub be wrapped in a try-type context but without a try-like keyword.
  • If one specifies no mode for the first sub but has a catch, then it looks unbalanced.

At one level, I’m beginning to think that it was a mistake to add the exception-handling code at all. Really, that should be the domain of another module like Try::Tiny or, better, the language. In that case, the example would become:

use Try::Tiny;
try {
    $conn->run(sub {
        ...
    });
} catch {
    ....
}

And maybe that really should be the recommended approach. It seems silly to have replicated most of Try::Tiny inside DBIx::Connector just to cut down on the number of anonymous subs and indentation levels. The latter can be got round with some semi-hinky nesting:

try { $conn->run(sub {
    ...
}) } catch {
    ...
}

Kind of ugly, though. The whole reason the catch stuff was added to DBIx::Connector was to make it all nice and integrated (as discussed here). But perhaps it was not a valid tradeoff. I’m not sure.

So I’m trying to decide how to solve these problems. The options as I see them are:

  1. Add another keyword to use before the first sub that means “the default mode”. I’m not keen on the word “default”, but it would look something like this:

    $conn->run(default => sub {
        ...
    }, catch => sub {
        ...
    });

    This would provide the needed balance, but the catch would still implicitly execute the first sub in a try context. Which isn’t a great idea.

  2. Add a try keyword. So then one could do this:

    $conn->run(try => sub {
        ...
    }, catch => sub {
        ...
    });

    This makes it explicit that the first sub executes in a try context. I’d also have to add companion try_fixup, try_ping, and try_no_ping keywords. Which are ugly. And furthermore, if there was no try keyword, would a catch be ignored? That’s what would be expected, but it changes the current behavior.

  3. Deprecate the try/catch stuff in DBIx::Connector and eventually remove it. This would simplify the code and leave the responsibility for exception handling to other modules where it’s more appropriate. But it would also be at the expense of formatting; it’s just a little less aesthetically pleasing to have the try/catch stuff outside the method calls. But maybe it’s just more appropriate.

I’m leaning toward #3, but perhaps might do #1 anyway, as it’d be nice to be more explicit and one would get the benefit of the balance with catch blocks for as long as they’re retained. But I’m not sure yet. I want your feedback on this. How do you want to use exception-handling with DBIx::Connector? Leave me a comment here or on the ticket.

Serious Exception-Handling Bug Fixed in DBIx::Connector 0.42

I’ve just released DBIx::Connector 0.42 to CPAN. This release fixes a serious bug with catch blocks. In short, if you threw an exception from inside a catch block, it would not be detectable from outside. For example, given this code:

eval {
    $conn->run(sub { die 'WTF' }, catch => sub { die 'OMG!' });
};
if (my $err = $@) {
    say "We got an error: $err";
}

With DBIx::Connector 0.41 and lower, the if block would never be called, because even though the catch block threw an exception, $@ was not set. In other words, the exception would not be propagated to its caller. This could be terribly annoying, as you can imagine. I was being a bit too clever about localizing $@, with the scope much too broad. 0.42 uses a much tighter scope to localize $@, so now it should propagate properly everywhere.
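The underlying hazard is that $@ is a global: any eval that runs between an exception and the code that inspects it wipes the error out. The 0.41 bug was a variant of this, with local $@ scoped too broadly. A minimal demonstration of the general problem (not the actual 0.41 code):

```perl
use strict;
use warnings;

# $@ is global, so an unrelated eval clobbers it as a side effect.
sub innocent_cleanup { eval { 1 } }

eval { die "the real error\n" };
innocent_cleanup();   # oops: $@ is now '' and the error is lost
# Checking $@ here finds nothing, even though the die really happened.
```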

So if you’re using DBIx::Connector catch blocks, please upgrade ASAP. Sorry for the hassle.

Defend Against Programmer Mistakes?

I get email:

Hey David,

I ran in to an issue earlier today in production that, while it is an error in my code, DBIx::Connector could easily handle the issue better. Here’s the use case:

package Con;
use Moose;
sub txn {
    my ($self, $code) = @_;
    my @ret;
    warn "BEGIN EVAL\n";
    eval{ @ret = $code->() };
    warn "END EVAL\n";
    die "DIE: $@" if $@;
    return @ret;
}
package main;
my $c = Con->new();
foreach (1..2) {
    $c->txn(sub{ next; });
}

The result of this is:

BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
Exiting eval via next at test.pl line 16.
Exiting subroutine via next at test.pl line 16.
BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
Exiting eval via next at test.pl line 16.
Exiting subroutine via next at test.pl line 16.

This means that any code after the eval block is not executed. And, in the case of DBIx::Connector, means the transaction is not committed or rolled back, and the next call to txn() is mysteriously combined with the previous txn() call. A quick fix for this is to add an extra pair of curly braces inside the eval:

eval{ { @ret = $code->() } };

Then the results are more what we’d expect:

BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
END EVAL
BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
END EVAL

I’ve fixed my code to use return; instead of next;, but I think this would be a useful fix for DBIx::Connector so that it doesn’t act in such an unexpected fashion when the developer accidentally calls next.

The fix here is pretty simple, but I’m not sure I want to get into the business of defending against programmer mistakes like this in DBIx::Connector or any module.

What do you think?

Tutorial on GitHub

Following a very good suggestion from Pedro Melo, I’ve created a Git repository for this tutorial and put it on GitHub. I replayed each step, making each into its own commit, and tagged the state of the code for each entry.

So as I continue to make modifications, I’ll keep this repository up-to-date, and tag things as of each blog entry. This will make it easy for you to follow along; you can simply clone the repository and git pull for each post.

More soon.

Catalyst with DBIx::Connector and Template::Declare

Following up on my post yesterday introducing Catalyst::View::TD, today I’d like to continue with the next step in chapter 3 of the Catalyst tutorial. The twist here is that I’m going to use PostgreSQL for the database back-end and start introducing some database best practices. I’m also going to make use of my DBIx::Connector module to interact with the database.

Create the Database

Picking up with the database creation section of the tutorial, the first change I’d like to make is to use a natural key for the books table. All books have unique identifiers, known as ISBNs, so creating a surrogate key (the typical id column in ORM-managed applications) is redundant. One of the nice things about PostgreSQL is that it ships with a contributed library, isn, which validates ISBN and other international identifiers. So we use this contrib module (usually included in package-installed PostgreSQL servers) for the primary key for books. If you need to install it from source, it’s pretty easy:

cd postgresql-8.4.1/contrib/isn
make
make install

Ideally I’d use a natural key for the authors table too, but despite some attempts to create universal identifiers for authors, nothing has really caught on as far as I know. So I’ll just stick to a surrogate key for now.

First step: create the database and install isn if it’s not already included in the template database:

createdb -U postgres myapp
psql -U postgres -d myapp -f /usr/local/pgsql/share/contrib/isn.sql

The isn.sql file may be somewhere else on your system. Now let’s create the schema. Create sql/001-books.sql in the MyApp directory and paste this into it:

BEGIN;

CREATE TABLE books (
    isbn   ISBN13   PRIMARY KEY,
    title  TEXT     NOT NULL DEFAULT '',
    rating SMALLINT NOT NULL DEFAULT 0 CHECK (rating BETWEEN 0 AND 5)
);

CREATE TABLE authors (
    id         BIGSERIAL PRIMARY KEY,
    surname    TEXT NOT NULL DEFAULT '',
    given_name TEXT NOT NULL DEFAULT ''
);

CREATE TABLE book_author (
    isbn       ISBN13 REFERENCES books(isbn),
    author_id  BIGINT REFERENCES authors(id),
    PRIMARY KEY (isbn, author_id)
);

INSERT INTO books
VALUES ('1587201534',        'CCSP SNRS Exam Certification Guide', 5),
       ('978-0201633467',    'TCP/IP Illustrated, Volume 1',       5),
       ('978-0130183804',    'Internetworking with TCP/IP Vol.1',  4),
       ('978-1-56592-243-3', 'Perl Cookbook',                      5),
       ('978-0735712010',    'Designing with Web Standards',       5)
;

INSERT INTO authors
VALUES (1, 'Greg',      'Bastien'),
       (2, 'Sara',      'Nasseh'),
       (3, 'Christian', 'Degu'),
       (4, 'Richard',   'Stevens'),
       (5, 'Douglas',   'Comer'),
       (6, 'Tom',       'Christiansen'),
       (7, 'Nathan',    'Torkington'),
       (8, 'Jeffrey',   'Zeldman')
;

INSERT INTO book_author
VALUES ('1587201534',        1),
       ('1587201534',        2),
       ('1587201534',        3),
       ('978-0201633467',    4),
       ('978-0130183804',    5),
       ('978-1-56592-243-3', 6),
       ('978-1-56592-243-3', 7),
       ('978-0735712010',    8)
;

COMMIT;

Yeah, I Googled for the ISBNs for those books. I found the ISBN-13 for most of them, but the isn data type handles the old ISBN-10 format, too, automatically upgrading it to ISBN-13. I also added a CHECK constraint for the rating column, to be sure that the value is always BETWEEN 0 AND 5. I like to include default values where it’s sensible to do so, and that syntax for inserting multiple rows at once is nice to have.

Go ahead and run this against your database:

psql -U postgres -d myapp -f sql/001-books.sql

Now if you connect to the server, you should be able to query things like so:

$ psql -U postgres myapp
psql (8.4.1)
Type "help" for help.

myapp=# select * from books;
       isbn        |               title                | rating 
-------------------+------------------------------------+--------
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4
 978-1-56592-243-3 | Perl Cookbook                      |      5
 978-0-7357-1201-0 | Designing with Web Standards       |      5
(5 rows)

Setup the Database Connection

Great! The database is set. Now we need a way for the app to talk to it. I’ve not yet decided how I’m going to integrate DBIx::Connector into a Catalyst model class; maybe I’ll figure it out as I write these posts. But since my mantra is “the database is the model,” for now I won’t bother with a model at all. Instead, I’ll create a simple accessor in MyApp so we can easily get at the database connection wherever we need it. To do that, add these lines to lib/MyApp.pm:

use Moose;
use DBIx::Connector;
use Exception::Class::DBI;

has conn => (is => 'ro', lazy => 1, default => sub {
    DBIx::Connector->new( 'dbi:Pg:dbname=myapp', 'postgres', '', {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
        pg_enable_utf8 => 1,
    });
});

We load Moose to get the has keyword, the officially sanctioned interface for defining attributes in Catalyst classes. Then I use that keyword to create the conn attribute. This attribute is read-only and has a DBIx::Connector object for its default value. The nice thing about this is that the DBIx::Connector object won’t be instantiated until it’s actually needed, and then it will be kept forever. We never have to do anything else to use it.

Oh, and I like to make sure that text data coming back from PostgreSQL is properly encoded as UTF-8, and I like to use Exception::Class::DBI to turn DBI errors into exception objects.

Now it’s time to update our controller and template to fetch actual data from the database. Edit lib/MyApp/Controller/Books.pm and change the list method to:

sub list : Local {
    my ($self, $c) = @_;
    $c->stash->{books} = $c->conn->run(fixup => sub {
        my $sth = $_->prepare('SELECT isbn, title, rating FROM books');
        $sth->execute;
        $sth;
    });
}

All we’re doing here is creating a statement handle for the query, executing the query, and storing the statement handle in the stash. Now we need to update the template to use the statement handle. Open up lib/MyApp/Templates/HTML/Books.pm and change the list template to:

template list => sub {
    my ($self, $args) = @_;
    table {
        row {
            th { 'Title'  };
            th { 'Rating' };
            th { 'Author' };
        };
        my $sth = $args->{books};
        while (my $book = $sth->fetchrow_hashref) {
            row {
                cell { $book->{title}  };
                cell { $book->{rating} };
                cell { $book->{author} };
            };
        };
    };
};

All we do is fetch each row from the statement handle and output it. The only thing that’s changed is the use of the statement handle as an iterator rather than an array reference.

And now we’re set! Restart your server with script/myapp_server.pl and point your browser at http://localhost:3000/books/list. Now you should see the book titles and ratings, though the authors still aren’t present. We’ll fix that in a later post.

Takeaway

The takeaway from this post: Use PostgreSQL’s support for custom data types to create validated natural keys for your data, and use a stable, persistent database connection to talk directly to the database. No need for an ORM here, as the DBI provides a very Perlish access to a very capable DSL for models called SQL.

More soon.

DBIx::Connector Updated

After much gnashing of teeth, heated arguments with @robkinyon and @mst, lots of deep thesaurus spelunking, and three or four iterations, I finally came up with an improved API for DBIx::Connector that I believe is straightforward and easy to explain.

Following up on my post last week, I explored, oh I dunno, a hundred different terms for the various methods? I’ve never spent so much time on thesaurus.com in my life. Part of what added to the difficulty was that @mst seemed to think that there should actually be three modes for each block method: one that pings, one that doesn’t, and one that tries again if a block dies and the connection is down. So I went from six to nine methods with that assertion.

What I finally came up with was to name the three basic methods run(), txn_run(), and svp_run(), and these would neither ping nor retry in the event of failure. Then I added variations on these methods that would ping and that would try to fix failures. I called these “ping runs” and “fixup runs,” respectively. It was the latter term, “fixup,” that had been so hard for me to nail down, as “retry” seemed to say that the method was a retry, while “fixup” more accurately reflects that the method would try to fix up the connection in the event of a failure.

Once I’d implemented this interface, I now had nine methods:

  • run()
  • txn_run()
  • svp_run()
  • ping_run()
  • txn_ping_run()
  • svp_ping_run()
  • fixup_run()
  • txn_fixup_run()
  • svp_fixup_run()

This worked great. Then I went about documenting it. Jesus Christ what a pain! I realized that all these similarly-named methods would require a lot of explanation. I duly wrote up said explanation, and just wasn’t happy with it. It just felt to me like all the explanation made it too difficult to decide what methods to use and when. Such confusion would make the module less likely to be used – and certainly less likely to be used efficiently.

So I went back to the API drawing board and, reflecting on @robkinyon’s browbeating about decorating methods and @mst’s coming to that conclusion as well, I finally came up with just three methods:

  • run()
  • txn()
  • svp()

For any one of these, you can call it by passing a block, of course:

$conn->txn( sub { $_->do('SELECT some_function()') } );

In addition, you can now have any one of them run in one of three modes: the default (no ping), “ping”, or “fixup”:

$conn->txn( fixup => sub { $_->do('SELECT some_function()') } );

It’s much easier to explain the three methods in terms of how the block is transactionally encapsulated, as that’s the only difference between them. Once that’s understood, it’s pretty easy to explain how to change the “connection mode” of each by passing in a leading string. It even looks pretty nice. I’m really happy with this.
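Mechanically, accepting that leading string just means peeling an optional mode name off the argument list before the block. Here’s a minimal, self-contained sketch of that calling convention — the parse_mode() helper and the 'no_ping' label for the default are my own illustrative assumptions, not DBIx::Connector’s actual source:

```perl
use strict;
use warnings;

# Split an argument list of the form ($mode?, $coderef) into its parts.
# If the first argument is a code reference, no mode string was passed,
# so fall back to the default (labeled 'no_ping' here for illustration).
sub parse_mode {
    my @args = @_;
    my $mode = ref $args[0] eq 'CODE' ? 'no_ping' : shift @args;
    my $code = shift @args;
    return ($mode, $code);
}
```

A call like `$conn->txn( fixup => sub { ... } )` thus differs from `$conn->txn( sub { ... } )` only in what this argument-splitting step sees.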

One thing that increased the difficulty in coming up with this API was that @mst felt that by default the methods should neither ping nor try to fix up a failure. I was resistant to this because it’s not how Apache::DBI or connect_cached() work: they always ping. It turns out that DBIx::Class doesn’t cache connections at all. I thought it had. Rather, it creates a connection and simply hangs onto it as a scalar variable. It handles the connection for as long as it’s in scope, but includes no magic global caching. This reduces the action-at-a-distance issues common with caching while maintaining proper fork- and thread-safety.

At this point, I took a baseball bat to my desk.

Figuratively, anyway. I did at least unleash a mountain of curses upon @mst and various family relations. Because it took me a few minutes to see it: It turns out that DBIx::Class is right to do it this way. So I ripped out the global caching from DBIx::Connector, and suddenly it made much more sense not to ping by default – just as you wouldn’t ping if you created a DBI handle yourself.

DBIx::Connector is no longer a caching layer over the DBI. It’s now a proxy for a connection. That’s it. There is no magic, no implicit behavior, so it’s easier to use. And because it ensures fork- and thread-safety, you can instantiate a connector and hold onto it for whenever you need it, unlike using the DBI itself.
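The fork-safety that makes “hold onto it” safe can be sketched in a few lines: remember the process ID when you connect, and throw away the handle when it no longer matches. Everything below — the Toy::Connector name and the coderef standing in for DBI->connect — is hypothetical scaffolding to show the idea, not DBIx::Connector’s internals (which also check for new threads):

```perl
use strict;
use warnings;

package Toy::Connector;

# $connect is any callback that returns a fresh "handle"; real code
# would call DBI->connect here. The handle is cached along with the
# pid that created it.
sub new {
    my ($class, $connect) = @_;
    return bless { connect => $connect }, $class;
}

# Return the cached handle, reconnecting if there is none or if the
# process has forked since it was created ($$ is the current pid).
sub dbh {
    my $self = shift;
    if ( !$self->{dbh} || $self->{pid} != $$ ) {
        $self->{dbh} = $self->{connect}->();
        $self->{pid} = $$;
    }
    return $self->{dbh};
}

package main;
```

Because the check happens on every dbh() call, a child process transparently gets its own connection the first time it asks for the handle.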

And one more thing: I also added a new method, with(). For those who always want to use the same connection mode, you can use this method to create a proxy object that has a different default mode. (Yes, a proxy for a proxy for a database handle. Whatever!) Use it like this:

$conn->with('fixup')->run( sub { ... } );

And if you always want to use the same mode, hold onto the proxy instead of the connection object:

my $proxy = DBIx::Connector->new(@args)->with('fixup');

# later ...
$proxy->txn( sub { ... } ); # always in fixup mode

So fixup mode is no longer the default, as Tim requested, but it can optionally be made the default, as DBIx::Class requires. The with() method will also be the place to add other global behavioral modifications, such as DBIx::Class’s auto_savepoint feature.
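The mechanics of such a proxy amount to simple currying: remember a mode, and prepend it to every block call that gets forwarded, unless the caller passed an explicit mode of their own. A self-contained sketch under those assumptions (Toy::WithMode is a stand-in, not how DBIx::Connector actually implements with()):

```perl
use strict;
use warnings;

package Toy::WithMode;

# Wrap a connector object and a default mode string.
sub new {
    my ($class, $conn, $mode) = @_;
    return bless { conn => $conn, mode => $mode }, $class;
}

# Forward run(), txn(), and svp() to the wrapped connector, prepending
# the default mode when the caller passed only a block. An explicit
# leading mode string from the caller wins.
for my $method (qw(run txn svp)) {
    no strict 'refs';
    *$method = sub {
        my ($self, @args) = @_;
        unshift @args, $self->{mode} if ref $args[0] eq 'CODE';
        return $self->{conn}->$method(@args);
    };
}

package main;
```

Holding onto such a proxy instead of the connector itself is what makes “always fixup” a one-line policy decision rather than something repeated at every call site.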

So for those of you who were interested in the first iteration of this module, my apologies for changing things so dramatically in this release (ripping out the global caching, deprecating methods, adding a new block method API, etc.). But I think that, for all the pain I went through to come up with the new API – all the arguing on IRC, all the thesaurus spelunking – this is a really good API, easy to explain and understand, and easy to use. And I don’t expect to change it again. I might improve exceptions (use objects instead of strings?) or add block method exception handling (perhaps adding a catch keyword?), but the basics are finally nailed down and here to stay.

Thanks to @mst, @robkinyon, and @ribasushi, in particular, for bearing with me and continuing to hammer on me when I was being dense.

Looking for the comments? Try the old layout.

Suggest Method Names for DBIx::Connector

Thanks to feedback from Tim Bunce and Peter Rabbitson in a DBIx::Class bug report, I’ve been reworking DBIx::Connector’s block-handling methods. Tim’s objection is that the feature of do() and txn_do() that executes the code reference a second time in the event of a connection failure can be dangerous. That is, it can lead to action-at-a-distance bugs that are hard to find and fix. Tim suggested renaming the methods do_with_retry() and txn_do_with_retry() in order to make explicit what’s going on, and to have non-retry versions of the methods.

I’ve made this change in the repository. But I wasn’t happy with the method names; even though they’re unambiguous, they are also overly long and not very friendly. I want people to use the retrying methods, but felt that the long names would make the non-retrying versions preferable to users. While I was at it, I also wanted to get rid of do(), since it quickly became clear that it could cause some confusion with the DBI’s do() method.

I’ve been thesaurus spelunking for the last few days, and have come up with a few options, but would love to hear other suggestions. I like using run instead of do to avoid confusion with the DBI, but otherwise I’m not really happy with what I’ve come up with. There are basically five different methods (using Tim’s suggestions for the moment):

run( sub {} )
Just run a block of code.
txn_run( sub {} )
Run a block of code inside a transaction.
run_with_retry( sub {} )
Run a block of code without pinging the database, and re-run the code if it throws an exception and the database turned out to be disconnected.
txn_run_with_retry( sub {} )
Like run_with_retry(), but run the block inside a transaction.
svp_run( sub {} )
Run a block of code inside a savepoint (no retry for savepoints).

Here are some of the names I’ve come up with so far:

Run block   Run in txn   Run in savepoint   Run with retry   Run in txn with retry   Retry Mnemonic
run         txn_run      svp_run            runup            txn_runup               Run assuming the db is up, retry if not.
run         txn_run      svp_run            run_up           txn_run_up              Same as above.
run         txn_run      svp_run            rerun            txn_rerun               Run assuming the db is up, rerun if not.
run         txn_run      svp_run            run::retry      txn_run::retry           :: means “with”

That last one is a cute hack suggested by Rob Kinyon on IRC. As you can see, I’m pretty consistent with the non-retrying method names; it’s the methods that retry that I’m not satisfied with. An approach I’ve avoided is to use an adverb for the non-retry methods, mainly because there is no retry possible for the savepoint methods, so it seemed silly to have svp_run_safely() to complement run_safely() and txn_run_safely().

Brilliant suggestions warmly appreciated.

Looking for the comments? Try the old layout.

Database Handle and Transaction Management with DBIx::Connector

As part of my ongoing effort to wrestle Catalyst into working the way that I think it should work, I’ve just uploaded DBIx::Connector to the CPAN. See, I was using Catalyst::Model::DBI, but it turned out that I wanted to use the database handle in places other than the Catalyst parts of my app. I was bitching about this to mst on #catalyst, and he said that Catalyst::Model::DBI was actually a fork of DBIx::Class’s handle caching, and quite out of date. I said, “But this already exists. It’s called connect_cached().” I believe his response was, “OH FUCK OFF!”

So I started digging into what Catalyst::Model::DBI and DBIx::Class do to cache their database handles, and how it differs from connect_cached(). It turns out that they were pretty smart, in terms of checking to see if the process had forked or a new thread had been spawned, and if so, deactivating the old handle and then returning a new one. Otherwise, things are just cached. This approach works well in Web environments, including under mod_perl; in forking applications, like POE apps; and in plain Perl scripts. Matt said he’d always wanted to pull that functionality out of DBIx::Class and then make DBIx::Class depend on the external implementation. That way everyone could take advantage of the functionality, including people like me who don’t want to use an ORM.

So I did it. Maybe it was crazy (mmmmm…yak meat), but I can now use the same database interface in the Catalyst and POE parts of my application without worry:

my $dbh = DBIx::Connector->connect(
    'dbi:Pg:dbname=circle', 'postgres', '', {
        PrintError     => 0,
        RaiseError     => 0,
        AutoCommit     => 1,
        HandleError    => Exception::Class::DBI->handler,
        pg_enable_utf8 => 1,
    },
);

$dbh->do($sql);

But it’s not just database handle caching that I’ve included in DBIx::Connector; no, I’ve also stolen some of the transaction management stuff from DBIx::Class. All you have to do is grab the connector object which encapsulates the database handle, and take advantage of its txn_do() method:

my $conn = DBIx::Connector->new(@args);
$conn->txn_do(sub {
    my $dbh = shift;
    $dbh->do($_) for @queries;
});

The transaction is scoped to the code reference passed to txn_do(). Not only that, it avoids the overhead of calling ping() on the database handle unless something goes wrong. Most of the time, nothing goes wrong; the database is there, so you can proceed accordingly. If it is gone, however, txn_do() will re-connect and execute the code reference again. The cool thing is that you will never notice that the connection was dropped – unless it’s still gone after the second execution of the code reference.
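That run-then-maybe-fix-up flow can be sketched without the DBI at all: run the block, and only if it dies and the handle turns out to be dead, reconnect and run it exactly once more. The fake handle (a hashref with a connected flag) and the reconnect callback below are assumptions for illustration, not DBIx::Connector’s actual implementation:

```perl
use strict;
use warnings;

# $dbh_ref:   reference to the current (fake) handle
# $reconnect: callback that returns a fresh handle
# $code:      the caller's block
sub txn_do_sketch {
    my ($dbh_ref, $reconnect, $code) = @_;
    my $result = eval { $code->($$dbh_ref) };
    if ( my $err = $@ ) {
        # If the handle is still alive, the block failed for a real
        # reason: rethrow rather than retry.
        die $err if $$dbh_ref->{connected};
        # Otherwise fix up the connection and run the block once more;
        # a second failure propagates to the caller.
        $$dbh_ref = $reconnect->();
        $result = $code->($$dbh_ref);
    }
    return $result;
}
```

Note that the block runs at most twice, and only the connection-failure path triggers the second run — which is exactly why Tim worried about blocks with non-idempotent side effects.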

And finally, thanks to some pushback from mst, ribasushi, and others, I added savepoint support. It’s a little different than that provided by DBIx::Class; instead of relying on a magical auto_savepoint attribute that subtly changes the behavior of txn_do(), you just use the svp_do() method from within txn_do(). The scoping of subtransactions is thus nicely explicit:

$conn->txn_do(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (1)');
    eval {
        $conn->svp_do(sub {
            shift->do('INSERT INTO table1 VALUES (2)');
            die 'OMGWTF?';
        });
    };
    warn "Savepoint failed\n" if $@;
    $dbh->do('INSERT INTO table1 VALUES (3)');
});

This transaction will insert the values 1 and 3, but not 2. If you call svp_do() outside of txn_do(), it will call txn_do() for you, with the savepoint scoped to the entire transaction:

$conn->svp_do(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (4)');
    $conn->svp_do(sub {
        shift->do('INSERT INTO table1 VALUES (5)');
    });
});

This transaction will insert both 4 and 5. And note that you can nest savepoints as deeply as you like. All this is dependent on whether the database supports savepoints; so far, PostgreSQL, MySQL (InnoDB), Oracle, MSSQL, and SQLite do. If you know of others, fork the repository, commit changes to a branch, and send me a pull request!
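Under the hood, that nesting maps onto three SQL statements: SAVEPOINT on entry, RELEASE SAVEPOINT on success, and ROLLBACK TO SAVEPOINT on failure, with generated names keeping the levels apart. A self-contained sketch of that shape — the depth-based naming scheme and the recording handle in the usage note are my own illustrative assumptions:

```perl
use strict;
use warnings;

my $depth = 0;

# Run a block inside a savepoint on any handle that responds to do().
# Savepoint names are generated from the nesting depth, so inner
# savepoints never collide with outer ones.
sub svp_sketch {
    my ($dbh, $code) = @_;
    my $name = 'savepoint_' . $depth++;
    $dbh->do("SAVEPOINT $name");
    my $ok  = eval { $code->($dbh); 1 };
    my $err = $@;
    if ($ok) {
        $dbh->do("RELEASE SAVEPOINT $name");
    }
    else {
        $dbh->do("ROLLBACK TO SAVEPOINT $name");
    }
    $depth--;
    die $err unless $ok;
}
```

Because a failed inner block rolls back only to its own savepoint and then rethrows, the caller gets to decide — as in the eval around svp_do() above — whether the outer transaction continues or dies too.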

Overall I’m very happy with this module, and I’ll probably use it in all my Perl database projects from here on in. Perhaps later I’ll build a model class on it (something like Catalyst::Model::DBI, only better!), but next up, I plan to finish documenting Template::Declare and writing some views with it. More on that soon.

Keep DBI’s connect_cached From Horking Transactions

Looking for the comments? Try the old layout.