Today on the Perl Advent Calendar
Hey look everybody, I wrote today’s Perl Advent Calendar post, Less Tedium, More Transactions. Go read it! (Update: moved here.)
A frequent pattern when writing database-backed applications with the DBI is to connect to the database and cache the database handle somewhere. A simplified example:
package MyApp::DB;
use DBI;
use strict;

my $DBH = DBI->connect('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

sub dbh { $DBH }
Just load MyApp::DB anywhere in your app and, whenever you want to use the database, grab the handle from MyApp::DB->dbh.
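For instance, any module in the app can grab the handle and use it directly. Here is a trivial usage sketch (the widgets table is made up for illustration):

use MyApp::DB;

my $sth = MyApp::DB->dbh->prepare('SELECT id, name FROM widgets');
$sth->execute;
while (my $row = $sth->fetchrow_arrayref) {
    print "$row->[0]: $row->[1]\n";
}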
This pattern is common enough that Apache::DBI was created to magically do it for you on mod_perl, and the DBI added connect_cached() so that it could cache connections itself. However, each of these solutions has some issues:

What happens when your program forks? Apache::DBI handles this condition, but neither the home-grown solution nor connect_cached() does, and identifying a forked database handle as the source of a crash is notoriously unintuitive.

What happens when your program spawns a new thread? Sure, some DBI drivers might still work, but others might not. Best to treat new threads the same as new processes and reconnect to the database. Neither Apache::DBI nor connect_cached() deals with threading issues, and of course neither does the custom solution.
Apache::DBI is magical and mysterious, and the magic comes with serious side-effects. Apache::DBI plugs itself right into the DBI, replacing its connection methods (which is why load ordering is so important to use it properly). Knowledge of Apache::DBI is actually built right into the DBI itself, meaning that the magic runs deep and both ways. These are pretty serious violations of encapsulation in both directions.
connect_cached() has a bit of its own unfortunate magic. Every call to connect_cached() resets the connection attributes. So if you have code in one place that starts a transaction, and code elsewhere but executed in the same scope that also fetches a connect_cached() handle, the transaction will be committed then and there, even though the code that started it might not be done with it. One can work around this issue via callbacks, but it’s a bit of a hack.
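The callback workaround looks something like this sketch, based on my reading of the DBI’s Callbacks documentation; the idea is to delete AutoCommit from the attributes when a cached handle is reused, so connect_cached() cannot reset it (check the DBI docs for the details):

use DBI;

my $dbh = DBI->connect_cached('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError => 0,
    RaiseError => 1,
    AutoCommit => 1,
    Callbacks  => {
        # Called when connect_cached() returns a handle it already had.
        'connect_cached.reused' => sub { delete $_[4]->{AutoCommit} },
    },
});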
Using a custom caching solution avoids the magic, but getting fork- and thread-safety right is surprisingly non-trivial, in the same way that doing your own exception-handling is surprisingly non-trivial.
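To give a sense of why, here is a rough sketch (not production code) of the sort of bookkeeping a home-grown cache needs just to survive a fork; it ignores threads and plenty of other edge cases:

use DBI;
use strict;

my ($DBH, $PID);

sub dbh {
    if (!$DBH or $PID != $$) {
        # In a child process, keep the inherited handle from shutting down
        # the parent's connection when it is discarded.
        $DBH->{InactiveDestroy} = 1 if $DBH;
        $DBH = DBI->connect('DBI:SQLite:dbname=myapp.db', '', '', {
            PrintError => 0,
            RaiseError => 1,
            AutoCommit => 1,
        });
        $PID = $$;
    }
    return $DBH;
}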
Enter DBIx::Connector, a module that efficiently manages your database connections in a thread- and fork-safe manner so that you don’t have to. If you already have a custom solution, switching to DBIx::Connector is easy. Here’s a revision of MyApp::DB that uses it:
package MyApp::DB;
use DBIx::Connector;
use strict;

my $CONN = DBIx::Connector->new('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

sub conn { $CONN }
sub dbh  { $CONN->dbh }
Simple, right? You pass exactly the same parameters to DBIx::Connector->new that you passed to DBI->connect. The DBIx::Connector object simply proxies the DBI. You want the database handle itself, just call dbh() and proceed as usual, confident that if your app forks or spawns new threads, your database handle will be safe. Why? Because DBIx::Connector detects such changes, and re-connects to the database, being sure to properly dispose of the original connection. But really, you don’t have to worry about that, because DBIx::Connector does the worrying for you.
DBIx::Connector is very good at eliminating the technical friction of process and thread management. But that’s not all there is to it.
Although you can just fetch the DBI handle from your DBIx::Connector object and go, a better approach is to use its execution methods. These methods scope execution to a code block. Here’s an example using run():
$conn->run(sub {
    shift->do($query);
});
That may not seem so useful, and is more to type, but the real power comes from the txn() method. txn() executes the code block within the scope of a transaction. So where you normally would write something like this:
use Try::Tiny;
use MyApp::DB;

my $dbh = MyApp::DB->dbh;

try {
    $dbh->begin_work;
    # do stuff...
    $dbh->commit;
} catch {
    $dbh->rollback;
    die $_;
};
The txn() method scopes the transaction for you, so that you can just focus on the work to be done and not on the transaction management:
use Try::Tiny;
use MyApp::DB;

try {
    MyApp::DB->conn->txn(sub {
        # do stuff...
    });
} catch {
    die $_;
};
There’s no need to call begin_work, commit, or rollback, as txn() does all that for you. Furthermore, it improves the maintainability of your code, as the scope of the transaction is much more clearly defined as the scope of the code block. Additional calls to txn() or run() within that block are harmless, and just become part of the same transaction:
my $conn = MyApp::DB->conn;

$conn->txn(sub {
    my $dbh = shift;
    $dbh->do($_) for @queries;
    $conn->run(sub {
        shift->do($expensive_query);
        $conn->txn(sub {
            shift->do($another_expensive_query);
        });
    });
});
Even cooler is the svp() method, which scopes execution of a code block to a savepoint, or subtransaction, if your database supports it (all of the drivers currently supported by DBIx::Connector do). For example, this transaction will commit the insertion of values 1 and 3, but not 2:
my $conn = MyApp::DB->conn;

$conn->txn(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (1)');
    try {
        $conn->svp(sub {
            shift->do('INSERT INTO table1 VALUES (2)');
            die 'OMGWTF?';
        });
    } catch {
        warn "Savepoint failed: $_\n";
    };
    $dbh->do('INSERT INTO table1 VALUES (3)');
});
The recommended pattern for using a cached DBI handle is to call ping() when you fetch it from the cache, and reconnect if it returns false. Apache::DBI and connect_cached() do this for you, and so does DBIx::Connector. However, in a busy application ping() can get called a lot. We recently did some query analysis for a client, and found that 1% of the database execution time was taken up with ping() calls. That may not sound like a lot, but looking at the numbers, it amounted to 100K pings per hour. For something that just returns true well over 99.9% of the time, it seems a bit silly.
Enter DBIx::Connector connection modes. The default mode is “ping”, as that’s what most installations are accustomed to. A second mode is “no_ping”, which simply disables pings. I don’t recommend that.
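For example, the mode can be passed as the first argument to any of the execution methods, just as “fixup” is passed in the examples below:

$conn->run(ping    => sub { shift->do($query) }); # the default, spelled out
$conn->run(no_ping => sub { shift->do($query) }); # never ping first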
A better solution is to use “fixup” mode. This mode doesn’t normally call ping() either. However, if a code block passed to run() or txn() throws an exception, then DBIx::Connector will call ping(). If it returns false, DBIx::Connector reconnects to the database and executes the code block again. This configuration should handle some common situations, such as idle timeouts, without bothering you about it.
Specify “fixup” mode whenever you call an execution method, like so:
$conn->txn(fixup => sub { ... });
You can also specify that your connection always use “fixup” via the mode() accessor. Modify the caching library like so (the mode() call is new):
my $CONN = DBIx::Connector->new('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});
$CONN->mode('fixup'); # ⬅ ⬅ ⬅ enter fixup mode!

sub conn { $CONN }
sub dbh  { $CONN->dbh }
However, you must be more careful with fixup mode than with ping mode, because a code block can be executed twice. So you must be sure to write it such that there are no side effects to multiple executions. Don’t do this, for example:
my $count = 0;
$conn->txn(fixup => sub {
    shift->do('INSERT INTO foo (count) VALUES(?)', undef, ++$count);
});
say $count; # may be 1 or 2
Will it insert a value of 1 or 2? It’s much safer to remove non-transactional code from the block, like so:
my $count = 0;
++$count;
$conn->txn(fixup => sub {
    shift->do('INSERT INTO foo (count) VALUES(?)', undef, $count);
});
say $count; # can only be 1
An even trickier pattern to watch out for is something like this:
my $user = 'rjbs';
$conn->run(fixup => sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO users (nick) VALUES (?)', undef, $user);
    # Do some other stuff...
    $dbh->do('INSERT INTO log (msg) VALUES (?)', undef, 'Created user');
});
If the database disconnects between the first and second calls to do(), and DBIx::Connector manages to re-connect and run the block again, you might get a unique key violation on the first call to do(). This is because we’ve used the run() method. In the first execution of the block, user “rjbs” was inserted and autocommitted. On the second call, user “rjbs” is already there, and because it’s a username, we get a unique key violation.
The rule of thumb here is to use run() only for database reads, and to use txn() (and svp()) for writes. txn() will ensure that the transaction is rolled back, so the second execution of the code block will be side-effect-free.
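Applied to the earlier example, that just means swapping run() for txn(): if the block has to be executed a second time, the first attempt’s INSERTs will already have been rolled back.

my $user = 'rjbs';
$conn->txn(fixup => sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO users (nick) VALUES (?)', undef, $user);
    # Do some other stuff...
    $dbh->do('INSERT INTO log (msg) VALUES (?)', undef, 'Created user');
});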
DBIx::Connector is derived from patterns originally implemented for DBIx::Class, though it’s nearly all original code. The upside for those of us who don’t use ORMs is that we get this independent piece of ORM-like behavior without its ORMishness. So if you’re a database geek like me, DBIx::Connector is a great way to reduce technical friction without buying into the whole idea of an ORM.
As it turns out, DBIx::Connector is good not just for straight-to-database users, but also for ORMs. Both DBIx::Class and Rose::DB have plans to replace their own caching and transaction-handling implementations with DBIx::Connector under the hood. That will be great for everyone, as the problems will all be solved in this one place.
This post originally appeared on the Perl Advent Calendar 2011.
I was at Postgres Open week before last. This was a great conference, very welcoming atmosphere and lots of great talks. One of the more significant, for me, was the session on serializable transactions by Kevin Grittner, who developed SSI for PostgreSQL 9.1. I hadn’t paid much attention to this feature before now, but it became clear to me, during the talk, that it’s time.
So what is SSI? Well, serializable transactions are almost certainly how you think of transactions already. Here’s how Kevin describes them:
True serializable transactions can simplify software development. Because any transaction which will do the right thing if it is the only transaction running will also do the right thing in any mix of serializable transactions, the programmer need not understand and guard against all possible conflicts. If this feature is used consistently, there is no need to ever take an explicit lock or SELECT FOR UPDATE/SHARE.
This is, in fact, generally how I’ve thought about transactions. But I’ve certainly run into cases where it wasn’t true. Back in 2006, I wrote an article on managing many-to-many relationships with PL/pgSQL which demonstrated a race condition one might commonly find when using an ORM. The solution I offered was to always use a PL/pgSQL function that does the work, and that function executes a SELECT...FOR UPDATE statement to overcome the race condition. This creates a lock that forces conflicting transactions to be performed serially.
Naturally, this is something one would rather not have to think about. Hence SSI. When you identify a transaction as serializable, it will be executed in a truly serializable fashion. So I could actually do away with the SELECT...FOR UPDATE workaround (not to mention any other race conditions I might have missed) simply by telling PostgreSQL to enforce transaction isolation. This essentially eliminates the possibility of unexpected side-effects.
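Asking for that enforcement is cheap in terms of code. Here is a rough sketch using DBIx::Connector (the accounts table is made up for illustration; the SET TRANSACTION statement just has to come before any query in the transaction):

$conn->txn(sub {
    my $dbh = shift;
    # Must be executed before any other statement in the transaction.
    $dbh->do('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE');
    $dbh->do('UPDATE accounts SET balance = balance - 100.00 WHERE id = 1');
});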
This comes at a cost, however. Not in terms of performance so much, since the SSI implementation uses some fancy, recently-developed algorithms to keep things efficient. (Kevin tells me via IRC: “Usually the rollback and retry work is the bulk of the additional cost in an SSI load, in my testing so far. A synthetic load to really stress the LW locking, with a fully-cached database doing short read-only transactions will have no serialization failures, but can run up some CPU time in LW lock contention.”) No, the cost is actually in increased chance of transaction rollback. Because SSI will catch more transaction conflicts than the traditional “read committed” isolation level, frameworks that expect to work with SSI need to be prepared to handle more transaction failures. From the fine manual:
The Serializable isolation level provides the strictest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures.
And that brings me to DBIx::Connector, my Perl module for safe connection and transaction management. It currently has no such retry smarts built into it. The feature closest to that is the “fixup” connection mode, wherein if execution of a code block fails due to a connection failure, DBIx::Connector will re-connect to the database and execute the code reference again.
I think I should extend DBIx::Connector to take isolation failures and deadlocks into account. That is, fixup mode would retry a code block not only on connection failure but also on serialization failure (SQLSTATE 40001) and deadlocks (SQLSTATE 40P01). I would also add a new attribute, retries, to specify the number of times to retry such execution, with a default of three (which likely will cover the vast majority of cases). This has actually been an oft-requested feature, and I’m glad to have a new reason to add it.
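In other words, something along these lines (purely a hypothetical sketch of the proposed interface; none of it is implemented yet):

my $conn = DBIx::Connector->new(@args);
$conn->mode('fixup');
$conn->retries(3); # hypothetical accessor; three would be the default anyway
$conn->txn(sub {
    # Under the proposal, this block would be re-run on connection failure,
    # serialization failure (SQLSTATE 40001), or deadlock (SQLSTATE 40P01),
    # up to the configured number of retries.
    shift->do($query);
});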
There are a few design issues to overcome, however:

- Retry support would apply not only to txn(), which scopes the execution of a code reference to a single transaction, but also to run(), which does no transaction handling. Should the new retry support be added there, too? I could see it either way (a single SQL statement executed in run() is implicitly transaction-scoped).
- There is also svp(), which scopes the execution of a code reference to a savepoint (a.k.a. a subtransaction). Should the rollback and retry be supported there, too, or would the whole transaction have to be retried? I’m thinking the latter, since that’s currently the behavior for connection failures.

This is do-able; it will likely just take some experimentation to figure it out and settle on the appropriate API. I’ll need to find the tuits for that soon.
In the meantime, given currently in-progress changes, I’ve just released a new version of DBIx::Connector with a single change: all uses of the deprecated catch syntax now throw warnings. The previous version threw warnings only the first time the syntax was used in a particular context, to keep error logs from getting clogged up. Hopefully most folks have changed their code in the two months since the previous release and switched to Try::Tiny or some other model for exception handling. The catch syntax will be completely removed in the next release of DBIx::Connector, likely around the end of the year. Hopefully the new SSI-aware retry functionality will have been integrated by then, too.
In a future post I’ll likely chew over whether or not to add an API to set the transaction isolation level within a call to txn() and friends.
In response to a bug report, I removed the documentation suggesting that one use the catch function exported by Try::Tiny to specify an exception-handling function to the DBIx::Connector execution methods. When I wrote those docs, Try::Tiny’s catch method just returned the subroutine. It was later changed to return an object, and that didn’t work very well. It seemed a much better idea not to depend on an external function that could change its behavior when there is no direct dependency on Try::Tiny in DBIx::Connector. I removed that documentation in 0.43. So instead of this:
$conn->run(fixup => sub {
    ...
}, catch {
    ...
});
It now recommends this:
$conn->run(fixup => sub {
    ...
}, catch => sub {
    ...
});
Which frankly is better balanced anyway.
But in discussion with Mark Lawrence in the ticket, it has become clear that there’s a bit of a design problem with this approach. And that problem is that there is no try keyword, only catch. The fixup in the above example does not try, but the inclusion of the catch implicitly makes it behave like try. That also means that if you use the default mode (which can be set via the mode method), then there will usually be no leading keyword, either. So we get something like this:
$conn->run(sub {
    ...
}, catch => sub {
    ...
});
So it starts with a sub {} and no fixup keyword, but there is a catch keyword, which implicitly wraps that first sub {} in a try-like context. And aesthetically, it’s unbalanced.
So I’m trying to decide what to do about these facts:

- The catch implicitly makes the first sub be wrapped in a try-type context, but without a try-like keyword.
- If there is no leading keyword before the first sub but there is a catch, then it looks unbalanced.

At one level, I’m beginning to think that it was a mistake to add the exception-handling code at all. Really, that should be the domain of another module like Try::Tiny or, better, the language. In that case, the example would become:
use Try::Tiny;

try {
    $conn->run(sub {
        ...
    });
} catch {
    ...
};
And maybe that really should be the recommended approach. It seems silly to have replicated most of Try::Tiny inside DBIx::Connector just to cut down on the number of anonymous subs and indentation levels. The latter can be got round with some semi-hinky nesting:
try { $conn->run(sub {
...
}) } catch {
...
}
Kind of ugly, though. The whole reason the catch stuff was added to DBIx::Connector was to make it all nice and integrated (as discussed here). But perhaps it was not a valid tradeoff. I’m not sure.
So I’m trying to decide how to solve these problems. The options as I see them are:
1. Add another keyword to use before the first sub that means “the default mode”. I’m not keen on the word “default”, but it would look something like this:
$conn->run(default => sub {
    ...
}, catch => sub {
    ...
});
This would provide the needed balance, but the catch would still implicitly execute the first sub in a try context. Which isn’t a great idea.
2. Add a try keyword. So then one could do this:
$conn->run(try => sub {
    ...
}, catch => sub {
    ...
});
This makes it explicit that the first sub executes in a try context. I’d also have to add companion try_fixup, try_ping, and try_no_ping keywords. Which are ugly. And furthermore, if there was no try keyword, would a catch be ignored? That’s what would be expected, but it changes the current behavior.
3. Deprecate the try/catch stuff in DBIx::Connector and eventually remove it. This would simplify the code and leave the responsibility for exception handling to other modules where it’s more appropriate. But it would also be at the expense of formatting; it’s just a little less aesthetically pleasing to have the try/catch stuff outside the method calls. But maybe it’s just more appropriate.
I’m leaning toward #3, but perhaps might do #1 anyway, as it’d be nice to be more explicit and one would get the benefit of the balance with catch blocks for as long as they’re retained. But I’m not sure yet. I want your feedback on this. How do you want to use exception-handling with DBIx::Connector? Leave me a comment here or on the ticket.
I’ve just released DBIx::Connector 0.42 to CPAN. This release fixes a serious bug with catch blocks. In short, if you threw an exception from inside a catch block, it would not be detectable from outside. For example, given this code:
eval {
    $conn->run(sub { die 'WTF' }, catch => sub { die 'OMG!' });
};

if (my $err = $@) {
    say "We got an error: $@\n";
}
With DBIx::Connector 0.41 and lower, the if block would never be called, because even though the catch block threw an exception, $@ was not set. In other words, the exception would not be propagated to its caller. This could be terribly annoying, as you can imagine. I was being a bit too clever about localizing $@, with the scope much too broad. 0.42 uses a much tighter scope to localize $@, so now it should propagate properly everywhere.
So if you’re using DBIx::Connector catch blocks, please upgrade ASAP. Sorry for the hassle.
I get email:
Hey David,
I ran in to an issue earlier today in production that, while it is an error in my code, DBIx::Connector could easily handle the issue better. Here’s the use case:
package Con;
use Moose;

sub txn {
    my ($self, $code) = @_;
    my @ret;
    warn "BEGIN EVAL\n";
    eval { @ret = $code->() };
    warn "END EVAL\n";
    die "DIE: $@" if $@;
    return @ret;
}

package main;

my $c = Con->new();
foreach (1..2) {
    $c->txn(sub { next; });
}
The result of this is:
BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
Exiting eval via next at test.pl line 16.
Exiting subroutine via next at test.pl line 16.
BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
Exiting eval via next at test.pl line 16.
Exiting subroutine via next at test.pl line 16.
This means that any code after the eval block is not executed. And, in the case of DBIx::Connector, means the transaction is not committed or rolled back, and the next call to txn() is mysteriously combined with the previous txn() call. A quick fix for this is to just add a curly brace in to the eval:

eval { { @ret = $code->() } };
Then the results are more what we’d expect:
BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
END EVAL
BEGIN EVAL
Exiting subroutine via next at test.pl line 16.
END EVAL
I’ve fixed my code to use return; instead of next;, but I think this would be a useful fix for DBIx::Connector so that it doesn’t act in such an unexpected fashion when the developer accidentally calls next.
The fix here is pretty simple, but I’m not sure I want to get into the business of defending against programmer mistakes like this in DBIx::Connector or any module.
What do you think?
Following a very good suggestion from Pedro Melo, I’ve created a Git repository for this tutorial and put it on GitHub. I replayed each step, making each into its own commit, and tagged the state of the code for each entry.
So as I continue to make modifications, I’ll keep this repository up-to-date, and tag things as of each blog entry. This will make it easy for you to follow along; you can simply clone the repository and git pull for each post.
Following up on my post yesterday introducing Catalyst::View::TD, today I’d like to continue with the next step in chapter 3 of the Catalyst tutorial. The twist here is that I’m going to use PostgreSQL for the database back-end and start introducing some database best practices. I’m also going to make use of my DBIx::Connector module to interact with the database.
Picking up with the database creation section of the tutorial, the first change I’d like to make is to use a natural key for the books table. All books have unique identifiers, known as ISBNs, so creating a surrogate key (the typical id column in ORM-managed applications) is redundant. One of the nice things about PostgreSQL is that it ships with a contributed library, isn, which validates ISBN and other international identifiers. So we use this contrib module (usually included in package-installed PostgreSQL servers) for the primary key for books. If you need to install it from source, it’s pretty easy:
cd postgresql-8.4.1/contrib/isn
make
make install
Ideally I’d use a natural key for the authors table too, but despite some attempts to create universal identifiers for authors, nothing has really caught on as far as I know. So I’ll just stick to a surrogate key for now.
First step: create the database and install isn if it’s not already included in the template database:
createdb -U postgres myapp
psql -U postgres -d myapp -f /usr/local/pgsql/share/contrib/isn.sql
The isn.sql file may be somewhere else on your system. Now let’s create the tables. Create sql/001-books.sql in the MyApp directory and paste this into it:
BEGIN;
CREATE TABLE books (
isbn ISBN13 PRIMARY KEY,
title TEXT NOT NULL DEFAULT '',
rating SMALLINT NOT NULL DEFAULT 0 CHECK (rating BETWEEN 0 AND 5)
);
CREATE TABLE authors (
id BIGSERIAL PRIMARY KEY,
surname TEXT NOT NULL DEFAULT '',
given_name TEXT NOT NULL DEFAULT ''
);
CREATE TABLE book_author (
isbn ISBN13 REFERENCES books(isbn),
author_id BIGINT REFERENCES authors(id),
PRIMARY KEY (isbn, author_id)
);
INSERT INTO books
VALUES ('1587201534', 'CCSP SNRS Exam Certification Guide', 5),
('978-0201633467', 'TCP/IP Illustrated, Volume 1', 5),
('978-0130183804', 'Internetworking with TCP/IP Vol.1', 4),
('978-1-56592-243-3', 'Perl Cookbook', 5),
('978-0735712010', 'Designing with Web Standards', 5)
;
INSERT INTO authors
VALUES (1, 'Greg', 'Bastien'),
(2, 'Sara', 'Nasseh'),
(3, 'Christian', 'Degu'),
(4, 'Richard', 'Stevens'),
(5, 'Douglas', 'Comer'),
(6, 'Tom', 'Christiansen'),
(7, 'Nathan', 'Torkington'),
(8, 'Jeffrey', 'Zeldman')
;
INSERT INTO book_author
VALUES ('1587201534', 1),
('1587201534', 2),
('1587201534', 3),
('978-0201633467', 4),
('978-0130183804', 5),
('978-1-56592-243-3', 6),
('978-1-56592-243-3', 7),
('978-0735712010', 8)
;
COMMIT;
Yeah, I Googled for the ISBNs for those books. I found the ISBN-13 number for most of them, but the isn module handles the old ISBN-10 format, too, automatically upgrading it to ISBN-13. I also added a CHECK constraint for the rating column, to be sure that the value is always BETWEEN 0 AND 5. I also like to include default values where it’s sensible to do so, and that syntax for inserting multiple rows at once is pretty nice to have.
Go ahead and run this against your database:
psql -U postgres -d myapp -f sql/001-books.sql
Now if you connect to the server, you should be able to query things like so:
$ psql -U postgres myapp
psql (8.4.1)
Type "help" for help.
myapp=# select * from books;
isbn | title | rating
-------------------+------------------------------------+--------
978-1-58720-153-0 | CCSP SNRS Exam Certification Guide | 5
978-0-201-63346-7 | TCP/IP Illustrated, Volume 1 | 5
978-0-13-018380-4 | Internetworking with TCP/IP Vol.1 | 4
978-1-56592-243-3 | Perl Cookbook | 5
978-0-7357-1201-0 | Designing with Web Standards | 5
(5 rows)
Great! The database is set. Now we need a way for the app to talk to it. I’ve not yet decided how I’m going to integrate DBIx::Connector into a Catalyst model class; maybe I’ll figure it out as I write these posts. But since my mantra is “the database is the model,” for now I won’t bother with a model at all. Instead, I’ll create a simple accessor in MyApp so we can easily get at the database connection wherever we need it. To do that, add these lines to lib/MyApp.pm:
use Moose;
use DBIx::Connector;
use Exception::Class::DBI;

has conn => (is => 'ro', lazy => 1, default => sub {
    DBIx::Connector->new( 'dbi:Pg:dbname=myapp', 'postgres', '', {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
        pg_enable_utf8 => 1,
    });
});
We load Moose to get the has keyword, the officially sanctioned interface for defining attributes in Catalyst classes. Then I use that keyword to create the conn attribute. This attribute is read-only and has a DBIx::Connector object for its default value. The nice thing about this is that the DBIx::Connector object won’t be instantiated until it’s actually needed, and then it will be kept forever. We never have to do anything else to use it.
Oh, and I like to make sure that text data coming back from PostgreSQL is properly encoded as UTF-8, and I like to use Exception::Class::DBI to turn DBI errors into exception objects.
Now it’s time to update our controller and template to fetch actual data from the database. Edit lib/MyApp/Controller/Books.pm and change the list method to:
sub list : Local {
    my ($self, $c) = @_;
    $c->stash->{books} = $c->conn->run(fixup => sub {
        my $sth = $_->prepare('SELECT isbn, title, rating FROM books');
        $sth->execute;
        $sth;
    });
}
All we’re doing here is creating a statement handle for the query, executing the query, and storing the statement handle in the stash. Now we need to update the template to use the statement handle. Open up lib/MyApp/Templates/HTML/Books.pm and change the list template to:
template list => sub {
    my ($self, $args) = @_;
    table {
        row {
            th { 'Title'  };
            th { 'Rating' };
            th { 'Author' };
        };
        my $sth = $args->{books};
        while (my $book = $sth->fetchrow_hashref) {
            row {
                cell { $book->{title}  };
                cell { $book->{rating} };
                cell { $book->{author} };
            };
        };
    };
};
All we do is fetch each row from the statement handle and output it. The only thing that’s changed is the use of the statement handle as an iterator rather than an array reference.
And now we’re set! Restart your server with script/myapp_server.pl and point your browser at http://localhost:3000/books/list. Now you should see the book titles and ratings, though the authors still aren’t present. We’ll fix that in a later post.
The takeaway from this post: Use PostgreSQL’s support for custom data types to create validated natural keys for your data, and use a stable, persistent database connection to talk directly to the database. No need for an ORM here, as the DBI provides a very Perlish access to a very capable DSL for models called SQL.
More soon.
After much gnashing of teeth, heated arguments with @robkinyon and @mst, lots of deep thesaurus spelunking, and three or four iterations, I finally came up with an improved API for DBIx::Connector that I believe is straightforward and easy to explain.
Following up on my post last week, I explored, oh I dunno, a hundred different terms for the various methods? I’ve never spent so much time on thesaurus.com in my life. Part of what added to the difficulty was that @mst seemed to think that there should actually be three modes for each block method: one that pings, one that doesn’t, and one that tries again if a block dies and the connection is down. So I went from six to nine methods with that assertion.
What I finally came up with was to name the three basic methods run(), txn_run(), and svp_run(), and these would neither ping nor retry in the event of failure. Then I added variations on these methods that would ping and that would try to fix failures. I called these “ping runs” and “fixup runs,” respectively. It was the latter term, “fixup,” that had been so hard for me to nail down, as “retry” seemed to say that the method was a retry, while “fixup” more accurately reflects that the method would try to fix up the connection in the event of a failure.
Once I’d implemented this interface, I now had nine methods:

- run()
- txn_run()
- svp_run()
- ping_run()
- txn_ping_run()
- svp_ping_run()
- fixup_run()
- txn_fixup_run()
- svp_fixup_run()
This worked great. Then I went about documenting it. Jesus Christ what a pain! I realized that all these similarly-named methods would require a lot of explanation. I duly wrote up said explanation, and just wasn’t happy with it. It just felt to me like all the explanation made it too difficult to decide what methods to use and when. Such confusion would make the module less likely to be used – and certainly less likely to be used efficiently.
So I went back to the API drawing board and, reflecting on @robkinyon’s browbeating about decorating methods and @mst’s coming to that conclusion as well, I finally came up with just three methods:
- run()
- txn()
- svp()
For any one of these, you can call it by passing a block, of course:
$conn->txn( sub { $_->do('SELECT some_function()') } );
In addition, you can now have any one of them run in one of three modes: the default (no ping), “ping”, or “fixup”:
$conn->txn( fixup => sub { $_->do('SELECT some_function()') } );
It’s much easier to explain the three methods in terms of how the block is transactionally encapsulated, as that’s the only difference between them. Once that’s understood, it’s pretty easy to explain how to change the “connection mode” of each by passing in a leading string. It even looks pretty nice. I’m really happy with this.
One thing that increased the difficulty in coming up with this API was that @mst felt that by default the methods should neither ping nor try to fix up a failure. I was resistant to this because it’s not how Apache::DBI or connect_cached() work: they always ping. It turns out that DBIx::Class doesn’t cache connections at all. I thought it had. Rather, it creates a connection and simply hangs onto it as a scalar variable. It handles the connection for as long as it’s in scope, but includes no magic global caching. This reduces the action-at-a-distance issues common with caching while maintaining proper fork- and thread-safety.
At this point, I took a baseball bat to my desk.
Figuratively, anyway. I did at least unleash a mountain of curses upon @mst and various family relations. Because it took me a few minutes to see it: It turns out that DBIx::Class is right to do it this way. So I ripped out the global caching from DBIx::Connector, and suddenly it made much more sense not to ping by default – just as you wouldn’t ping if you created a DBI handle yourself.
DBIx::Connector is no longer a caching layer over the DBI. It’s now a proxy for a connection. That’s it. There is no magic, no implicit behavior, so it’s easier to use. And because it ensures fork- and thread-safety, you can instantiate a connector and hold onto it for whenever you need it, unlike using the DBI itself.
And one more thing: I also added a new method, with(). For those who always want to use the same connection mode, you can use this method to create a proxy object that has a different default mode. (Yes, a proxy for a proxy for a database handle. Whatever!) Use it like this:
$conn->with('fixup')->run( sub { ... } );
And if you always want to use the same mode, hold onto the proxy instead of the connection object:
my $proxy = DBIx::Connector->new(@args)->with('fixup');
# later ...
$proxy->txn( sub { ... } ); # always in fixup mode
So fixup mode is no longer the default, as Tim requested, but it can optionally be made the default, as DBIx::Class requires. The with() method will also be the place to add other global behavioral modifications, such as DBIx::Class’s auto_savepoint feature.
So for those of you who were interested in the first iteration of this module, my apologies for changing things so dramatically in this release (ripping out the global caching, deprecating methods, adding a new block method API, etc.). But I think that, for all the pain I went through to come up with the new API – all the arguing on IRC, all the thesaurus spelunking – this is a really good API, easy to explain and understand, and easy to use. And I don’t expect to change it again. I might improve exceptions (use objects instead of strings?) or add block method exception handling (perhaps adding a catch keyword?), but the basics are finally nailed down and here to stay.
Thanks to @mst, @robkinyon, and @ribasushi, in particular, for bearing with me and continuing to hammer on me when I was being dense.
Thanks to feedback from Tim Bunce and Peter Rabbitson in a DBIx::Class bug report, I’ve been reworking DBIx::Connector’s block-handling methods. Tim’s objection is that the feature of do() and txn_do() that executes the code reference a second time in the event of a connection failure can be dangerous. That is, it can lead to action-at-a-distance bugs that are hard to find and fix. Tim suggested renaming the methods do_with_retry() and txn_do_with_retry() in order to make explicit what’s going on, and to have non-retry versions of the methods.
I’ve made this change in the repository. But I wasn’t happy with the method names; even though they’re unambiguous, they are also overly long and not very friendly. I want people to use the retrying methods, but felt that the long names make the non-retrying methods preferable to users. While I was at it, I also wanted to get rid of do(), since it quickly became clear that it could cause some confusion with the DBI’s do() method.
I’ve been thesaurus spelunking for the last few days, and have come up with a few options, but would love to hear other suggestions. I like using run instead of do to avoid confusion with the DBI, but otherwise I’m not really happy with what I’ve come up with. There are basically five different methods (using Tim’s suggestions for the moment):

- run( sub {} )
- txn_run( sub {} )
- run_with_retry( sub {} )
- txn_run_with_retry( sub {} ): like run_with_retry(), but run the block inside a transaction.
- svp_run( sub {} )
Here are some of the names I’ve come up with so far:
| Run block | Run in txn | Run in savepoint | Run with retry | Run in txn with retry | Retry Mnemonic |
|-----------|------------|------------------|----------------|-----------------------|----------------|
| run | txn_run | svp_run | runup | txn_runup | Run assuming the db is up, retry if not. |
| run | txn_run | svp_run | run_up | txn_run_up | Same as above. |
| run | txn_run | svp_run | rerun | txn_rerun | Run assuming the db is up, rerun if not. |
| run | txn_run | svp_run | run::retry | txn_run::retry | :: means “with” |
That last one is a cute hack suggested by Rob Kinyon on IRC. As you can see, I’m pretty consistent with the non-retrying method names; it’s the methods that retry that I’m not satisfied with. An approach I’ve avoided is to use an adverb for the non-retry methods, mainly because there is no retry possible for the savepoint methods, so it seemed silly to have svp_run_safely() to complement do_safely() and txn_do_safely().
Brilliant suggestions warmly appreciated.
As part of my ongoing effort to wrestle Catalyst into working the way that I think it should work, I’ve just uploaded DBIx::Connector to the CPAN. See, I was using Catalyst::Model::DBI, but it turned out that I wanted to use the database handle in places other than the Catalyst parts of my app. I was bitching about this to mst on #catalyst, and he said that Catalyst::Model::DBI was actually a fork of DBIx::Class’s handle caching, and quite out of date. I said, “But this already exists. It’s called connect_cached().” I believe his response was, “OH FUCK OFF!”
So I started digging into what Catalyst::Model::DBI and DBIx::Class do to cache their database handles, and how it differs from connect_cached(). It turns out that they were pretty smart, in terms of checking to see if the process had forked or a new thread had been spawned, and if so, deactivating the old handle and then returning a new one. Otherwise, things are just cached. This approach works well in Web environments, including under mod_perl; in forking applications, like POE apps; and in plain Perl scripts. Matt said he’d always wanted to pull that functionality out of DBIx::Class and then make DBIx::Class depend on the external implementation. That way everyone could take advantage of the functionality, including people like me who don’t want to use an ORM.
So I did it. Maybe it was crazy (mmmmm…yak meat), but I can now use the same database interface in the Catalyst and POE parts of my application without worry:
my $dbh = DBIx::Connector->connect(
    'dbi:Pg:dbname=circle', 'postgres', '', {
        PrintError     => 0,
        RaiseError     => 0,
        AutoCommit     => 1,
        HandleError    => Exception::Class::DBI->handler,
        pg_enable_utf8 => 1,
    },
);

$dbh->do($sql);
But it’s not just database handle caching that I’ve included in DBIx::Connector; no, I’ve also stolen some of the transaction management stuff from DBIx::Class. All you have to do is grab the connector object which encapsulates the database handle, and take advantage of its txn_do() method:
my $conn = DBIx::Connector->new(@args);
$conn->txn_do(sub {
    my $dbh = shift;
    $dbh->do($_) for @queries;
});
The transaction is scoped to the code reference passed to txn_do(). Not only that, it avoids the overhead of calling ping() on the database handle unless something goes wrong. Most of the time, nothing goes wrong, the database is there, so you can proceed accordingly. If it is gone, however, txn_do() will re-connect and execute the code reference again. The cool thing is that you will never notice that the connection was dropped – unless it’s still gone after the second execution of the code reference.
And finally, thanks to some pushback from mst, ribasushi, and others, I added savepoint support. It’s a little different than that provided by DBIx::Class; instead of relying on a magical auto_savepoint attribute that subtly changes the behavior of txn_do(), you just use the svp_do() method from within txn_do(). The scoping of subtransactions is thus nicely explicit:
$conn->txn_do(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (1)');
    eval {
        $conn->svp_do(sub {
            shift->do('INSERT INTO table1 VALUES (2)');
            die 'OMGWTF?';
        });
    };
    warn "Savepoint failed\n" if $@;
    $dbh->do('INSERT INTO table1 VALUES (3)');
});
This transaction will insert the values 1 and 3, but not 2. If you call svp_do() outside of txn_do(), it will call txn_do() for you, with the savepoint scoped to the entire transaction:
$conn->svp_do(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (4)');
    $conn->svp_do(sub {
        shift->do('INSERT INTO table1 VALUES (5)');
    });
});
This transaction will insert both 4 and 5. And note that you can nest savepoints as deeply as you like. All this is dependent on whether the database supports savepoints; so far, PostgreSQL, MySQL (InnoDB), Oracle, MSSQL, and SQLite do. If you know of others, fork the repository, commit changes to a branch, and send me a pull request!
Overall I’m very happy with this module, and I’ll probably use it in all my Perl database projects from here on in. Perhaps later I’ll build a model class on it (something like Catalyst::Model::DBI, only better!), but next up, I plan to finish documenting Template::Declare and writing some views with it. More on that soon.