Just a Theory

By David E. Wheeler

Posts about DBI

Use of DBI in Sqitch

Sqitch uses the native database client applications (psql, sqlite3, mysql, etc.). So for tracking metadata about the state of deployments, I have been trying to stick to using them. I’m first targeting PostgreSQL, and as a result need to open a connection to psql, start a transaction, and be able to read and write stuff to it as migrations go along. The IPC is a huge PITA. Furthermore, getting things properly quoted is also pretty annoying — and it will be worse for SQLite and MySQL, I expect (psql’s --set support is pretty slick).

If, on the other hand, I used the DBI, on the other hand, all this would be very easy. There is no IPC, just a direct connection to the database. It would save me a ton of time doing development, and be robust and safer to use (e.g., exception handling rather than platform-dependent signal handling (or not, in the case of Windows)). I am quite tempted to just so that.

However, I have been trying to be sensitive to dependencies. I had planned to make Sqitch simple to install on any system, and if you had the command-line client for your preferred database, it would just work. If I used the DBI instead, then Sqitch would not work at all unless you installed the appropriate DBI driver for your database of choice. This is no big deal for Perl people, of course, but I don’t want this to be a Perl people tool. I want it to be dead simple for anyone to use for any database. Ideally, there will be RPMs and Ubuntu packages, so one can just install it and go, and not have to worry about figuring out what additional Perl DBD to install for your database of choice. It should be transparent.

That is still my goal, but at this point the IPC requirements for controlling the clients is driving me a little crazy. Should I just give up and use the DBI (at least for now)? Or persevere with the IPC stuff and get it to work? Opinions wanted!

Looking for the comments? Try the old layout.

More about…

DBIx::Connector and Serializable Snapshot Isolation

I was at Postgres Open week before last. This was a great conference, very welcoming atmosphere and lots of great talks. One of the more significant, for me, was the session on serializable transactions by Kevin Grittner, who developed SSI for PostgreSQL 9.1. I hadn’t paid much attention to this feature before now, but it became clear to me, during the talk, that it’s time.

So what is SSI? Well, serializable transactions are almost certainly how you think of transactions already. Here’s how Kevin describes them:

True serializable transactions can simplify software development. Because any transaction which will do the right thing if it is the only transaction running will also do the right thing in any mix of serializable transactions, the programmer need not understand and guard against all possible conflicts. If this feature is used consistently, there is no need to ever take an explicit lock or SELECT FOR UPDATE/SHARE.

This is, in fact, generally how I’ve thought about transactions. But I’ve certainly run into cases where it wasn’t true. Back in 2006, I wrote an article on managing many-to-many relationships with PL/pgSQL which demonstrated a race condition one might commonly find when using an ORM. The solution I offered was to always use a PL/pgSQL function that does the work, and that function executes a SELECT...FOR UPDATE statement to overcome the race condition. This creates a lock that forces conflicting transactions to be performed serially.

Naturally, this is something one would rather not have to think about. Hence SSI. When you identify a transaction as serializable, it will be executed in a truly serializable fashion. So I could actually do away with the SELECT...FOR UPDATE workaround — not to mention any other race conditions I might have missed — simply by telling PostgreSQL to enforce transaction isolation. This essentially eliminates the possibility of unexpected side-effects.

This comes at a cost, however. Not in terms of performance so much, since the SSI implementation uses some fancy, recently-developed algorithms to keep things efficient. (Kevin tells me via IRC: “Usually the rollback and retry work is the bulk of the additional cost in an SSI load, in my testing so far. A synthetic load to really stress the LW locking, with a fully-cached database doing short read-only transactions will have no serialization failures, but can run up some CPU time in LW lock contention.”) No, the cost is actually in increased chance of transaction rollback. Because SSI will catch more transaction conflicts than the traditional “read committed” isolation level, frameworks that expect to work with SSI need to be prepared to handle more transaction failures. From the fine manual:

The Serializable isolation level provides the strictest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures.

And that brings me to DBIx::Connector, my Perl module for safe connection and transaction management. It currently has no such retry smarts built into it. The feature closest to that is the “fixup” connection mode, wherein if a execution of a code block fails due to a connection failure, DBIx::Connector will re-connect to the database and execute the code reference again.

I think I should extend DBIx::Connector to take isolation failures and deadlocks into account. That is, fixup mode would retry a code block not only on connection failure but also on serialization failure (SQLSTATE 40001) and deadlocks (SQLSTATE 40P01). I would also add a new attribute, retries, to specify the number of times to retry such execution, with a default of three (which likely will cover the vast majority of cases). This has actually been an oft-requested feature, and I’m glad to have a new reason to add it.

There are a few design issues to overcome, however:

  • Fixup mode is supported not just by txn(), which scopes the execution of a code reference to a single transaction, but also run(), which does no transaction handling. Should the new retry support be added there, too? I could see it either way (a single SQL statement executed in run() is implicitly transaction-scoped).
  • Fixup mode is also supported by svp(), which scopes the execution of a code reference to a savepoint (a.k.a. a subtransaction). Should the rollback and retry be supported there, too, or would the whole transaction have to be retried? I’m thinking the latter, since that’s currently the behavior for connection failures.
  • Given these issues, will it make more sense to perhaps create a new mode? Maybe it would be supported only by txn().

This is do-able, will likely just take some experimentation to figure it out and settle on the appropriate API. I’ll need to find the tuits for that soon.

In the meantime, given currently in-progress changes, I’ve just released a new version of DBIx::Connector with a single change: All uses of the deprecated catch syntax now throw warnings. The previous version threw warnings only the first time the syntax was used in a particular context, to keep error logs from getting clogged up. Hopefully most folks have changed their code in the two months since the previous release and switched to Try::Tiny or some other model for exception handling. The catch syntax will be completely removed in the next release of DBIx::Connector, likely around the end of the year. Hopefully the new SSI-aware retry functionality will have been integrated by then, too.

In a future post I’ll likely chew over whether or not to add an API to set the transaction isolation level within a call to txn() and friends.

Looking for the comments? Try the old layout.

DBIx::Connector Exception Handling Design

In response to a bug report, I removed the documentation suggesting that one use the catch function exported by Try::Tiny to specify an exception-handling function to the DBIx::Connector execution methods. When I wrote those docs, Try::Tiny’s catch method just returned the subroutine. It was later changed to return an object, and that didn’t work very well. It seemed a much better idea not to depend on an external function that could change its behavior when there is no direct dependency on Try::Tiny in DBIx::Connector. I removed that documentation in 0.43. So instead of this:

$conn->run(fixup => sub {
    ...
}, catch {
    ...
});

It now recommends this:

$conn->run(fixup => sub {
    ...
}, catch => sub {
    ...
});

Which frankly is better balanced anyway.

But in discussion with Mark Lawrence in the ticket, it has become clear that there’s a bit of a design problem with this approach. And that problem is that there is no try keyword, only catch. The fixup in the above example does not try, but the inclusion of the catch implicitly makes it behave like try. That also means that if you use the default mode (which can be set via the mode method), then there will usually be no leading keyword, either. So we get something like this:

$conn->run(sub {
    ...
}, catch => sub {
    ...
});

So it starts with a sub {} and no fixup keyword, but there is a catch keyword, which implicitly wraps that first sub {} in a try-like context. And aesthetically, it’s unbalanced.

So I’m trying to decide what to do about these facts:

  • The catch implicitly makes the first sub be wrapped in a try-type context but without a try-like keyword.
  • If one specifies no mode for the first sub but has a catch, then it looks unbalanced.

At one level, I’m beginning to think that it was a mistake to add the exception-handling code at all. Really, that should be the domain of another module like Try::Tiny or, better, the language. In that case, the example would become:

use Try::Tiny;
try {
    $conn->run(sub {
        ...
    });
} catch {
    ....
}

And maybe that really should be the recommended approach. It seems silly to have replicated most of Try::Tiny inside DBIx::Connector just to cut down on the number of anonymous subs and indentation levels. The latter can be got round with some semi-hinky nesting:

try { $conn->run(sub {
    ...
}) } catch {
    ...
}

Kind of ugly, though. The whole reason the catch stuff was added to DBIx::Connector was to make it all nice and integrated (as discussed here). But perhaps it was not a valid tradeoff. I’m not sure.

So I’m trying to decide how to solve these problems. The options as I see them are:

  1. Add another keyword to use before the first sub that means “the default mode”. I’m not keen on the word “default”, but it would look something like this:

    $conn->run(default => sub {
        ...
    }, catch => sub {
        ...
    });

    This would provide the needed balance, but the catch would still implicitly execute the first sub in a try context. Which isn’t a great idea.

  2. Add a try keyword. So then one could do this:

    $conn->run(try => sub {
        ...
    }, catch => sub {
        ...
    });

    This makes it explicit that the first sub executes in a try context. I’d also have to add companion try_fixup, try_ping, and try_no_ping keywords. Which are ugly. And furthermore, if there was no try keyword, would a catch be ignored? That’s what would be expected, but it changes the current behavior.

  3. Deprecate the try/catch stuff in DBIx::Connector and eventually remove it. This would simplify the code and leave the responsibility for exception handling to other modules where it’s more appropriate. But it would also be at the expense of formatting; it’s just a little less aesthetically pleasing to have the try/catch stuff outside the method calls. But maybe it’s just more appropriate.

I’m leaning toward #3, but perhaps might do #1 anyway, as it’d be nice to be more explicit and one would get the benefit of the balance with catch blocks for as long as they’re retained. But I’m not sure yet. I want your feedback on this. How do you want to use exception-handling with DBIx::Connector? Leave me a comment here or on the ticket.

Looking for the comments? Try the old layout.

Serious Exception-Handling Bug Fixed in DBIx::Connector 0.42

I’ve just released DBIx::Connector 0.42 to CPAN. This release fixes a serious bug with catch blocks. In short, if you threw an exception from inside a catch block, it would not be detectable from outside. For example, given this code:

eval {
    $conn->run(sub { die 'WTF' }, catch => sub { die 'OMG!' });
};
if (my $err = $@) {
    say "We got an error: $@\n";
}

With DBIx::Connector 0.41 and lower, the if block would never be called, because even though the catch block threw an exception, $@ was not set. In other words, the exception would not be propagated to its caller. This could be terribly annoying, as you can imagine. I was being a bit too clever about localizing $@, with the scope much too broad. 0.42 uses a much tighter scope to localize $@, so now it should propagate properly everywhere.

So if you’re using DBIx::Connector catch blocks, please upgrade ASAP. Sorry for the hassle.

Looking for the comments? Try the old layout.

Execute SQL Code on Connect

I’ve been writing a fair bit of PL/Perl for a client, and one of the things I’ve been doing is eliminating a ton of duplicate code by creating utility functions in the %_SHARED hash. This is great, as long as the code that creates those functions gets executed at the beginning of every database connection. So I put the utility generation code into a single function, called prepare_perl_utils(). It looks something like this:

CREATE OR REPLACE FUNCTION prepare_perl_utils(
) RETURNS bool LANGUAGE plperl IMMUTABLE AS $$
    # Don't bother if we've already loaded.
    return 1 if $_SHARED{escape_literal};

    $_SHARED{escape_literal} = sub {
        $_[0] =~ s/'/''/g; $_[0] =~ s/\\/\\\\/g; $_[0];
    };

    # Create other code refs in %_SHARED…
$$;

So now all I have to do is make sure that all the client’s apps execute this function as soon as they connect, so that the utilities will all be loaded up and ready to go. Here’s how I did it.

First, for the Perl app, I just took advantage of the DBI’s callbacks to execute the SQL I need when the DBI connects to the database. That link might not work just yet, as the DBI’s callbacks have only just been documented and that documentation appears only in dev releases so far. Once 1.611 drops, the link should work. At any rate, the use of callbacks I’m exploiting here has been in the DBI since 1.49, which was released in November 2005.

The approach is the same as I’ve described before: Just specify the Callbacks parameter to DBI->connect, like so:

my $dbh = DBI->connect_cached($dsn, $user, $pass, {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    Callbacks      => {
        connected => sub { shift->do('SELECT prepare_perl_utils()' },
    },
});

That’s it. The connected method is a no-op in the DBI that gets called to alert subclasses that they can do any post-connection initialization. Even without a subclass, we can take advantage of it to do our own initialization.

It was a bit trickier to make the same thing happen for the client’s Rails app. Rails, alas, provides no on-connection callbacks. So we instead have to monkey-patch Rails to do what we want. With some help from “dfr|mac” on #rubyonrails (I haven’t touched Rails in 3 years!), I got it worked down to this:

class ActiveRecord::ConnectionAdapters::PostgreSQLAdapter
    def initialize_with_perl_utils(*args)
    returning(initialize_without_perl_utils(*args)) do
        execute('SELECT prepare_perl_utils()')
    end
    end
    alias_method_chain :initialize, :perl_utils
end

Basically, we overpower the PostgreSQL adapter’s initialize method and have it call initialize_with_perl_utils before it returns. It’s a neat trick; if you’re going to practice fuck typing, alias_method_chain makes it about as clean as can be, albeit a little too magical for my tastes.

Anyway, recorded here for posterity (my blog is my other brain!).

Looking for the comments? Try the old layout.

My Catalyst Tutorial: Add Authors to the View

Another post in my ongoing series of posts on using Catalyst with Template::Declare and DBIx::Connector. This will be the last post covering material from chapter 3, I promise. This is a fun one, though, because we continue to use this really nice DSL called “SQL,” which I think is more expressive than an ORM would be.

To whit, the next task is to add the missing list of authors to the book list. The thing is, the more I work with databases, the more I’m inclined to think about them not only as the “M” in “MVC”, but also the “V”. I’ll show you what I mean.

A Quick Fix

But first, a diversion. In the second post in this series, I created an SQL statement to insert book authors, but I made a mistake: the values for surnames and given names were reversed. Oops. Furthermore, I included explicit author IDs, even though the id column uses a sequence for it’s default value. So first we need to fix these issues. Change the INSERT INTO authors statement in sql/001-books.sql to:

INSERT INTO authors (surname, given_name)
VALUES ('Bastien',      'Greg'),
       ('Nasseh',       'Sara'),
       ('Degu',         'Christian'),
       ('Stevens',      'Richard'),
       ('Comer',        'Douglas'),
       ('Christiansen', 'Tom'),
       ('Torkington',   'Nathan'),
       ('Zeldman',      'Jeffrey')
;

This time, we’re letting the sequence populate the id column. Fortunately, it starts from 1 just like we did, so we don’t need to update the values in the INSERT INTO book_author statement. Now let’s fix the database:

DELETE FROM book_author;
DELETE FROM authors;

Then run the above SQL query to restore the authors with their proper names, and then run the INSERT INTO book_author statement. That will get us back in business.

Constructing our Query

Now it’s time for the fun. The original SQL query we wrote to get the list of books was:

SELECT isbn, title, rating FROM books;

Nothing unusual there. But to get at the authors, we need to join to book_author and from there to authors. Our first cut looks like this:

SELECT b.isbn, b.title, b.rating, a.surname
  FROM books       b
  JOIN book_author ba ON b.isbn       = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id;

Which yields this data:

       isbn        |               title                | rating |   surname    
-------------------+------------------------------------+--------+--------------
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Bastien
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Nasseh
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Degu
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5 | Stevens
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4 | Comer
 978-1-56592-243-3 | Perl Cookbook                      |      5 | Christiansen
 978-1-56592-243-3 | Perl Cookbook                      |      5 | Torkington
 978-0-7357-1201-0 | Designing with Web Standards       |      5 | Zeldman

Good start, but note how we now have three rows for “CCSP SNRS Exam Certification Guideâ€? and two for “Perl Cookbookâ€?. We could of course modify our Perl code to look at the ISBN in each row and combine as appropriate, but it’s better to get the database to do that work, since it’s designed for that sort of thing. So let’s use an aggregate function to combine the values over multiple rows into a single row. All we have to do is use the column that changes (surname) in an aggregate function and tell PostgreSQL to use the other columns to group rows into one. PostgreSQL 8.4 introduces a really nice aggregate function, array_agg(), for pulling a series of strings together into an array. Let’s put it to use:

SELECT b.isbn, b.title, b.rating, array_agg(a.surname) as authors
  FROM books       b
  JOIN book_author ba ON b.isbn     = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;

Now the output is:

       isbn        |               title                | rating |          authors         
-------------------+------------------------------------+--------+--------------------------
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5 | {Stevens}
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4 | {Comer}
 978-1-56592-243-3 | Perl Cookbook                      |      5 | {Christiansen,Torkington}
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | {Bastien,Nasseh,Degu}
 978-0-7357-1201-0 | Designing with Web Standards       |      5 | {Zeldman}

Much better. We now have a single row for each book, and the authors are all grouped into a single column. Cool. But we can go one step further. Although we could use Perl to turn the array of author surnames into a comma-delimited string, there’s a PostgreSQL function for that, too: array_to_string(). Check it out:

SELECT b.isbn, b.title, b.rating,
       array_to_string(array_agg(a.surname), ', ') as authors
  FROM books       b
  JOIN book_author ba ON b.isbn     = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;

Now the rows will be:

       isbn        |               title                | rating |          authors          
-------------------+------------------------------------+--------+--------------------------
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5 | Stevens
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4 | Comer
 978-1-56592-243-3 | Perl Cookbook                      |      5 | Christiansen, Torkington
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Bastien, Nasseh, Degu
 978-0-7357-1201-0 | Designing with Web Standards       |      5 | Zeldman

Create a Database View

Cool! All the formatting work is done! But since it’s likely what we’ll often need to fetch book titles along with their authors, let’s create an SQL view for this query. That way, we don’t have to write the same SQL in different places in the application: we can just use the view. So create a new file, sql/002-books_with_authors.sql, and add this SQL:

CREATE VIEW books_with_authors AS
SELECT b.isbn, b.title, b.rating,
       array_to_string(array_agg(a.surname), ', ') as authors
  FROM books       b
  JOIN book_author ba ON b.isbn     = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;

Now install this view in the database:

psql -U postgres -d myapp -f sql/002-books_with_authors.sql

And now we can make use of the view any time we want and get the results of the full query. It’s time to do that in our controller. Edit lib/MyApp/Controller/Books.pm and change this line in the list action:

my $sth = $_->prepare('SELECT isbn, title, rating FROM books');

To:

my $sth = $_->prepare(q{
    SELECT isbn, title, rating, authors FROM books_with_authors
});

The use of the q{} operator is a style I use for SQL queries in Perl code; you can use whatever style you like. Since this is a very short SQL statement (thanks to the view), it’s not really necessary to have it on multiple lines, but I like to be fairly consistent about this sort of thing.

The last thing we need to do is a a very simple change to the list template in lib/MyApp/Templates/HTML/Books.pm. In previous posts, I was referring to the non-existent “author” key in the each hash reference fetched from the database. In the new view, however, I’ve named that column “authors”. So change this line:

cell { $book->{author} };

To

cell { $book->{authors} };

And that’s it. Restart the server and reload http://localhost:3000/books/list and you should now see all of the books listed with their authors.

Notes

I think you can appreciate why, to a certain degree, I’m starting to think of the database as handling both the “M” and the “V” in “MVC”. It’s no mistake that the database object we created is known as a “view”. It was written in such a way that it not only expressed the relationship between books and authors in a compact but clear way, but it formatted the appropriate data for publishing on the site—all in a single, efficient query. All the Template::Declare view does is wrap it all up in the appropriate HTML.

PostgreSQL isn’t the only database to support feature such as this, by the way. All of the databases I’ve used support views, and many offer useful aggregate functions, as well. Among the MySQL aggregates, for example, is group_concat(), which sort of combines the array_to_string(array_agg()) PostgreSQL syntax into a single function. And I’ve personally written a custom aggregate for SQLite in Perl. So although I use PostgreSQL for these examples and make use of its functionality, you can do much the same thing in most other databases.

Either way, I find this to be a lot less work than using an ORM or other abstraction layer between my app and the database. Frankly, SQL provides just the right level of abstraction.

Looking for the comments? Try the old layout.

DBIx::Connector Updated

After much gnashing of teeth, heated arguments with @robkinon and @mst, lots of deep thesaurus spelunking, and three or four iterations, I finally came up with an an improved API for DBIx::Connector that I believe is straight-forward and easy to explain.

Following up on my post last week, I explored, oh I dunno, a hundred different terms for the various methods? I’ve never spent so much time on thesaurus.com in my life. Part of what added to the difficulty was that @mst seemed to think that there should actually be three modes for each block method: one that pings, one that doesn’t, and one that tries again if a block dies and the connection is down. So I went from six to nine methods with that assertion.

What I finally came up with was to name the three basic methods run(), txn_run(), and svp_run(), and these would neither ping nor retry in the event of failure. Then I added variations on these methods that would ping and that would try to fix failures. I called these “ping runs” and “fixup runs,” respectively. It was the latter term, “fixup,” that had been so hard for me to nail down, as “retry” seemed to say that the method was a retry, while “fixup” more accurately reflects that the method would try to fix up the connection in the event of a failure.

Once I’d implemented this interface, I now had nine methods:

  • run()
  • txn_run()
  • svp_run()
  • ping_run()
  • txn_ping_run()
  • svp_ping_run()
  • fixup_run()
  • txn_fixup_run()
  • svp_fixup_run()

This worked great. Then I went about documenting it. Jesus Christ what a pain! I realized that all these similarly-named methods would require a lot of explanation. I duly wrote up said explanation, and just wasn’t happy with it. It just felt to me like all the explanation made it too difficult to decide what methods to use and when. Such confusion would make the module less likely to be used – and certainly less likely to be used efficiently.

So I went back to the API drawing board and, reflecting on @robkinyon’s browbeating about decorating methods and @mst’s coming to that conclusion as well, I finally came up with just three methods:

  • run()
  • txn()
  • svp()

For any one of these, you can call it by passing a block, of course:

$conn->txn( sub { $_->do('SELECT some_function()') } );

In addition, you can now have any one of them run in one of three modes: the default (no ping), “ping”, or “fixup”:

$conn->txn( fixup => sub { $_->do('SELECT some_function()') } );

It’s much easier to explain the three methods in terms of how the block is transactionally encapsulated, as that’s the only difference between them. Once that’s understood, it’s pretty easy to explain how to change the “connection mode” of each by passing in a leading string. It even looks pretty nice. I’m really happy with this

One thing that increased the difficulty in coming up with this API was that @mst felt that by default the methods should neither ping nor try to fix up a failure. I was resistant to this because it’s not how Apache::DBI or connect_cached() work: they always ping. It turns out that DBIx::Class doesn’t cache connections at all. I thought it had. Rather, it creates a connection and simply hangs onto it as a scalar variable. It handles the connection for as long as it’s in scope, but includes no magic global caching. This reduces the action-at-a-distance issues common with caching while maintaining proper fork- and thread-safety.

At this point, I took a baseball bat to my desk.

Figuratively, anyway. I did at least unleash a mountain of curses upon @mst and various family relations. Because it took me a few minutes to see it: It turns out that DBIx::Class is right to do it this way. So I ripped out the global caching from DBIx::Connector, and suddenly it made much more sense not to ping by default – just as you wouldn’t ping if you created a DBI handle yourself.

DBIx::Connector is no longer a caching layer over the DBI. It’s now a proxy for a connection. That’s it. There is no magic, no implicit behavior, so it’s easier to use. And because it ensures fork- and thread-safety, you can instantiate a connector and hold onto it for whenever you need it, unlike using the DBI itself.

And one more thing: I also added a new method, with(). For those who always want to use the same connection mode, you can use this method to create a proxy object that has a different default mode. (Yes, a proxy for a proxy for a database handle. Whatever!) Use it like this:

$conn->with('fixup')->run( sub { ... } );

And if you always want to use the same mode, hold onto the proxy instead of the connection object:

my $proxy = DBIx::Connector->(@args)->with('fixup');

# later ...
$proxy->txn( sub { ... } ); # always in fixup mode

So while fixup mode is no longer the default, as Tim requested, but it can optionally be made the default, as DBIx::Class requires. The with() method will also be the place to add other global behavioral modifications, such as DBIx::Class’s auto_savepoint feature.

So for those of you who were interested in the first iteration of this module, my apologies for changing things so dramatically in this release (ripping out the global caching, deprecating methods, adding a new block method API, etc.). But I think that, for all the pain I went through to come up with the new API – all the arguing on IRC, all the thesaurus spelunking – that this is a really good API, easy to explain and understand, and easy to use. And I don’t expect to change it again. I might improve exceptions (use objects instead of strings?) add block method exception handling (perhaps adding a catch keyword?), but the basics are finally nailed down and here to stay.

Thanks to @mst, @robkinyon, and @ribasushi, in particular, for bearing with me and continuing to hammer on me when I was being dense.

Looking for the comments? Try the old layout.

Suggest Method Names for DBIx::Connector

Thanks to feedback from Tim Bunce and Peter Rabbitson in a DBIx::Class bug report, I’ve been reworking DBIx::Connector’s block-handling methods. Tim’s objection is that the the feature of do() and txn_do() that executes the code reference a second time in the event of a connection failure can be dangerous. That is, it can lead to action-at-a-distance bugs that are hard to find and fix. Tim suggested renaming the methods do_with_retry() and txn_do_with_retry() in order to make explicit what’s going on, and to have non-retry versions of the methods.

I’ve made this change in the repository. But I wasn’t happy with the method names; even though they’re unambiguous, they are also overly long and not very friendly. I want people to use the retrying methods, but felt that the long names make the non-retrying preferable to users. While I was at it, I also wanted to get rid of do(), since it quickly became clear that it could cause some confusion with the DBI’s do() method.

I’ve been thesaurus spelunking for the last few days, and have come up with a few options, but would love to hear other suggestions. I like using run instead of do to avoid confusion with the DBI, but otherwise I’m not really happy with what I’ve come up with. There are basically five different methods (using Tim’s suggestions for the moment):

run( sub {} )
Just run a block of code.
txn_run( sub {} )
Run a block of code inside a transaction.
run_with_retry( sub {} )
Run a block of code without pinging the database, and re-run the code if it throws an exception and the database turned out to be disconnected.
txn_run_with_rerun( sub {} )
Like run_with_retry(), but run the block inside a transaction.
svp_run( sub {} )
Run a block of code inside a savepoint (no retry for savepoints).

Here are some of the names I’ve come up with so far:

Run block Run in txn Run in savepoint Run with retry Run in txn with retry Retry Mnemonic
run txn_run svp_run runup txn_runup Run assuming the db is up, retry if not.
run txn_run svp_run run_up txn_run_up Same as above.
run txn_run svp_run rerun txn_rerun Run assuming the db is up, rerun if not.
run txn_run svp_run run::retry txn_run::retry :: means “with”

That last one is a cute hack suggested by Rob Kinyon on IRC. As you can see, I’m pretty consistent with the non-retrying method names; it’s the methods that retry that I’m not satisfied with. A approach I’ve avoided is to use an adverb for the non-retry methods, mainly because there is no retry possible for the savepoint methods, so it seemed silly to have svp_run_safely() to complement do_safely() and txn_do_safely().

Brilliant suggestions warmly appreciated.

Looking for the comments? Try the old layout.

Database Handle and Transaction Management with DBIx::Connector

As part of my ongoing effort to wrestle Catalyst into working the way that I think it should work, I’ve just uploaded DBIx::Connector to the CPAN. See, I was using Catalyst::Model::DBI, but it turned out that I wanted to use the database handle in places other than the Catalyst parts of my app. I was bitching about this to mst on #catalyst, and he said that Catalyst::Model::DBI was actually a fork of DBIx::Class’s handle caching, and quite out of date. I said, “But this already exists. It’s called connect_cached().” I believe his response was, “OH FUCK OFF!”

So I started digging into what Catalyst::Model::DBI and DBIx::Class do to cache their database handles, and how it differs from connect_cached(). It turns out that they were pretty smart, in terms of checking to see if the process had forked or a new thread had been spawned, and if so, deactivating the old handle and then returning a new one. Otherwise, things are just cached. This approach works well in Web environments, including under mod_perl; in forking applications, like POE apps; and in plain Perl scripts. Matt said he’d always wanted to pull that functionality out of DBIx::Class and then make DBIx::Class depend on the external implementation. That way everyone could take advantage of the functionality, including people like me who don’t want to use an ORM.

So I did it. Maybe it was crazy (mmmmm…yak meat), but I can now use the same database interface in the Catalyst and POE parts of my application without worry:

my $dbh = DBIx::Connector->connect(
    'dbi:Pg:dbname=circle', 'postgres', '', {
        PrintError     => 0,
        RaiseError     => 0,
        AutoCommit     => 1,
        HandleError    => Exception::Class::DBI->handler,
        pg_enable_utf8 => 1,
    },
);

$dbh->do($sql);

But it’s not just database handle caching that I’ve included in DBIx::Connector; no, I’ve also stolen some of the transaction management stuff from DBIx::Class. All you have to do is grab the connector object which encapsulates the database handle, and take advantage of its txn_do() method:

my $conn = DBIx::Connector->new(@args);
$conn->txn_do(sub {
    my $dbh = shift;
    $dbh->do($_) for @queries;
});

The transaction is scoped to the code reference passed to txn_do(). Not only that, it avoids the overhead of calling ping() on the database handle unless something goes wrong. Most of the time, nothing goes wrong, the database is there, so you can proceed accordingly. If it is gone, however, txn_do() will re-connect and execute the code reference again. The cool think is that you will never notice that the connection was dropped – unless it’s still gone after the second execution of the code reference.

And finally, thanks to some pushback from mst, ribasushi, and others, I added savepoint support. It’s a little different than that provided by DBIx::Class; instead of relying on a magical auto_savepoint attribute that subtly changes the behavior of txn_do(), you just use the svp_do() method from within txn_do(). The scoping of subtransactions is thus nicely explicit:

$conn->txn_do(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (1)');
    eval {
        $conn->svp_do(sub {
            shift->do('INSERT INTO table1 VALUES (2)');
            die 'OMGWTF?';
        });
    };
    warn "Savepoint failed\n" if $@;
    $dbh->do('INSERT INTO table1 VALUES (3)');
});

This transaction will insert the values 1 and 3, but not 2. If you call svp_do() outside of txn_do(), it will call txn_do() for you, with the savepoint scoped to the entire transaction:

$conn->svp_do(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (4)');
    $conn->svp_do(sub {
        shift->do('INSERT INTO table1 VALUES (5)');
    });
});

This transaction will insert both 3 and 4. And note that you can nest savepoints as deeply as you like. All this is dependent on whether the database supports savepoints; so far, PostgreSQL, MySQL (InnoDB), Oracle, MSSQL, and SQLite do. If you know of others, fork the repository, commit changes to a branch, and send me a pull request!

Overall I’m very happy with this module, and I’ll probably use it in all my Perl database projects from here on in. Perhaps later I’ll build a model class on it (something like Catalyst::Model::DBI, only better!), but next up, I plan to finish documenting Template::Declare and writing some views with it. More on that soon.

"Keep DBI's connect_cached From Horking
Transactions"

Looking for the comments? Try the old layout.

Keep DBI's connect_cached From Horking Transactions

I’ve been on a bit of a Perl hacking tear lately. In addition to knocking out Test::XPath last week, I’ve been experimenting with TAP::Harness sources, Template::Declare, Catalyst views, a new Module::Build subclass for building database-backed applications, and, last but not least, an IRC logging bot. Oh, and that application I’m working on for PGX with Quinn Weaver. So much is crowding my mind these days that I’m having trouble sleeping. Tonight I’m up late hacking to try to get some of this stuff out of my head.

But before I crash, I wanted to share a useful DBI hack. I use connect_cached in a lot of my applications, because it’s a nice way to reuse database handles without having to figure out my own caching algorithm. The code I have to use it looks like this:

sub dbh {
    my $self = shift;
    DBI->connect_cached( @{ $self->_dbi }{qw(dsn username password)}, {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
    });
}

Very simple. I just call the dbh() method whenever I need to talk to the database, and I’m set. Except for one problem: transactions.

Say I have a method that grabs the handle, starts a transaction with the begin_work method, and then inserts a row. Then another method grabs the handle from dbh() on the assumption that it’s in the same transaction, and does its own work. Only, it’s not the same transaction, because, unfortunately, DBI sets the attributes passed to connect_cached every single time it’s called!. So even though that second method may think it’s in the middle of a transaction, it’s really not, because when connect_cached sets AutoCommit back to 1, the transaction gets committed.

Oops.

This really fucks with my tests, where I’m often fetching the database handle to start a transaction, running some tests, and then wanting to rollback the transaction when I’m done. It’s irritating as all hell to discover that data has been inserted into the database. And the DBI, alas, just gives me this warning:

rollback ineffective with AutoCommit enabled at t/botinst.t line 67.

I’m likely not to notice until I get a duplicate key error the next time I run the tests.

As I was dealing with this today, my memory started poking at me, telling me that I’ve dealt with this before. And sure enough, a quick Google shows that Tim Bunce and I had an extensive conversation on this very topic – over four years ago. If you’re patient enough to dig through that thread, you’ll note that this issue is due to some architectural difficulties in the DBI, to be worked out in DBI 2.

Over the last four years, I’ve implemented a couple of solutions to this problem, all involving my code tracking the transaction state and modifying the AutoCommit attribute in the appropriate places. It’s all rather fragile. But as I dug through the thread, I discovered a much cleaner fix, using a little-known and so-far undocumented feature of the DBI: callbacks. This is actually I feature I half-way implemented in the DBI years ago, getting it just far enough that Tim was willing to finish it. And it’s just sitting there, waiting to be used.

So here’s the trick: Specify a callback for the connect_cached() method that’s used only when an existing file handle is retrieved from the cache. A bunch of stuff is passed to the callback, but the important one is the fifth argument, the attributes. All it has to do is delete the AutoCommit attribute. Since this callback is called before the DBI looks at the attributes to set them on the handle, the callback effectively prevents the DBI from horking up your transactions.

Here’s the modified code:

my $cb = {
    'connect_cached.reused' => sub { delete $_[4]->{AutoCommit} },
};

sub dbh {
    my $self = shift;
    DBI->connect_cached( @{ $self->_dbi }{qw(dsn username password)}, {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
        Callbacks      => $cb,
    });
}

Callbacks are passed as a hash reference, with the keys being the names of the DBI methods that should trigger the callbacks, such as ping, data_sources, or connect_cached. The values are, of course, code references. When the DBI calls the code callbacks, it passes in stuff relevant to the method.

In the case of connect_cached, there are two additional special-case callbacks, connect_cached.new and connect_cached.reused, so that you can have different callbacks execute depending on whether connect_cached used a cached database handle or had to create a new one. Here, I’ve used connect_cached.reused, of course, and all I do is kill off the AutoCommit attribute before the DBI gets its greedy hands on it. Et voilà, problem solved!

And before you ask, no, you can’t simply omit AutoCommit from the attributes passed to connect_cached, because the DBI helpfully adds it for you.

So now this is here so I can find it again when it bites me next year, and I hope it helps you, too. Meanwhile, perhaps someone could take it upon themselves to document DBI’s callbacks? At this point, the closest thing to documentation is in the tests. (Hrm. I think I might have had a hand in writing them.) Check ‘em out.

Looking for the comments? Try the old layout.

More about…

Hack: Force DBI::Shell to use a DBI Subclass

So I just had a need to use DBI::Shell with a subclass of the DBI. It doesn’t support subclasses directly (it’d be nice to be able to specify one on the command-line or something), but I was able to hack it into using one anyway by doing this:

use My::DBI;
BEGIN {
    sub DBI::Shell::Base::DBI () { 'My::DBI' };
}
use DBI::Shell;

Yes, it’s extremely sneaky. DBI::Shell::Base uses the string constant DBI, as in DBI->connect(...), so by shoving a constant into DBI::Shell::Base before loading DBI::Shell, I convince it to use my subclass, instead.

Looking for the comments? Try the old layout.

More about…

Add Regular Expression Operator to SQLite

As I discussed a couple of months ago, DBD::SQLite exposes the SQLite sqlite3_create_function() API for adding Pure-Perl functions and aggregates to SQLite on a per-connection basis. This is cool, but in perusing the SQLite expression documentation, I came across this gem:

The REGEXP operator is a special syntax for the regexp() user function. No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message. If a user-defined function named “regexp” is defined at run-time, that function will be called in order to implement the REGEXP operator.

Well hell! I thought. I can do that!

In a brief search, I could find no further documentation of this feature, but all it took was a little experimentation to figure it out. The regexp() function should expect two arguments. The first is the regular expression, and the second is the value to match. So it can be added to DBD::SQLite like this:

$dbh = DBI->connect('dbi:SQLite:dbfile=test.db');
$dbh->func('regexp', 2, sub {
    my ($regex, $string) = @_;
    return $string =~ /$regex/;
}, 'create_function');

Yep, that’s it! Now, I have my own module for handling database connections, and I wanted to make sure that all of my custom functions are always present, every time I connect to the database. In a mod_perl environment, you can end up with a lot of connections, and a single process has the potential disconnect and reconnect more than once (due to exceptions thrown by the database and whatnot). The easiest way to ensure that the functions are always there as soon as you connect and every time you connect, I learned thanks to a tip from Tim Bunce, is to subclass the DBI and implement a connected() method. Here’s what it looks like:

package MyApp::SQLite;
use base 'DBI';

package MyApp::SQLite::st;
use base 'DBI::st';

package MyApp::SQLite::db;
use base 'DBI::db';

sub connected {
    my $dbh = shift;
    # Add regexp function.
    $dbh->func('regexp', 2, sub {
        my ($regex, $string) = @_;
        return $string =~ /$regex/;
    }, 'create_function');
}

So how does this work? Here’s a quick app I wrote to demonstrate the use of the REGEXP expression in SQLite using Perl regular expressions:

#!/usr/bin/perl -w

use strict;

my $dbfile = shift || die "Usage: $0 db_file\n";
my $dbh = MyApp::SQLite->connect(
    "dbi:SQLite:dbname=$dbfile", '', '',
    {
        RaiseError  => 1,
        PrintError  => 0,
    }
);

END {
    $dbh->do('DROP TABLE try');
    $dbh->disconnect;
}

$dbh->do('CREATE TABLE try (a TEXT)');

my $ins = $dbh->prepare('INSERT INTO try (a) VALUES (?)');
for my $val (qw(foo bar bat woo oop craw)) {
    $ins->execute($val);
}

my $sel = $dbh->prepare('SELECT a FROM try WHERE a REGEXP ?');

for my $regex (qw( ^b a w?oop?)) {
    print "'$regex' matches:\n  ";
    print join "\n  " =>
        @{ $dbh->selectcol_arrayref($sel, undef, $regex) };
    print "\n\n";
}

This script outputs:

'^b' matches:
  bar
  bat

'a' matches:
  bar
  bat
  craw

'w?oop?' matches:
  foo
  woo
  oop

Pretty slick, no? I wonder if it’d make sense for DBD::SQLite to add the regexp() function itself, in C, using the Perl API, so that it’s just always available to DBD::SQLite apps?

Looking for the comments? Try the old layout.

How to Determine Your Transaction ID

Today I had reason to find out what PostgreSQL transaction I was in the middle of at any given moment in Bricolage. Why? I wanted to make sure that a single request was generating multiple transactions, instead of the normal one. It’s a long story, but suffice it to say that lengthy transactions were causing deadlocks. Read this if you’re really interested.

Anyway, here’s how to determine your current transaction using DBI. The query will be the same for any client, of course.

my $sth = $dbh->prepare(qq{
    SELECT transaction
    FROM   pg_locks
    WHERE  pid = pg_backend_pid()
            AND transaction IS NOT NULL
    LIMIT  1
});

$sth->execute;
$sth->bind_columns(\my $txid);
$sth->fetch;
print "Transaction: $txid\n";

Looking for the comments? Try the old layout.

More about…

My Adventures with Mac OS X

I recently decided to make the leap from Yellow Dog Linux to Mac OS X on my Titanium PowerBook. Getting everything to work the way I wanted proved to be a challenge, but well worth it. This document outlines all that I learned, so that neither you nor I will have to experience such pain again. The overall goal was to get Bricolage up and running, figuring that if it worked, then just about any mod_perl based solution would run. I’m happy to say that I was ultimately successful. You can be, too.

In the descriptions below, I provide links to download the software you’ll need, as well as the shell commands I used to compile and install each package. In all cases (except for the installation of the Developer Tools), I saved each package’s sources to /usr/local/src and gunzipped and untarred them there. I also carried out each step as root, by running sudo -s. If you’re not comfortable using a Unix shell, you might want to read up on it, first. All of my examples also assume a sh-compatible shell, such as bash or zsh. Fortunately, zsh comes with OS X, so you can just enable it for yourself in NetInfo Manager by setting users -> <username> -> shell to “/bin/zsh”, where <username> is your user name.

Developer Tools

All of the software that I describe installing below must be compiled. To compile software on Mac OS X, you need to install the Mac OS X Developer Tools. These provide the cc compiler and many required libraries. Conveniently, these come on a CD-ROM with the Mac OS X Version 10.1 upgrade kit. I just popped in the CD and installed them like you’d install any other OS X software. I needed administrative access to OS X to install the Developer Tools (or, indeed, to install any of the other software I describe below), but otherwise it posed no problems.

The best time to install the Developer Tools is immediately after upgrading to OS X version 10.1. Then run the Software Update applet in the System preferences to get your system completely up-to-date. By the time I was done, I had the system updated to version 10.1.3.

Emacs

The first step I took in the process of moving to OS X was to get working the tools I needed most. Essentially, what this meant was GNU Emacs. Now I happen to be a fan of the X version of Emacs – not XEmacs, but GNU Emacs with X support built in. I wasn’t relishing the idea of having to install X on OS X (although there are XFree86 ports that do this), so I was really pleased to discover the Mac-Emacs project. All I had to do was patch the GNU Emacs 21.1 sources and compile them, and I was ready to go! GNU Emacs works beautifully with the OS X Aqua interface.

There were a few configuration issues for me to work out, however. I have become addicted to the green background that an old RedHat .XConfig file had set, and I wanted this feature in OS X, too. Plus, the default font was really ugly (well, too big, really – anyone know how to make it smaller in Emacs?) and the Mac command key was working as the Emacs META key, rather than the option key. So I poked around the net until I found the settings I needed and put them into my .emacs file:

(custom-set-faces
'(default ((t (:stipple nil
  :background "DarkSlateGrey"
  :foreground "Wheat"
  :inverse-video nil
  :box nil
  :strike-through nil
  :overline nil
  :underline nil
  :slant normal
  :weight normal
  :height 116
  :width normal
  :family "apple-andale mono"))))
'(cursor ((t (:background "Wheat"))))
; Use option for the meta key.
(setq mac-command-key-is-meta nil)

Installing Emacs is not required for installing any of the other packages described below – it just happens to be my favorite text editor and IDE. So I don’t provide the instructions here; the Mac-Emacs project does a plenty good job. If you’re not comfortable with Unix editors, you can use whatever editor you like. BBEdit is a good choice.

GDBM

Mac OS X doesn’t come with a DBM! But since mod_ssl needs it, we have to install it. Fortunately, I found this PDF detailing someone else’s adventures with mod_ssl on OS X, and it provided decent instructions for installing GDBM. First, I created a new user for GDBM. In NetInfoManager, I created a duplicate of the “unknown” user and named it “bin”. Then, I downloaded GDBM from the FSF, and installed it like this:

cd /usr/local/src/gdbm-1.8.0
cp /usr/libexec/config* .
./configure
make
make install
ln -s /usr/local/lib/libgdbm.a /usr/local/lib/libdbm.a

That did the trick. Nothing else was involved, fortunately.

Expat

Who doesn’t do something with XML these days? If your answer is, “not me!”, then you’ll need to install the Expat library in order to work with XML::Parser in Perl. Fortunately it’s relatively easy to install, although support for the -static flag appears to be broken in cc on OS X, so it needs to be stripped out. I downloaded it from its project bpage, and then did this:

cd /usr/local/src/expat-1.95.2
./configure
perl -i.bak -p -e \
  's/LDFLAGS\s*=\s*-static/LDFLAGS=/' \
  examples/Makefile
perl -i.bak -p -e \
  's/LDFLAGS\s*=\s*-static/LDFLAGS=/' \
  xmlwf/Makefile
make
make install

Perl

Although Mac OS X ships with Perl (Yay!), it’s the older 5.6.0 version. There have been many bug fixes included in 5.6.1, so I wanted to make sure I got the latest stable version before I built anything else around it (mod_perl, modules, etc.).

Being a Unix program, Perl doesn’t expect to run into the problems associated with a case-insensitive file system like that Mac OS X’s HFS Plus. So there are a couple of tweaks to the install process that make it slightly more complicated than you would typically expect. Fortunately, many have preceded us in doing this, and the work-arounds are well-known. Basically, it comes down to this:

cd /usr/local/src/perl-5.6.1/
export LC_ALL=C
export LANG=en_US
perl -i.bak -p -e 's|Local/Library|Library|g' hints/darwin.sh
sh Configure -des -Dfirstmakefile=GNUmakefile -Dldflags="-flat_namespace"
make
make test
make install

There were a few errors during make test, but none of them seems to be significant. Hopefully, in the next version of Perl, the build will work just as it does on other platforms.

Downloads

Before installing Open SSL, mod_ssl, mod_perl, and Apache, I needed to get all the right pieces in place. The mod_ssl and mod_perl configure processes patch the Apache sources, so the Apache sources have to be downloaded and gunzipped and untarred into an adjacent directory. Furthermore, the mod_ssl version number corresponds to the Apache version number, so you have to be sure that they match up. Normally, I would just download the latest versions of all of these pieces and run with it.

However, Bricolage requires the libapreq library and its supporting Perl modules to run, and these libraries have not yet been successfully ported to Mac OS X. But worry not; fearless mod_perl hackers are working on the problem even as we speak, and there is an interim solution to get everything working.

As of this writing, the latest version of Apache is 1.3.24. But because I needed libapreq, I had to use an experimental version of Apache modified to statically compile in libapreq. Currently, only version 1.3.23 has been patched for libapreq, so that’s what I had to use. I discovered this experimental path thanks to a discussion on the Mac OS X Perl mail list.

So essentially what I did was download the experimental apache.tar.gz and the experimental lightweight apreq.tar.gz packages and gunzip and untar them into /usr/local/src. Then I was ready to move on to Open SSL, mod_ssl, and mod_perl.

Open SSL

Compiling Open SSL was pretty painless. One of the tests fails, but it all seems to work out, anyway. I download the sources from the Open SSL site, and did this:

cd /usr/local/src/openssl-0.9.6c
./config
make
make test

mod_ssl

The mod_ssl Apache module poses no problems whatsoever. I simply downloaded mod_ssl-2.8.7-1.3.23 from the mod_ssl site (note that the “1.3.23” at the end matches the version of Apache I downloaded) and gunzipped and untarred it into /usr/local/src/. Then I simply excuted:

./configure --with-apache=/usr/local/src/apache_1.3.23

mod_perl

Configuring and installing mod_ssl was, fortunately, a relatively straight-forward process. Getting Apache compiled with mod_perl and mod_ssl, however, was quite tricky, as you’ll see below. A number of braver folks than I have preceded me in installing mod_perl, so I was able to rely on their hard-earned knowledge to get the job done. For example, Randal Schwartz posted instructions to the mod_perl mail list, and his instructions worked well for me. So I downloaded the sources from the mod_perl site, and did this:

cd /usr/local/src/mod_perl-1.26
perl Makefile.PL \
  APACHE_SRC=/usr/local/src/apache_1.3.23/src \
  NO_HTTPD=1 \
  USE_APACI=1 \
  PREP_HTTPD=1 \
  EVERYTHING=1
make
make install

Apache

Getting Apache compiled just right was the most time-consuming part of this process for me. Although many had gone before me in this task, everybody seems to do it differently. I had become accustomed to just allowing Apache to use most of its defaults when I compiled under Linux, but now I was getting all kinds of errors while following different instructions from different authorities from around the web. Sometimes Apache wouldn’t compile at all, and I’d get strange errors. Other times it would compile, pass all of its tests, and install, only to offer up errors such as

dyld: /usr/local/apache/bin/httpd Undefined symbols: _log_config_module

when I tried to start it. It turns out that the problem there was that I had a number of modules compiled as DSOs – that is, libraries that can be loaded into Apache dynamically – but wasn’t loading them properly in my httpd.conf. This was mainly because I’ve grown accustomed to Apache having all the libraries I needed compiled in statically, so I simply didn’t have to worry about them.

But I finally hit on the right incantation to get Apache to compile with everything I need added statically, but still with support for DSOs by compiling in mod_so. I present it here for your viewing pleasure:

SSL_BASE=/usr/local/src/openssl-0.9.6c/ \
    ./configure \
    --with-layout=Apache \
    --enable-module=ssl \
    --enable-module=rewrite \
    --enable-module=so \
    --activate-module=src/modules/perl/libperl.a \
    --disable-shared=perl \
    --without-execstrip
  make
  make certificate TYPE=custom 
  make install

This series of commands successfully compiled Apache with mod_perl and mod_ssl support statically compiled in, along with most of the other default modules that come with Apache. In short, everything is there that you need to run a major application with security such as Bricolage.

Note that make certificate will lead you through the process of creating an SSL certificate. I like to use the “custom” type so that it reflects the name of my organization. But you can use whatever approach you’re most comfortable with. Consult the mod_ssl INSTALL file for more information.

libapreq

Once Apache is installed with mod_perl and mod_ssl, the rest is gravy! The experimental libapreq library I downloaded installed without a hitch:

cd /usr/local/src/httpd-apreq
perl Makefile.PL
make
make install

PostgreSQL

PostgreSQL is a sophisticated open-source Object-Relational DBMS. I use it a lot in my application development, and it, too, is required by Bricolage. I was a bit concerned about how well it would compile and work on Mac OS X, but I needn’t have worried. First of all, Apple has provided some pretty decent instructions. Although they mainly document how to install MySQL, a competing open-source RDBMS, many of the same concepts apply to PostgreSQL.

The first thing I had to do was to create the “postgres” user. This is the system user that PostgreSQL typically runs as. I followed Apple’s instructions, using NetInfo Manager to duplicate the default “www” group and “www” user and give the copies the name “postgres” and a new gid and uid, respectively.

Next I downloaded the PostgreSQL version 7.2.1 sources. Version 7.2 is the first to specifically support Mac OS X, so going about the install was as simple as it is on any Unix system:

./configure --enable-multibyte=UNICODE
make
make install

That was it! PostgreSQL was now installed. Next I had to initialize the PostgreSQL database directory. Again, this works much the same as it does on any Unix system:

sudo -u postgres /usr/local/pgsql/bin/initdb \
  -D /usr/local/pgsql/data

The final step was to start PostgreSQL and try to connect to it:

sudo -u postgres /usr/local/pgsql/bin/pg_ctl start \
  -D /usr/local/pgsql/data /usr/local/pgsql/bin/psql -U postgres template1

If you follow the above steps and find yourself at a psql prompt, you’re in business! Because I tend to use PostgreSQL over TCP, I also enabled TCP connectivity by enabling the “tcpip_socket” option in the postgresql.conf file in the data directory created by initdb:

tcpip_socket = true

If you’re like me, you like to have servers such as PostgreSQL start when your computer starts. I enabled this by creating a Mac OS X PostgreSQL startup bundle. It may or may not be included in a future version of PostgreSQL, but in the meantime, you can download it from here. Simply download it, gunzip and untar it into /Library/StartupItems, restart OS X, and you’ll see it start up during the normal Mac OS X startup sequence. I built this startup bundle by borrowing from the existing FreeBSD PostgreSQL startup script, the Apache startup script that ships with OS X, and by reading the Creating SystemStarter Startup Item Bundles HOWTO.

XML::Parser

At this point, I had most

of the major pieces in place, and it was time for me to install the Perl modules I needed. First up was XML::Parser. For some reason, XML::Parser can’t find the expat libraries, even though the location in which I installed them is pretty common. I got around this by installing XML::Parser like this:

perl Makefile.PL EXPATLIBPATH=/usr/local/lib \
  EXPATINCPATH=/usr/local/include
make
make test
make install

Text::Iconv

In Bricolage, Text::Iconv does all the work of converting text between character sets. This is because all of the data is stored in the database in Unicode, but we wanted to allow users to use the character set with which they’re accustomed in the UI. So I needed to install Text::Iconv. Naturally, Mac OS X doesn’t come with libiconv – a library on which Text::Iconv depends – so I had to install it. Fortunately, it was a simple process to download it and do a normal build:

cd /usr/local/src/libiconv-1.7
./configure
make
make install

Now, Text::Iconv itself was a little more problematic. You have to tell it to look for libiconv by adding the -liconv option to the LIBS key in Makefile.PL. I’ve simplified doing this with the usual Perl magic:

perl -i.bak -p -e \
  "s/'LIBS'\s*=>\s*\[''\]/'LIBS' => \['-liconv'\]/" \
  Makefile.PL
perl Makefile.PL
make
make test
make install

DBD::Pg

Although the DBI installed via the CPAN module without problem, DBD::Pg wanted to play a little less nice. Of course I specified the proper environment variables to install it (anyone know why DBD::Pg’s Makefile.PL script can’t try to figure those out on its own?), but still I got this error during make:

/usr/bin/ld: table of contents for archive:
/usr/local/pgsql/lib/libpq.a is out of date;
rerun  ranlib(1) (can't load from it)

But this was one of those unusual situations in which the error message was helpful. So I took the error message’s advice, and successfully compiled and installed DBD::Pg like this:

ranlib /usr/local/pgsql/lib/libpq.a
export POSTGRES_INCLUDE=/usr/local/pgsql/include
export POSTGRES_LIB=/usr/local/pgsql/lib
perl Makefile.PL
make
make test
make install

LWP

The last piece I needed to worry about customizing when I installed it was LWP. Before installing, back up /usr/bin/head. The reason for this is that LWP will install /usr/bin/HEAD, and because HFS Plus is a case-insensitive file system, it’ll overwrite /usr/bin/head! This is a pretty significant issue, since many configure scripts use /usr/bin/head. So after installing LWP, move /usr/bin/HEAD, GET, & POST to /usr/local/bin. Also move /usr/bin/lwp* to /usr/local/bin. Then move your backed-up copy of head back to /usr/bin.

Naturally, I didn’t realize that this was necessary until it was too late. I installed LWP with the CPAN module, and it wiped out /usr/bin/head. Fortunately, all was not lost (though it took me a while to figure out why my Apache compiles were failing!): I was able to restore head by copying it from the Mac OS X installer CD. I Just popped it in an executed the command:

cp "/Volumes/Mac OS X Install CD/usr/bin/head" /usr/bin

And then everything was happy again.

Bricolage

And finally, the pièce de résistance: Bricolage! All of the other required Perl modules installed fine from Bundle::Bricolage:

perl -MCPAN -e 'install Bundle::Bricolage'

Then I simply followed the directions in Bricolage’s INSTALL file, and started ‘er up! I would document those steps here, but the install process is currently in flux and likely to change soon. The INSTALL file should always be current, however – check it out!

To Be Continued

No doubt my adventures with Unix tools on Mac OS X are far from over. I’ve reported to various authors on the issues I’ve described above, and most will soon be releasing new versions to address those issues. As they do, I’ll endeavor to keep this page up-to-date. In the meantime, I am thoroughly enjoying working with the first really solid OS that Apple has released in years, and thrilled that I can finally have the best of both worlds: a good, reliable, and elegant UI, and all the Unix power tools I can stand! I hope you do, too.

Looking for the comments? Try the old layout.