Just a Theory

Trans rights are human rights

Posts about Caching

Accelerate Perl Github Workflows with Caching

I’ve spent quite a few hours evenings and weekends recently building out a comprehensive suite of GitHub Actions for Sqitch. They cover a dozen versions of Perl, nearly 70 database versions amongst nine database engines, plus a coverage test and a release workflow. A pull request can expect over 100 actions to run. Each build requires over 100 direct dependencies, plus all their dependencies. Installing them for every build would make any given run untenable.

Happily, GitHub Actions include a caching feature, and thanks to a recent improvement to shogo82148/actions-setup-perl, it’s quite easy to use in a version-independent way. Here’s an example:

name: Test
on: [push, pull_request]
jobs:
  OS:
    strategy:
      matrix:
        os: [ ubuntu, macos, windows ]
        perl: [ 'latest', '5.34', '5.32', '5.30', '5.28' ]
    name: Perl ${{ matrix.perl }} on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}-latest
    steps:
      - name: Checkout Source
        uses: actions/checkout@v3
      - name: Setup Perl
        id: perl
        uses: shogo82148/actions-setup-perl@v1
        with: { perl-version: "${{ matrix.perl }}" }
      - name: Cache CPAN Modules
        uses: actions/cache@v3
        with:
          path: local
          key: perl-${{ steps.perl.outputs.perl-hash }}
      - name: Install Dependencies
        run: cpm install --verbose --show-build-log-on-failure --no-test --cpanfile cpanfile
      - name: Run Tests
        env: { PERL5LIB: "${{ github.workspace }}/local/lib/perl5" }
        run: prove -lrj4

This workflow tests every permutation of OS and Perl version specified in jobs.OS.strategy.matrix, resulting in 15 jobs. The runs-on value determines the OS, while the steps section defines steps for each permutation. Let’s take each step in turn:

  • “Checkout Source” checks the project out of GitHub. Pretty much required for any project.
  • “Setup Perl” sets up the version of Perl using the value from the matrix. Note the id key set to perl, used in the next step.
  • “Cache CPAN Modules” uses the cache action to cache the directory named local with the key perl-${{ steps.perl.outputs.perl-hash }}. The key lets us keep different versions of the local directory based on a unique key. Here we’ve used the perl-hash output from the perl step defined above. The actions-setup-perl action outputs this value, which contains a hash of the output of perl -V, so we’re tying the cache to a very specific version and build of Perl. This is important since compiled modules are not compatible across major versions of Perl.
  • “Install Dependencies” uses cpm to quickly install Perl dependencies. By default, it puts them into the local subdirectory of the current directory — just where we configured the cache. On the first run for a given OS and Perl version, it will install all the dependencies. But on subsequent runs it will find the dependencies already present, thank to the cache, and quickly exit, reporting “All requirements are satisfied.” In this Sqitch job, it takes less than a second.
  • “Run Tests” runs the tests that require the dependencies. It requires the PERL5LIB environment variable to point to the location of our cached dependencies.

That’s the whole deal. The first run will be the slowest, depending on the number of dependencies, but subsequent runs will be much faster, up to the seven-day caching period. For a complex project like Sqitch, which uses the same OS and Perl version for most of its actions, this results in a tremendous build time savings. CI configurations we’ve used in the past often took an hour or more to run. Today, most builds take only a few minutes to test, with longer times determined not by dependency installation but by container and database latency.

Keep DBI’s connect_cached From Horking Transactions

I’ve been on a bit of a Perl hacking tear lately. In addition to knocking out Test::XPath last week, I’ve been experimenting with TAP::Harness sources, Template::Declare, Catalyst views, a new Module::Build subclass for building database-backed applications, and, last but not least, an IRC logging bot. Oh, and that application I’m working on for PGX with Quinn Weaver. So much is crowding my mind these days that I’m having trouble sleeping. Tonight I’m up late hacking to try to get some of this stuff out of my head.

But before I crash, I wanted to share a useful DBI hack. I use connect_cached in a lot of my applications, because it’s a nice way to reuse database handles without having to figure out my own caching algorithm. The code I have to use it looks like this:

sub dbh {
    my $self = shift;
    DBI->connect_cached( @{ $self->_dbi }{qw(dsn username password)}, {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
    });
}

Very simple. I just call the dbh() method whenever I need to talk to the database, and I’m set. Except for one problem: transactions.

Say I have a method that grabs the handle, starts a transaction with the begin_work method, and then inserts a row. Then another method grabs the handle from dbh() on the assumption that it’s in the same transaction, and does its own work. Only, it’s not the same transaction, because, unfortunately, DBI sets the attributes passed to connect_cached every single time it’s called!. So even though that second method may think it’s in the middle of a transaction, it’s really not, because when connect_cached sets AutoCommit back to 1, the transaction gets committed.

Oops.

This really fucks with my tests, where I’m often fetching the database handle to start a transaction, running some tests, and then wanting to rollback the transaction when I’m done. It’s irritating as all hell to discover that data has been inserted into the database. And the DBI, alas, just gives me this warning:

rollback ineffective with AutoCommit enabled at t/botinst.t line 67.

I’m likely not to notice until I get a duplicate key error the next time I run the tests.

As I was dealing with this today, my memory started poking at me, telling me that I’ve dealt with this before. And sure enough, a quick Google shows that Tim Bunce and I had an extensive conversation on this very topic – over four years ago. If you’re patient enough to dig through that thread, you’ll note that this issue is due to some architectural difficulties in the DBI, to be worked out in DBI 2.

Over the last four years, I’ve implemented a couple of solutions to this problem, all involving my code tracking the transaction state and modifying the AutoCommit attribute in the appropriate places. It’s all rather fragile. But as I dug through the thread, I discovered a much cleaner fix, using a little-known and so-far undocumented feature of the DBI: callbacks. This is actually I feature I half-way implemented in the DBI years ago, getting it just far enough that Tim was willing to finish it. And it’s just sitting there, waiting to be used.

So here’s the trick: Specify a callback for the connect_cached() method that’s used only when an existing file handle is retrieved from the cache. A bunch of stuff is passed to the callback, but the important one is the fifth argument, the attributes. All it has to do is delete the AutoCommit attribute. Since this callback is called before the DBI looks at the attributes to set them on the handle, the callback effectively prevents the DBI from horking up your transactions.

Here’s the modified code:

my $cb = {
    'connect_cached.reused' => sub { delete $_[4]->{AutoCommit} },
};

sub dbh {
    my $self = shift;
    DBI->connect_cached( @{ $self->_dbi }{qw(dsn username password)}, {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
        Callbacks      => $cb,
    });
}

Callbacks are passed as a hash reference, with the keys being the names of the DBI methods that should trigger the callbacks, such as ping, data_sources, or connect_cached. The values are, of course, code references. When the DBI calls the code callbacks, it passes in stuff relevant to the method.

In the case of connect_cached, there are two additional special-case callbacks, connect_cached.new and connect_cached.reused, so that you can have different callbacks execute depending on whether connect_cached used a cached database handle or had to create a new one. Here, I’ve used connect_cached.reused, of course, and all I do is kill off the AutoCommit attribute before the DBI gets its greedy hands on it. Et voilà, problem solved!

And before you ask, no, you can’t simply omit AutoCommit from the attributes passed to connect_cached, because the DBI helpfully adds it for you.

So now this is here so I can find it again when it bites me next year, and I hope it helps you, too. Meanwhile, perhaps someone could take it upon themselves to document DBI’s callbacks? At this point, the closest thing to documentation is in the tests. (Hrm. I think I might have had a hand in writing them.) Check ’em out.

Looking for the comments? Try the old layout.

Where iCal Keeps Invitations

I was fiddling with iCalendar invitations yesterday, trying to get Sandy’s .ics files to import into Outlook. I got that figured out (yes!), but in the meantime iCal started crashing on me. I was reasonable sure that it was due to a bogus invitation file, but could not for the life of me figure out where iCal was keeping such files. It just kept crashing on me as second or so after starting up, every time.

I finally figured it out by quitting all my apps, moving all of the folders in ~/Library to a temporary folder, and firing up iCal to see what folds it would create. And there it was: ~/Library/Caches/com.apple.iCal. I quit iCal, deleted the new folders in ~/Library, moved the originals back, and looked inside the iCal caches folder to find a bunch of invitation files in the incoming folder. I deleted them all and iCal fired up again without a hitch. W00t!

So if you’re having problems with iCal crashing and have a few invitations in it and you’re wondering how to get iCal to ignore them, just quit iCal, delete all of the files in /Users/yourusername/Library/Caches/com.apple.iCal/incoming, and start iCal back up again.

And now I’ll be able to find this information again when next I need it. :-)

Looking for the comments? Try the old layout.