Just a Theory

By David E. Wheeler

Posts about Perl

Adopt My Modules

Dear Perl Community,

Over the last 17 years, I’ve created, released, updated, and/or maintained a slew of Perl modules on CPAN. Recently my work has changed significantly, and I no longer have the time to properly care for them all. A few, like Pod::Simple and Plack::Middleware::MethodOverride, have co-maintainers, but most don’t. They deserve more love than I can currently provide. All, therefore, are up for adoption.

If you regularly use my modules, use a service that depends on them, or just like to contribute to the community, consider becoming a maintainer! Have a look at the list, and if you’d like to rescue an orphan module, hit me up via Twitter or email me at david at this domain.

More about…

Wanted: New SVN::Notify Maintainer

I’ve used Subversion very occasionally since 2009, and SVN::Notify not at all. Over the years, I’ve fixed minor issues with it now and then, and made a couple of releases to address issues fixed by others. But it’s past the point where I feel qualified to maintain it. Hell, the repository for SVN::Notify has been hosted on GitHub ever since 2011. I don’t have an instance of Subversion against which to test it; nor do I have any SMTP servers to throw test messages at.

In short, it’s past time I relinquished maintenance of this module to someone with a vested interest in its continued use. Is that you? Do you need to keep SVN::Notify running for your projects, and have a few TUITs to fix the occasional bug or security issue? If so, drop me a line (david @ this domain). I’d be happy to transfer the repository.

Please Test Pod::Simple 3.29_3

Pod Book

I’ve just pushed Pod-Simple 3.29_3 to CPAN. Karl Williamson did a lot of hacking on this release, finally adding support for EBCDIC. But as part of that work, and in coordination with Pod::Simple’s original author, Sean Burke, as well as pod-people, we have switched the default encoding from Latin-1 to CP-1252.

On the surface, that might sound like a big change, but in truth, it’s pretty straight-forward. CP-1252 is effectively a superset of Latin-1, repurposing 30 or so unused control characters from Latin-1. Those characters are pretty common on Windows (the home of the CP family of encodings), especially in pastes from Word. It’s nice to be able to pick those up essentially for free.
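To see the difference concretely, here’s a tiny illustration (mine, not from the Pod::Simple test suite) of how the same bytes decode under each encoding:

use v5.12;
use Encode qw(decode);
binmode STDOUT, ':encoding(UTF-8)';

# 0x93 and 0x94 are unassigned C1 controls in Latin-1, but curly
# quotes in CP-1252: the sort of thing Word pastes in all the time.
my $bytes = "\x93smart quotes\x94";
say decode('cp1252', $bytes);                            # “smart quotes”
say sprintf 'U+%04X', ord decode('iso-8859-1', "\x93");  # U+0093, a control code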

Still, Karl’s done more than that. He also updated the encoding detection to do a better job at detecting UTF-8. This is the real default. Pod::Simple only falls back on CP1252 if there are no obvious UTF-8 byte sequences in your Pod.
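The detection idea is easy to approximate. This sketch captures the spirit, though it is not Pod::Simple’s actual implementation:

use Encode qw(decode FB_CROAK);

# If the bytes decode cleanly as strict UTF-8, call the file UTF-8;
# otherwise fall back to CP1252, which accepts any byte value.
sub guess_encoding {
    my $bytes = shift;
    return eval { decode('UTF-8', $bytes, FB_CROAK); 1 } ? 'UTF-8' : 'cp1252';
}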

Overall these changes should be a great improvement. Better encoding support is always a good idea. But it is a pretty significant change, including a change to the Pod spec. Hence the test release. Please make sure it works well with your code by installing it today:

cpan D/DW/DWHEELER/Pod-Simple-3.29_3.tar.gz
cpanm DWHEELER/Pod-Simple-3.29_3.tar.gz

Oh, and one last thing: If Pod::Simple fails to properly recognize the encoding in your Pod file, you can always use the =encoding command early in the file to make it explicit:

=encoding CP1254

Build Modern Perl RPMs with rpmcpan

iovation + Perl = Love

We’ve been using the CentOS Perl RPMs at iovation to run all of our Perl applications. This has been somewhat painful, because the version of Perl, 5.10.1, is quite old — it shipped in August 2009. In fact, it consists mostly of bug fixes against Perl 5.10.0, which shipped in December 2007! Many of the modules provided by CentOS core and EPEL are quite old, as well, and we had built up quite the collection of customized module RPMs managed by a massive spaghetti-coded Jenkins job. When we recently ran into a Unicode issue that would best have been addressed by running a more modern Perl — rather than a hinky workaround — I finally sat down and knocked out a way to get a solid set of Modern Perl and related CPAN RPMs.

I gave it the rather boring name rpmcpan, and now you can use it, too. Turns out, DevOps doesn’t myopically insist on using core RPMs in the name of some abstract idea about stability. Rather, we just need a way to easily deploy our stuff as RPMs. If the same applies to your organization, you can get Modern Perl RPMs, too.

Here’s how we do it. We have a new Jenkins job that runs both nightly and whenever the rpmcpan Git repository updates. It uses the MetaCPAN API to build the latest versions of everything we need. Here’s how to get it to build the latest version of Perl, 5.20.1:

./bin/rpmcpan --version 5.20.1

That will get you a nice, modern Perl RPM, named perl520, completely encapsulated in /usr/local/perl520. Want 5.18 instead? Just change the version:

./bin/rpmcpan --version 5.18.2

That will give you perl518. But that’s not all. You want to build CPAN distributions against that version. Easy. Just edit the dists.json file. Its contents are a JSON object where the keys name CPAN distributions (not modules), and the values are objects that customize how our RPMs get built. Most of the time, the objects can be empty:

{
    "Try-Tiny": {}
}

This results in an RPM named perl520-Try-Tiny (or perl518-Try-Tiny, etc.). Sometimes you might need additional information to customize the RPM spec file generated to build the distribution. For example, since this is Linux, we need to exclude a Win32 dependency in the Encode-Locale distribution:

{
    "Encode-Locale": { "exclude_requires": ["Win32::Console"] }
}

Other distributions might require additional RPMs or environment variables, like DBD-Pg, which requires the PostgreSQL RPMs:

{
    "DBD-Pg": {
        "build_requires": ["postgresql93-devel", "postgresql93"],
        "environment": { "POSTGRES_HOME": "/usr/pgsql-9.3" }
    }
}

See the README for a complete list of customization options. Or just get started with our dists.json file, which so far builds the bare minimum we need for one of our Perl apps. Add new distributions? Send a pull request! We’ll be doing so as we integrate more of our Perl apps with Modern Perl and leave the sad RPM past behind.
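As an aside, the MetaCPAN lookup that resolves “the latest version” of a distribution amounts to a single HTTP request. Here’s an illustrative sketch (not rpmcpan’s actual code, and it assumes the current fastapi.metacpan.org endpoint):

use v5.12;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Fetch the latest release metadata for a distribution by name.
my $dist = shift || 'Try-Tiny';
my $res  = HTTP::Tiny->new->get("https://fastapi.metacpan.org/v1/release/$dist");
die "GET failed: $res->{status} $res->{reason}\n" unless $res->{success};

my $release = decode_json $res->{content};
say "$release->{distribution} $release->{version}";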

More about…

Localize Your Perl Apps with this One Weird Trick

Nota Bene: This is a republication of a post that originally appeared in the 2013 Perl Advent Calendar.


These days, gettext is far and away the most widely-used localization (l10n) and internationalization (i18n) library for open-source software. So far, it has not been widely used in the Perl community, even though it’s the most flexible, capable, and easy-to-use solution, thanks to Locale::TextDomain.1 How easy? Let’s get started!

Module Internationale

First, just use Locale::TextDomain. Say you’re creating an awesome new module, Awesome::Module. Its CPAN distribution will be named Awesome-Module, so that’s the “domain” to use for its localizations. Just let Locale::TextDomain know:

use Locale::TextDomain 'Awesome-Module';

Locale::TextDomain will later use this string to look for the appropriate translation catalogs. But don’t worry about that just yet. Instead, start using it to translate user-visible strings in your code. With the assistance of Locale::TextDomain’s comprehensive documentation, you’ll find it second nature to internationalize your modules in no time. For example, simple strings are denoted with __:

say __ 'Greetings puny human!';

If you need to specify variables, use __x:

say __x(
   'Thank you {sir}, may I have another?',
   sir => $username,
);

Need to manage plurals? Use __n:

say __n(
    'I will not buy this record, it is scratched.',
    'I will not buy these records, they are scratched.',
    $num_records
);

If $num_records is 1, the first phrase will be used. Otherwise the second.

Sometimes you gotta do both, mix variables and plurals. __nx has got you covered there:

say __nx(
    'One item has been grokked.',
    '{count} items have been grokked.',
    $num_items,
    count => $num_items,
);

Congratulations! Your module is now internationalized. Wasn’t that easy? Make a habit of using these functions in all the modules in your distribution, always with the Awesome-Module domain, and you’ll be set.

Encode da Code

Locale::TextDomain is great, but it dates from a time when Perl character encoding was, shall we say, sub-optimal. It therefore took it upon itself to try to do the right thing, which is to detect the locale from the runtime environment and automatically encode as appropriate. Which might work okay if all you ever do is print localized messages — and never anything else.

If, on the other hand, you will be manipulating localized strings in your code, or emitting unlocalized text (such as that provided by the user or read from a database), then it’s probably best to coerce Locale::TextDomain to return Perl strings, rather than encoded bytes. There’s no formal interface for this in Locale::TextDomain, so we have to hack it a bit: set the $OUTPUT_CHARSET environment variable to “UTF-8” and then bind a filter. Don’t know what that means? Me neither. Just put this code somewhere in your distribution where it will always run early, before anything gets localized:

use Locale::Messages qw(bind_textdomain_filter);
use Encode;
BEGIN {
    $ENV{OUTPUT_CHARSET} = 'UTF-8';
    bind_textdomain_filter 'Awesome-Module' => \&Encode::decode_utf8;
}

You only have to do this once per domain. So even if you use Locale::TextDomain with the Awesome-Module domain in a bunch of your modules, the presence of this code in a single early-loading module ensures that strings will always be returned as Perl strings by the localization functions.

Environmental Safety

So what about output? There’s one more bit of boilerplate you’ll need to throw in. Or rather, put this into the main package that uses your modules to begin with, such as the command-line script the user invokes to run an application.

First, on the shebang line, follow Tom Christiansen’s advice and put -CAS in it (or set the $PERL_UNICODE environment variable to AS). Then use the POSIX setlocale function to set the appropriate locale for the runtime environment. How? Like this:

#!/usr/bin/perl -CAS

use v5.12;
use warnings;
use utf8;
use POSIX qw(setlocale);
BEGIN {
    if ($^O eq 'MSWin32') {
        require Win32::Locale;
        setlocale POSIX::LC_ALL, Win32::Locale::get_locale();
    } else {
        setlocale POSIX::LC_ALL, '';
    }
}

use Awesome::Module;

Locale::TextDomain will notice the locale and select the appropriate translation catalog at runtime.

Is that All There Is?

Now what? Well, you could do nothing. Ship your code and those internationalized phrases will be handled just like any other string in your code.

But what’s the point of that? The real goal is to get these things translated. There are two parts to that process:

  1. Parsing the internationalized strings from your modules and creating language-specific translation catalogs, or “PO files”, for translators to edit. These catalogs should be maintained in your source code repository.

  2. Compiling the PO files into binary files, or “MO files”, and distributing them with your modules. These files should not be maintained in your source code repository.

Until a year ago, there was no Perl-native way to manage these processes. Locale::TextDomain ships with a sample Makefile demonstrating the appropriate use of the GNU gettext command-line tools, but that seemed a steep price for a Perl hacker to pay.
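For context, the workflow that Makefile automates boils down to four gettext commands; the flags and paths here are illustrative:

xgettext --language=Perl --keyword=__ --keyword=__x --keyword=__n:1,2 \
    --keyword=__nx:1,2 --output=po/Awesome-Module.pot lib/Awesome/Module.pm
msginit --locale=fr --input=po/Awesome-Module.pot --output-file=po/fr.po
msgmerge --update po/fr.po po/Awesome-Module.pot
msgfmt --output-file=LocaleData/fr/LC_MESSAGES/Awesome-Module.mo po/fr.po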

A better fit for the Perl hacker’s brain, I thought, is Dist::Zilla. So I wrote Dist::Zilla::LocaleTextDomain to encapsulate the use of the gettext utilities. Here’s how it works.

First, configure Dist::Zilla to compile localization catalogs for distribution by adding these lines to your dist.ini file:

[ShareDir]
[LocaleTextDomain]

There are configuration attributes for the LocaleTextDomain plugin, such as where to find the PO files and where to put the compiled MO files. In case you didn’t use your distribution name as your localization domain in your modules, for example:

use Locale::TextDomain 'com.example.perl-libawesome';

Then you’d set the textdomain attribute so that the LocaleTextDomain plugin can find the translation catalogs:

[LocaleTextDomain]
textdomain = com.example.perl-libawesome

Check out the configuration docs for details on all available attributes.

At this point, the plugin doesn’t do much, because there are no translation catalogs yet. You might see this line from dzil build, though:

[LocaleTextDomain] Skipping language compilation: directory po does not exist

Let’s give it something to do!

Locale Motion

To add a French translation file, use the msg-init command2:

% dzil msg-init fr
Created po/fr.po.

The msg-init command uses the GNU gettext utilities to scan your Perl source code and initialize the French catalog, po/fr.po. This file is now ready for translation! Commit it into your source code repository so your agile-minded French-speaking friends can find it. Use msg-init to create as many language files as you like:

% dzil msg-init de ja.JIS en_US.UTF-8 en_UK.UTF-8
Created po/de.po.
Created po/ja.po.
Created po/en_US.po.
Created po/en_UK.po.

Each language has its own PO file. You can even have region-specific catalogs, such as the en_US and en_UK variants here. Each time a catalog is updated, the changes should be committed to the repository, like code. This allows the latest translations to always be available for compilation and distribution. The output from dzil build now looks something like:

po/fr.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/ja.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_US.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_UK.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.

The resulting MO files will be in the shared directory of your distribution:

% find Awesome-Module-0.01/share -type f
Awesome-Module-0.01/share/LocaleData/de/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/en_UK/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/en_US/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/ja/LC_MESSAGES/Awesome-Module.mo

From here Module::Build or ExtUtils::MakeMaker will install these MO files with the rest of your distribution, right where Locale::TextDomain can find them at runtime. The PO files, on the other hand, won’t be used at all, so you might as well exclude them from the distribution. Add this line to your MANIFEST.SKIP to prevent the po directory and its contents from being included in the distribution:

^po/

Mergers and Acquisitions

Of course no code base is static. In all likelihood, you’ll change your code — and end up adding, editing, and removing localizable strings as a result. You’ll need to periodically merge these changes into all of your translation catalogs so that your translators can make the corresponding updates. That’s what the the msg-merge command is for:

% dzil msg-merge
extracting gettext strings
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

This command re-scans your Perl code and updates all of the language files. Old messages will be commented-out and new ones added. Commit the changes and give your translators a holler so they can keep the awesome going.
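For the curious, a merged catalog entry looks something like this (contents invented for illustration). The “fuzzy” flag marks a guessed translation for a translator to review, and obsolete messages are commented out with #~:

#: lib/Awesome/Module.pm:42
#, fuzzy
msgid "Greetings puny human!"
msgstr "Salutations, chétif humain !"

#~ msgid "An old, removed message."
#~ msgstr "Un vieux message supprimé."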

Template Scan

The msg-init and msg-merge commands don’t actually scan your source code. Sort of lied about that. Sorry. What they actually do is merge a template file into the appropriate catalog files. If this template file does not already exist, a temporary one will be created and discarded when the initialization or merging is done.

But projects commonly maintain a permanent template file, stored in the source code repository along with the translation catalogs. For this purpose, we have the msg-scan command. Use it to create or update the template, or POT file:

% dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot

From here on in, the resulting .pot file will be used by msg-init and msg-merge instead of scanning your code all over again. But keep in mind that, if you do maintain a POT file, future merges will be a two-step process: First run msg-scan to update the POT file, then msg-merge to merge its changes into the PO files:

% dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot
% dzil msg-merge
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

Lost in Translation

One more thing, a note for translators. They can, of course, also use msg-scan and msg-merge to update the catalogs they’re working on. But how do they test their translations? Easy: use the msg-compile command to compile a single catalog:

% dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 195 translated messages.

The resulting compiled catalog will be saved to the LocaleData subdirectory of the current directory, so it’s easily available to your app for testing. Just be sure to tell Perl to include the current directory in the search path, and set the $LANGUAGE environment variable for your language. For example, here’s how I test the Sqitch French catalog:

% dzil msg-compile po/fr.po              
[LocaleTextDomain] po/fr.po: 148 translated messages, 36 fuzzy translations, 27 untranslated messages.
% LANGUAGE=fr perl -Ilib -CAS -I. bin/sqitch foo
"foo" n'est pas une commande valide

Just be sure to delete the LocaleData directory when you’re done — or at least don’t commit it to the repository.

TL;DR

This may seem like a lot of steps, and it is. But once you have the basics in place — configuring the Dist::Zilla::LocaleTextDomain plugin, setting up the “textdomain filter”, and setting the locale in the application — there are just a few habits to get into:

  • Use the functions __, __x, __n, and __nx to internationalize user-visible strings
  • Run msg-scan and msg-merge to keep the catalogs up-to-date
  • Keep your translators in the loop.

The Dist::Zilla::LocaleTextDomain plugin will do the rest.


  1. What about Locale::Maketext, you ask? It has not, alas, withstood the test of time. For details, see Nikolai Prokoschenko’s epic 2009 polemic, “On the state of i18n in Perl.” See also Steffen Winkler’s presentation, Internationalisierungs-Framework auswählen (and the English translation by Aristotle Pagaltzis), from German Perl Workshop 2010.

  2. The msg-init function — like all of the dzil msg-* commands — uses the GNU gettext utilities under the hood. You’ll need a reasonably modern version in your path, or else it won’t work.

Lexical Subroutines

Ricardo Signes:

One of the big new experimental features in Perl 5.18.0 is lexical subroutines. In other words, you can write this:

my sub quickly { ... }
my @sorted = sort quickly @list;

my sub greppy (&@) { ... }
my @grepped = greppy { ... } @input;

These two examples show cases where lexical references to anonymous subroutines would not have worked. The first argument to sort must be a block or a subroutine name, which leads to awful code like this:

sort { $subref->($a, $b) } @list

With our greppy, above, we get to benefit from the parser-affecting behaviors of subroutine prototypes.

My favorite tidbit about this feature? Because lexical subs are lexical, and method-dispatch is package-based, lexical subs are not subject to method lookup and dispatch! This just might alleviate the confusion of methods and subs, as chromatic complained about just yesterday. Probably doesn’t solve the problem for imported subs, though.
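If you want to play with the feature yourself, here’s a minimal sketch; in 5.18 lexical subs are experimental, so the feature must be enabled and its warning silenced:

use v5.18;
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';

my @list = (3, 1, 2);
my sub quickly { $a <=> $b }       # a lexical comparator
my @sorted = sort quickly @list;   # sort accepts the lexical sub by name
say "@sorted";                     # 1 2 3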

More about…

TPF To Revamp Grants

Alberto Simões:

Nevertheless, this lack of “lower than $3000” grant proposals, and the fact that lot of people have been discussing (and complaining) about this value being too low, the Grants Committee is starting a discussion on rewriting and reorganizing the way it works. Namely, in my personal blog I opened a discussion about the Grants Committee some time ago, and had plenty of feedback, that will be helpful for our internal discussion.

This is great news. I would love to see more and more ambitious grant proposals, as well as awards people can subsist on. I look forward to seeing the new rules.

More about…

Mopping the Moose

Stevan Little:

I spent much of last week on vacation with the family so very little actual coding got done on the p5-mop, but instead I did a lot of thinking. My next major goal for the p5-mop is to port a module written in Moose, in particular, one that uses many different Moose features. The module I have chosen to port is Bread::Board and I chose it for two reasons; first, it was the first real module that I wrote using Moose and second, it makes heavy use of a lot of Moose’s features.

I’m so happy to see Stevan making progress on the Perl 5 MOP again.

More about…

A Perl Blog

I have been unsatisfied with Just a Theory for some time. I started that blog in 2004 more or less for fun, thinking it would be my permanent home on the internet. And it has been. But the design, while okay in 2004, is just awful by today’s standards. A redesign is something I have planned to do for quite some time.

I had also been thinking about my audience. Or rather, audiences. I’ve blogged about many things, but while a few dear family members might want to read everything I ever post, most folks, I think, are interested in only a subset of topics. Readers of Just a Theory came for posts about Perl, or PostgreSQL, or culture, travel, or politics. But few came for all those topics, in my estimation.

More recently, a whole bunch of top-level domains have opened up, often with the opportunity for anyone to register them. I was lucky enough to snag theory.pm and theory.pl, thinking that perhaps I would create a site just for blogging about Perl. I also nabbed theory.so, which I might dedicate to database-related blogging, and theory.me, which would be my personal blog (travel, photography, cultural essays, etc.).

And then there is Octopress. A blogging engine for hackers. Perfect for me. Hard to imagine something more appropriate (unless it was written in Perl). It seemed like a good opportunity to partition my online blogging.

So here we are with my first partition. theory.pm is a Perl blog. Seemed like the perfect name. I fiddled with it off and on for a few months, often following Matt Gemmell’s Advice, and I’m really happy with it. The open-source fonts Source Sans Pro and Source Code Pro, from Adobe, look great. The source code examples are beautifully marked up and displayed using the Solarized color scheme (though presentation varies in feed readers). Better still, it’s equally attractive and readable on computers, tablets and phones, thanks to the foundation laid by Aron Cedercrantz’s BlogTheme.

I expect to fork this code to create a database blog soon, and then perhaps put together a personal blog. Maybe the personal blog will provide link posts for posts on the other sites, so that if anyone really wants to read everything, they can. I haven’t decided yet.

In the meantime, now that I have a dedicated Perl blog, I guess I’ll have to start writing more Perl-related stuff. I’m starting with some posts about the state of exception handling in Perl 5, the first of which is already up. Stay tuned for more.

Update: 2018-05-23: I decided that a separate Perl blog wasn’t a great idea after all, and relaunched Just a Theory.

Trying Times

Exception handling is a bit of a pain in Perl. Traditionally, we use eval {}:

eval {
    foo();
};
if (my $err = $@) {
    # Inspect $err…
}

The use of the if block is a bit unfortunate; worse is the use of the global $@ variable, which has inflicted unwarranted pain on developers over the years1. Many Perl hackers put Try::Tiny to work to circumvent these shortcomings:

try {
    foo();
} catch {
    # Inspect $_…
};

Alas, Try::Tiny introduces its own idiosyncrasies, particularly its use of subroutine references rather than blocks. While a necessity of a pure-Perl implementation, it prevents returning from the calling context. One must work around this deficiency by checking return values:

my $rv = try {
   f();
} catch {
   # …
};

if (!$rv) {
   return;
}

I can’t tell you how often this quirk burns me.
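Here’s the trap in miniature: the return exits the anonymous subroutine passed to catch, not the enclosing function. A contrived sketch:

use Try::Tiny;

sub process {
    try {
        die "oops\n";
    } catch {
        # Intended to bail out of process(), but it only returns
        # from this anonymous catch sub.
        return;
    };
    return 'kept going';  # always reached
}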

Sadly, there is a deeper problem than syntax: Just what, exactly, is an exception? How does one determine the exceptional condition, and what can be done about it? It might be a string. The string might be localized. It might be an Exception::Class object, or a Throwable object, or a simple array reference. Or any other value a Perl scalar can hold. This lack of specificity requires careful handling of exceptions:

if (my $err = $@) {
    if (ref $err) {
        if (eval { $err->isa('Exception::Class') }) {
            if ( $err->isa('SomeException') ) {
                # …
            } elsif ( $err->isa('SomeOtherException') ) {
                # …
            } else {
                # …
            }
        } elsif (eval { $err->DOES('Throwable') }) {
            # …
        } elsif ( ref $err eq 'ARRAY') {
            # …
        }
    } else {
        if ( $err =~ /DBI/ ) {
            # …
        } elsif ( $err =~ /cannot open '([^']+)'/ ) {
            # …
        }
    }
}

Not every exception handler requires so many conditions, but I have certainly exercised all these approaches. Usually my exception handlers accrete conditions as users report new, unexpected errors.

That’s not all. My code frequently requires parsing information out of a string error. Here’s an example from PGXN::Manager:

try {
    $self->distmeta(decode_json scalar $member->contents );
} catch {
    my $f = quotemeta __FILE__;
    (my $err = $_) =~ s/\s+at\s+$f.+//ms;
    $self->error([
        'Cannot parse JSON from “[_1]”: [_2]',
        $member->fileName,
        $err
    ]);
    return;
} or return;

return $self;

When JSON throws an exception on invalid JSON, the code must catch that exception to show the user. The user cares not at all what file threw the exception, nor the line number. The code must strip that stuff out before passing the original message off to a localizing error method.

Gross.

It’s time to end this. A forthcoming post will propose a plan for adding proper exception handling to the core Perl language, including exception objects and an official try/catch syntax.


  1. In fairness much of the $@ pain has been addressed in Perl 5.14.
More about…

Sqitch on Windows (and Linux, Solaris, and OS X)

Thanks to the hard-working hamsters at the ActiveState PPM Index, Sqitch is available for installation on Windows. According to the Sqitch PPM Build Status, the latest version is now available for installation. All you have to do is:

  1. Download and install ActivePerl
  2. Open the Command Prompt
  3. Type ppm install App-Sqitch

As of this writing, only PostgreSQL is supported, so you will need to install PostgreSQL.

But otherwise, that’s it. In fact, this incantation works for any OS that ActivePerl supports. Here’s where you can find the sqitch executable on each:

  • Windows: C:\perl\site\bin\sqitch.bat
  • Mac OS X: ~/Library/ActivePerl-5.16/site/bin/sqitch (Or /usr/local/ActivePerl-5.16/site/bin if you run sudo ppm)
  • Linux: /opt/ActivePerl-5.16/site/bin/sqitch
  • Solaris/SPARC (Business edition-only): /opt/ActivePerl-5.16/site/bin/sqitch

This makes it easy to get started with Sqitch on any of those platforms without having to become a Perl expert. So go for it, and then get started with the tutorial!

Looking for the comments? Try the old layout.

Dist::Zilla::LocaleTextDomain for Translators

Here’s a followup on my post about localizing Perl modules with Locale::TextDomain. Dist::Zilla::LocaleTextDomain was great for developers, less so for translators. A Sqitch translator asked how to test the translation file he was working on. My only reply was to compile the whole module, then install it and test it. Ugh.

Today, I released Dist::Zilla::LocaleTextDomain v0.85 with a new command, msg-compile. This command allows translators to easily compile just the file they’re working on and nothing else. For pure Perl modules in particular, it’s pretty easy to test them. By default, the compiled catalog goes into ./LocaleData, where convincing the module to find it is simple. For example, I updated the test sqitch app to take advantage of this. Now, to test, say, the French translation file, all the translator has to do is:

> dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 155 translated messages, 24 fuzzy translations, 16 untranslated messages.

> LANGUAGE=fr ./t/sqitch foo
"foo" n'est pas une commande valide

I hope this simplifies things for translators. See the notes for translators for a few more words on the subject.

Looking for the comments? Try the old layout.

Localize Your Perl modules with Locale::TextDomain and Dist::Zilla

I’ve just released Dist::Zilla::LocaleTextDomain v0.80 to the CPAN. This module adds support for managing Locale::TextDomain-based localization and internationalization in your CPAN libraries. I wanted to make it as simple as possible for CPAN developers to do localization and to support translators in their projects, and Dist::Zilla seemed like the perfect place to do it, since it has hooks to generate the necessary binary files for distribution.

Starting out with Locale::TextDomain was decidedly non-intuitive for me, as a Perl hacker, likely because of its gettext underpinnings. Now that I’ve got a grip on it and created the Dist::Zilla support, I think it’s pretty straight-forward. To demonstrate, I wrote the following brief tutorial, which constitutes the main documentation for the Dist::Zilla::LocaleTextDomain distribution. I hope it makes it easier for you to get started localizing your Perl libraries.

Localize Your Perl modules with Locale::TextDomain and Dist::Zilla

Locale::TextDomain provides a nice interface for localizing your Perl applications. The tools for managing translations, however, are a bit arcane. Fortunately, you can just use this plugin and get all the tools you need to scan your Perl libraries for localizable strings, create a language template, initialize translation files, and keep them up-to-date. All this is assuming that your system has the gettext utilities installed.

The Details

I put off learning how to use Locale::TextDomain for quite a while because, while the gettext tools are great for translators, the tools for the developer were a little more opaque, especially for Perlers used to Locale::Maketext. But I put in the effort while hacking Sqitch. As I had hoped, using it in my code was easy. Using it for my distribution was harder, so I decided to write Dist::Zilla::LocaleTextDomain to make life simpler for developers who manage their distributions with Dist::Zilla.

What follows is a quick tutorial on using Locale::TextDomain in your code and managing it with Dist::Zilla::LocaleTextDomain.

This is my domain

First thing to do is to start using Locale::TextDomain in your code. Load it into each module with the name of your distribution, as set by the name attribute in your dist.ini file. For example, if your dist.ini looks something like this:

name    = My-GreatApp
author  = Homer Simpson <homer@example.com>
license = Perl_5

Then, in your Perl libraries, load Locale::TextDomain like this:

use Locale::TextDomain qw(My-GreatApp);

Locale::TextDomain uses this value to find localization catalogs, so naturally Dist::Zilla::LocaleTextDomain will use it to put those catalogs in the right place.

Okay, so it’s loaded, how do you use it? The documentation of the Locale::TextDomain exported functions is quite comprehensive, and I think you’ll find it pretty simple once you get used to it. For example, simple strings are denoted with __:

say __ 'Hello';

If you need to specify variables, use __x:

say __x(
    'You selected the color {color}',
    color => $color
);

Need to deal with plurals? Use __n:

say __n(
    'One file has been deleted',
    'All files have been deleted',
    $num_files,
);

And then you can mix variables with plurals with __nx:

say __nx(
    'One file has been deleted.',
    '{count} files have been deleted.',
    $num_files,
    count => $num_files,
);

Pretty simple, right? Get to know these functions, and just make it a habit to use them in user-visible messages in your code. Even if you never expect to translate those messages, just by doing this you make it easier for someone else to come along and start translating for you.

The setup

Now you’re localizing your code. Great! What’s next? Officially, nothing. If you never do anything else, your code will always emit the messages as written. You can ship it and things will work just as if you had never done any localization.

But what’s the fun in that? Let’s set things up so that translation catalogs will be built and distributed once they’re written. Add these lines to your dist.ini:

[ShareDir]
[LocaleTextDomain]

There are actually quite a few attributes you can set here to tell the plugin where to find language files and where to put them. For example, if you used a domain different from your distribution name, e.g.,

use Locale::TextDomain 'com.example.My-GreatApp';

Then you would need to set the textdomain attribute so that the LocaleTextDomain plugin does the right thing with the language files:

[LocaleTextDomain]
textdomain = com.example.My-GreatApp

Consult the LocaleTextDomain configuration docs for details on all available attributes.

(Special note until this Locale::TextDomain patch is merged: set the share_dir attribute to lib instead of the default value, share. If you use Module::Build, you will also need a subclass to do the right thing with the catalog files; see “Installation” in Dist::Zilla::Plugin::LocaleTextDomain for details.)

What does including the plugin do? Mostly nothing. You might see this line from dzil build, though:

[LocaleTextDomain] Skipping language compilation: directory po does not exist

Now at least you know it was looking for something to compile for distribution. Let’s give it something to find.

Initialize languages

To add translation files, use the msg-init command:

> dzil msg-init de
Created po/de.po.

At this point, the gettext utilities will need to be installed and visible in your path, or else you’ll get errors.

This command scans all of the Perl modules gathered by Dist::Zilla and initializes a German translation file, named po/de.po. This file is now ready for your German-speaking public to start translating. Check it into your source code repository so they can find it. Create as many language files as you like:

> dzil msg-init fr ja.JIS en_US.UTF-8
Created po/fr.po.
Created po/ja.po.
Created po/en_US.po.

As you can see, each language results in the generation of the appropriate file in the po directory, sans encoding (i.e., no .UTF-8 in the en_US file name).

Now let your translators go wild with all the languages they speak, as well as the regional dialects. (Don’t forget to colour your code with en_UK translations!)

Once you have translations and they’re committed to your repository, when you build your distribution, the language files will automatically be compiled into binary catalogs. You’ll see this line output from dzil build:

[LocaleTextDomain] Compiling language files in po
po/fr.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/ja.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_US.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.

You’ll then find the catalogs in the shared directory of your distribution:

> find My-GreatApp-0.01/share -type f
My-GreatApp-0.01/share/LocaleData/de/LC_MESSAGES/My-GreatApp.mo
My-GreatApp-0.01/share/LocaleData/en_US/LC_MESSAGES/My-GreatApp.mo
My-GreatApp-0.01/share/LocaleData/ja/LC_MESSAGES/My-GreatApp.mo

These binary catalogs will be installed as part of the distribution just where Locale::TextDomain can find them.

Here’s an optional tweak: add this line to your MANIFEST.SKIP:

^po/

This prevents the po directory and its contents from being included in the distribution. Sure, you can include them if you like, but they’re not required for the running of your app; the generated binary catalog files are all you need. Might as well leave out the translation files.

Mergers and acquisitions

You’ve got translation files, and helpful translators have given them a workover. What happens when you change your code, add new messages, or modify existing ones? The translation files need to periodically be updated with those changes, so that your translators can deal with them. We got you covered with the msg-merge command:

> dzil msg-merge
extracting gettext strings
Merging gettext strings into po/de.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

This will scan your module files again and update all of the translation files with any changes. Old messages will be commented-out and new ones added. Just commit the changes to your repository and notify the translation army that they’ve got more work to do.

If for some reason you need to update only a subset of language files, you can simply list them on the command-line:

> dzil msg-merge po/de.po po/en_US.po
Merging gettext strings into po/de.po
Merging gettext strings into po/en_US.po

What’s the scan, man

Both the msg-init and msg-merge commands depend on a translation template file to create and merge language files. Thus far, this has been invisible: they will create a temporary template file to do their work, and then delete it when they’re done.

However, it’s common to also store the template file in your repository and to manage it directly, rather than implicitly. If you’d like to do this, the msg-scan command will scan the Perl module files gathered by Dist::Zilla and make it for you:

> dzil msg-scan
extracting gettext strings into po/My-GreatApp.pot

The resulting .pot file will then be used by msg-init and msg-merge rather than scanning your code all over again. This makes msg-merge a two-step process: you need to update the template before merging. Updating the template is done by exactly the same command, msg-scan:

> dzil msg-scan
extracting gettext strings into po/My-GreatApp.pot
> dzil msg-merge
Merging gettext strings into po/de.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

Ship It!

And that’s all there is to it. Go forth and localize and internationalize your Perl apps!

Acknowledgements

My thanks to Ricardo Signes for invaluable help plugging in to Dist::Zilla, to Guido Flohr for providing feedback on this tutorial and being open to my pull requests, to David Golden for I/O capturing help, and to Jérôme Quelin for his patience as I wrote code to do the same thing as Dist::Zilla::Plugin::LocaleMsgfmt without ever noticing that it already existed.

Looking for the comments? Try the old layout.

Use of DBI in Sqitch

Sqitch uses the native database client applications (psql, sqlite3, mysql, etc.). So for tracking metadata about the state of deployments, I have been trying to stick to using them. I’m first targeting PostgreSQL, and as a result need to open a connection to psql, start a transaction, and be able to read and write stuff to it as migrations go along. The IPC is a huge PITA. Furthermore, getting things properly quoted is also pretty annoying — and it will be worse for SQLite and MySQL, I expect (psql’s --set support is pretty slick).

If I used the DBI, on the other hand, all this would be very easy. There is no IPC, just a direct connection to the database. It would save me a ton of time doing development, and be more robust and safer to use (e.g., exception handling rather than platform-dependent signal handling (or not, in the case of Windows)). I am quite tempted to just do that.

However, I have been trying to be sensitive to dependencies. I had planned to make Sqitch simple to install on any system, and if you had the command-line client for your preferred database, it would just work. If I used the DBI instead, then Sqitch would not work at all unless you installed the appropriate DBI driver for your database of choice. This is no big deal for Perl people, of course, but I don’t want this to be a Perl people tool. I want it to be dead simple for anyone to use for any database. Ideally, there will be RPMs and Ubuntu packages, so one can just install it and go, and not have to worry about figuring out what additional Perl DBD to install for your database of choice. It should be transparent.

That is still my goal, but at this point the IPC requirements for controlling the clients is driving me a little crazy. Should I just give up and use the DBI (at least for now)? Or persevere with the IPC stuff and get it to work? Opinions wanted!

Looking for the comments? Try the old layout.

More about…

Today on the Perl Advent Calendar

Hey look everybody, I wrote today’s Perl Advent Calendar post, Less Tedium, More Transactions. Go read it! (Update: moved here.)

Looking for the comments? Try the old layout.

Less Tedium, More Transactions

A frequent pattern when writing database-backed applications with the DBI is to connect to the database and cache the database handle somewhere. A simplified example:

package MyApp::DB;
use DBI;
use strict;

my $DBH = DBI->connect('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

sub dbh { $DBH }

Just load MyApp::DB anywhere in your app and, whenever you want to use the database, grab the handle from MyApp::DB->dbh.

This pattern is common enough that Apache::DBI was created to magically do it for you on mod_perl, and the DBI added connect_cached() so that it could cache connections itself. However, each of these solutions has some issues:

  • What happens when your program forks? Apache::DBI handles this condition, but neither the home-grown solution nor connect_cached() does, and identifying a forked database handle as the source of a crash is notoriously unintuitive.

  • What happens when your program spawns a new thread? Sure, some DBI drivers might still work, but others might not. Best to treat new threads the same as new processes and reconnect to the database. Neither Apache::DBI nor connect_cached() deal with threading issues, and of course neither does the custom solution.

  • Apache::DBI is magical and mysterious; but the magic comes with serious side-effects. Apache::DBI plugs itself right into the DBI itself, replacing its connection methods (which is why load ordering is so important to use it properly). Knowledge of Apache::DBI is actually built right into the DBI itself, meaning that the magic runs deep and both ways. These are pretty serious violations of encapsulation in both directions.

  • connect_cached() has a bit of its own unfortunate magic. Every call to connect_cached() resets the connection attributes. So if you have code in one place that starts a transaction, and code elsewhere but executed in the same scope that also fetches a connect_cached() handle, the transaction will be committed then and there, even though the code that started it might not be done with it. One can work around this issue via callbacks, but it’s a bit of a hack.
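To make that last gotcha concrete, here’s a hedged sketch (file and table names invented):

use DBI;

# First block of code opens a transaction on the cached handle.
my $dbh = DBI->connect_cached('DBI:SQLite:dbname=demo.db', '', '', {
    RaiseError => 1,
    AutoCommit => 1,
});
$dbh->begin_work;                            # AutoCommit now off
$dbh->do('INSERT INTO widgets VALUES (1)');

# Meanwhile, unrelated code in the same scope fetches "its own" handle:
my $dbh2 = DBI->connect_cached('DBI:SQLite:dbname=demo.db', '', '', {
    RaiseError => 1,
    AutoCommit => 1,
});
# Same cached handle; re-applying AutoCommit => 1 just committed the
# open transaction, though the first block isn't done with it.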

Using a custom caching solution avoids the magic, but getting fork- and thread-safety right is surprisingly non-trivial, in the same way that doing your own exception-handling is surprisingly non-trivial.
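To see why, here’s a minimal sketch of the bookkeeping a hand-rolled cache has to get right (simplified; it ignores plenty of edge cases):

package MyApp::SafeDB;
use strict;
use warnings;
use DBI;

# Track the process ID (and thread ID, if threads are loaded) that
# created the handle, and reconnect when either changes or ping fails.
my ($DBH, $PID, $TID);

sub dbh {
    my $tid = $INC{'threads.pm'} ? threads->tid : 0;
    if (!$DBH || $PID != $$ || $TID != $tid || !$DBH->ping) {
        # A handle inherited across a fork must not tear down the
        # parent's connection when we discard it.
        $DBH->{InactiveDestroy} = 1 if $DBH && $PID != $$;
        $DBH = DBI->connect('DBI:SQLite:dbname=myapp.db', '', '', {
            PrintError => 0,
            RaiseError => 1,
            AutoCommit => 1,
        });
        ($PID, $TID) = ($$, $tid);
    }
    return $DBH;
}

1;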

Enter DBIx::Connector, a module that efficiently manages your database connections in a thread- and fork-safe manner so that you don’t have to. If you already have a custom solution, switching to DBIx::Connector is easy. Here’s a revision of MyApp::DB that uses it:

package MyApp::DB;
use DBIx::Connector;
use strict;

my $CONN = DBIx::Connector->new('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

sub conn { $CONN }
sub dbh  { $CONN->dbh }

Simple, right? You pass exactly the same parameters to DBIx::Connector->new that you passed to DBI->connect. The DBIx::Connector object simply proxies the DBI. You want the database handle itself, just call dbh() and proceed as usual, confident that if your app forks or spawns new threads, your database handle will be safe. Why? Because DBIx::Connector detects such changes, and re-connects to the database, being sure to properly dispose of the original connection. But really, you don’t have to worry about that, because DBIx::Connector does the worrying for you.

Execution Methods

DBIx::Connector is very good at eliminating the technical friction of process and thread management. But that’s not all there is to it.

Although you can just fetch the DBI handle from your DBIx::Connector object and go, a better approach is to use its execution methods. These methods scope execution to a code block. Here’s an example using run():

$conn->run(sub {
    shift->do($query);
});

That may not seem so useful, and is more to type, but the real power comes from the txn() method. txn() executes the code block within the scope of a transaction. So where you normally would write something like this:

use Try::Tiny;
use MyApp::DBH;

my $dbh = MyApp::DBH->dbh;  
try {
    $dbh->begin_work;
    # do stuff...
    $dbh->commit;
} catch {
    $dbh->rollback;
    die $_;
};

The txn() method scopes the transaction for you, so that you can just focus on the work to be done, not the transaction management:

use Try::Tiny;
use MyApp::DBH;

try {
    MyApp::DBH->conn->txn(sub {
        # do stuff...
    });
} catch {
    die $_;
};

There’s no need to call begin_work, commit, or rollback, as txn() does all that for you. Furthermore, it improves the maintainability of your code, as the scope of the transaction is much more clearly defined as the scope of the code block. Additional calls to txn() or run() within that block are harmless, and just become part of the same transaction:

my $conn = MyApp::DBH->conn;
$conn->txn(sub {
    my $dbh = shift;
    $dbh->do($_) for @queries;
    $conn->run(sub {
        shift->do($expensive_query);
        $conn->txn(sub {
            shift->do($another_expensive_query);
        });
    });
});

Even cooler is the svp() method, which scopes execution of a code block to a savepoint, or subtransaction, if your database supports it (all of the drivers currently supported by DBIx::Connector do). For example, this transaction will commit the insertion of values 1 and 3, but not 2:

my $conn = MyApp::DBH->conn;
$conn->txn(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (1)');
    try {
        $conn->svp(sub {
            shift->do('INSERT INTO table1 VALUES (2)');
            die 'OMGWTF?';
        });
    } catch {
           warn "Savepoint failed: $_\n";
    };
    $dbh->do('INSERT INTO table1 VALUES (3)');
});

Connection Management

The recommended pattern for using a cached DBI handle is to call ping() when you fetch it from the cache, and reconnect if it returns false. Apache::DBI and connect_cached() do this for you, and so does DBIx::Connector. However, in a busy application ping() can get called a lot. We recently did some query analysis for a client, and found that 1% of the database execution time was taken up with ping() calls. That may not sound like a lot, but looking at the numbers, it amounted to 100K pings per hour. For something that just returns true 99.9% of the time, it seems a bit silly.

Enter DBIx::Connector connection modes. The default mode is “ping”, as that’s what most installations are accustomed to. A second mode is “no_ping”, which simply disables pings. I don’t recommend that.

A better solution is to use “fixup” mode. This mode doesn’t normally call ping() either. However, if a code block passed to run() or txn() throws an exception, then DBIx::Connector will call ping(). If it returns false, DBIx::Connector reconnects to the database and executes the code block again. This configuration should handle some common situations, such as idle timeouts, without bothering you about it.

Specify “fixup” mode whenever you call an execution method, like so:

$conn->txn(fixup => sub { ... });

You can also specify that your connection always use “fixup” via the mode() accessor. Modify the caching library like so (line 8 is new):

my $CONN = DBIx::Connector->new('DBI:SQLite:dbname=myapp.db', '', '', {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    sqlite_unicode => 1,
});

$CONN->mode('fixup'); # ⬅ ⬅ ⬅  enter fixup mode!

sub conn { $CONN }
sub dbh  { $CONN->dbh }

However, you must be more careful with fixup mode than with ping mode, because a code block can be executed twice. So you must be sure to write it such that there are no side effects to multiple executions. Don’t do this, for example:

my $count = 0;
$conn->txn(fixup => sub {
    shift->do('INSERT INTO foo (count) VALUES(?)', undef, ++$count);
});
say $count; # may be 1 or 2

Will it insert a value of 1 or 2? It’s much safer to remove non-transactional code from the block, like so:

my $count = 0;
++$count;
$conn->txn(fixup => sub {
    shift->do('INSERT INTO foo (count) VALUES(?)', undef, $count);
});
say $count; # can only be 1

An even trickier pattern to watch out for is something like this:

my $user = 'rjbs';
$conn->run(fixup => sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO users (nick) VALUES (?)', undef, $user);

    # Do some other stuff...

    $dbh->do('INSERT INTO log (msg) VALUES (?)', undef, 'Created user');
});

If the database disconnects between the first and second calls to do, and DBIx::Connector manages to re-connect and run the block again, you might get a unique key violation on the first call to do. This is because we’ve used the run() method. In the first execution of the block, user “rjbs” was inserted and autocommitted. On the second call, user “rjbs” is already there, and because it’s a username, we get a unique key violation.

The rule of thumb here is to use run() only for database reads, and to use txn() (and svp()) for writes. txn() will ensure that the transaction is rolled back, so the second execution of the code block will be side-effect-free.
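In code, the rule of thumb looks something like this (a sketch reusing the MyApp::DBH cache from above):

my $conn = MyApp::DBH->conn;

# Reads: harmless to replay if "fixup" reconnects and re-runs the block.
my $count = $conn->run(fixup => sub {
    shift->selectrow_array('SELECT COUNT(*) FROM users');
});

# Writes: wrap in txn() so a replayed block starts from a rolled-back state.
$conn->txn(fixup => sub {
    shift->do('INSERT INTO log (msg) VALUES (?)', undef, 'Created user');
});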

Pedigree

DBIx::Connector is derived from patterns originally implemented for DBIx::Class, though it’s nearly all original code. The upside for those of us who don’t use ORMs is that we get this independent piece of ORM-like behavior without its ORMishness. So if you’re a database geek like me, DBIx::Connector is a great way to reduce technical friction without buying into the whole idea of an ORM.

As it turns out, DBIx::Connector is good not just for straight-to-database users, but also for ORMs. Both DBIx::Class and Rose::DB have plans to replace their own caching and transaction-handling implementations with DBIx::Connector under the hood. That will be great for everyone, as the problems will all be solved in this one place.

This post originally appeared on the Perl Advent Calendar 2011.