Just a Theory

By David E. Wheeler

Adopt My Modules

Dear Perl Community,

Over the last 17 years, I’ve created, released, updated, and/or maintained a slew of Perl modules on CPAN. Recently my work has changed significantly, and I no longer have the time to properly care for them all. A few, like Pod::Simple and Plack::Middleware::MethodOverride, have co-maintainers, but most don’t. They deserve more love than I can currently provide. All, therefore, are up for adoption.

If you regularly use my modules, use a service that depends on them, or just like to contribute to the community, consider becoming a maintainer! Have a look at the list, and if you’d like to rescue an orphan module, hit me up via Twitter or email me at david at this domain.

Evolutionary Theory

Back in 2013, a slew of new top-level domains became available, and I pounced on a number of them, thinking it’d be good to make a shorter domain my own. My favorite was theory.pm. In the early years of Just a Theory, I wrote mostly about Perl and related topics like Bricolage. I thought naming a Perl blog like a Perl module would be appropriate. By that time I wrote a lot about Postgres, and didn’t want to mix topics. So alongside theory.pm, I also launched theory.so — as in “stored objects”. Both used a new static design built on Octopress hosted on GitHub Pages.

Unfortunately, by this time I wrote very little about Perl anymore. I wrote more on Postgres and Sqitch, but had to shut down theory.so when the domain registration became too expensive. I merged it into theory.pm, but it never felt right to post about Postgres on a “Perl blog”. I wrote a few link posts about security and privacy, topics I’ve been thinking about quite a lot, but it still felt…off. My last post to theory.pm was nearly two years ago.

I’ve posted little personal writing, either: no politics, photos, travelogues, essays, or anything else. I let Twitter, Instagram, and Facebook fill those gaps.

Lately, though, I’ve had the itch to write on my own site again, both to think through technical and cultural issues in the technology business and to reclaim a personal space on the net. The recent privacy challenges for the big social media companies finally drove me from their easy embrace back onto the open web. But where to put down my hypertext roots?

My friends, Just a Theory returns

In retrospect, I now realize that my original domain name was just right. It’s me, just me, but not topic limited. I can post whatever I want, without constraints imposed by attention-limited domains. I decided to rehabilitate it.

Of course I could no longer use the old design. Inspired by the likes of Slashdot, it was boxy, crowded, and 2004-era ugly. I took a few weeks, imported the theory.pm posts into a new Hugo-powered site, and revamped the design from there. I took on the arduous task to import all the original Just a Theory posts, cleaning up typos and fixing images.

The result is the revamped site you now see in your browser. Or perhaps in your RSS reader (the old URLs should have redirected you here). It’s far better than any of the previous sites:

  • The design emphasizes readability above all. I’ve made it as clean and attractive as I can. The design is my own, and likely full of flaws; don’t hesitate to holler if you spot anything that doesn’t look right.
  • No baggage. The new design uses no JavaScript — no tracking or analytics at all. I’ll never host ads, so I don’t need all the weight of ad-tech. The site is 100% HTML and CSS and nothing else. Only the custom fonts, Source Sans Pro and Source Code Pro, add to the bandwidth.
  • No comments. I’m serious about shedding the baggage. Wading through comment spam wastes valuable writing and family time, while the comment services demand heavy JavaScript and tracking penalties. I generally get very few comments, but if you really want to talk to me, hit me up on Twitter or drop me an email (david at this domain).
  • The imported historical posts have no comments, either, but you can still browse the old design if you need to see them. Each migrated post links to the original, as well.
  • History. Previously, it was impossible to find stuff on Just a Theory. The new design borrows a page from kottke.org to provide links to all the tags, and all tag pages are paginated — as is the home page. Plus, the Archives lists every post and link post on the site, nice and friendly to search engines.
  • Speaking of tags, each has its own RSS feed. If you’re only interested in a particular subject, you can just subscribe to its feed. I will never create topic-specific sites again; tagging is so much easier.
  • Identity. Yes, this is really Just a Theory, and you can tell because the TLS certificate proves it. Thanks to CloudFront and Let’s Encrypt for making it a cinch.
  • Scaling. It’s unlikely Just a Theory will be Fireballed again anytime soon, but since I’m using CloudFront for TLS already, this is a no-brainer. Just a Theory should be served from somewhere reasonably close to you.

Punctuated Equilibrium

I plan to write a fair bit over the next few months. I’ve been thinking a lot about security, privacy, and the impact of data privacy regulations like the GDPR on data rights and the technology business in general. I’m happy to once again have a place to write on such topics. I expect to make social posts too, to share what’s going on with friends and family. Before long, I expect to also make photoblog-style posts and perhaps integrate micro-blogging posts.

Let’s find out if I’m as good as my word.

iovation Tokenization

C’est moi, in the first of a series for the iovation blog:

Given our commitment to responsible data stewardship, as well as the invalidation of Safe Harbor and the advent of the GDPR, we saw an opportunity to reduce these modest but very real risks without impacting the efficacy of our services. A number of methodologies for data protection exist, including encryption, strict access control, and tokenization. We undertook the daunting task to determine which approaches best address data privacy compliance requirements and work best to protect customers and users — without unacceptable impact on service performance, cost to maintain infrastructure, or loss of product usability.

The post covers encryption, access control, and tokenization.

Wanted: New SVN::Notify Maintainer

I’ve used Subversion only very occasionally since 2009, and SVN::Notify not at all. Over the years, I’ve fixed minor issues with it now and then, and made a couple of releases to address issues fixed by others. But it’s past the point where I feel qualified to maintain it. Hell, the repository for SVN::Notify has been hosted on GitHub ever since 2011. I don’t have an instance of Subversion against which to test it; nor do I have any SMTP servers to throw test messages at.

In short, it’s past time I relinquished maintenance of this module to someone with a vested interest in its continued use. Is that you? Do you need to keep SVN::Notify running for your projects, and have a few TUITs to fix the occasional bug or security issue? If so, drop me a line (david @ this domain). I’d be happy to transfer the repository.

The Blockchain Hype Cycle

Excerpt from William Mougayar’s new book on TechCrunch:

At its core, the blockchain is a technology that permanently records transactions in a way that cannot be later erased but can only be sequentially updated, in essence keeping a never-ending historical trail. This seemingly simple functional description has gargantuan implications. It is making us rethink the old ways of creating transactions, storing data, and moving assets, and that’s only the beginning.

The blockchain cannot be described just as a revolution. It is a tsunami-like phenomenon, slowly advancing and gradually enveloping everything along its way by the force of its progression. Plainly, it is the second significant overlay on top of the Internet, just as the Web was that first layer back in 1990. That new layer is mostly about trust, so we could call it the trust layer.

What a steaming pile of hype and nonsense. I find it hard to take such revolutionary fervor seriously, as if people forget the Web in the 90s or real estate in 2006. Given that the author is a venture capitalist invested in a blockchain startup, it just feels like a way to try to inflate the value of his investments for short-term gain. A piece like this is snake oil.

Blockchains are inarguably useful tools, like databases or encryption algorithms, and we in the technology business should do our best to understand how they work and figure out the applications for which they make sense. I’m still trying to wrap my mind around blockchains, but one thing I understand very well: they’re not a panacea. The industry overall won’t see true benefits from blockchains for a couple of years, once the practicalities have been worked out and the nonsense has subsided. We should learn and contribute to those practicalities, but as for the hype cycle, for now I just hold my nose.

A Porous “Privacy Shield”

Glyn Moody, in Ars Technica, on the proposed replacement for the recently struck-down Safe Harbor framework:

However, with what seems like extraordinarily bad timing, President Obama has just made winning the trust of EU citizens even harder. As Ars reported last week, the Obama administration is close to allowing the NSA to share more of the private communications it intercepts with other federal agencies, including the FBI and the CIA, without removing identifying information first.

In other words, not only will the new Privacy Shield allow the NSA to continue to scoop up huge quantities of personal data from EU citizens, it may soon be allowed to share them widely. That’s unlikely to go down well with Europeans, the Article 29 Working Party, or the CJEU—all of which ironically increases the likelihood that the new Privacy Shield will suffer the same fate as the Safe Harbour scheme it has been designed to replace.

So let me get this straight. Under this proposal:

  • The NSA can continue to bulk collect EU citizen data.
  • That data may be shared with other agencies in the US government.
  • Said collection must fall under six allowed cases, one of which is undefined “counter-terrorism” purposes. No one ever abused that kind of thing before.
  • The US claims there is no more bulk surveillance, except that there is under those six cases.
  • The appointed “independent ombudsman” to address complaints by EU citizens will be a single US Undersecretary of State.
  • Complaints can also be addressed to US companies housing EU citizen data, even though, in the absence of another Snowden-scale whistle-blowing, they may have no idea their data is being surveilled.

Color me skeptical that this would work, let alone not be thrown out by another case similar to the one that killed Safe Harbor.

I have a better idea. How about eliminating mass surveillance?

Do We Have a Right to Security?

Rich Mogull:

Don’t be distracted by the technical details. The model of phone, the method of encryption, the detailed description of the specific attack technique, and even the feasibility are all irrelevant.

Don’t be distracted by the legal wrangling. By the timing, the courts, or the laws in question. Nor by politicians, proposed legislation, Snowden, or speeches at think tanks or universities.

Don’t be distracted by who is involved. Apple, the FBI, dead terrorists, or common drug dealers.

Everything, all of it, boils down to a single question.

Do we have a right to security?

How about we introduce a bill guaranteeing a right to security. Senator Wyden?

(Via Daring Fireball)

Anthem Breach Harms Consumers

Paul Roberts in Digital Guardian:

Whether or not harm has occurred to plaintiffs is critical for courts to decide whether the plaintiff has a right – or “standing” – to sue in the first place. But proving that data exposed in a breach has actually been used for fraud is notoriously difficult.

In her decision in the Anthem case, [U.S. District Judge Lucy] Koh reasoned that the theft of personal identification information is harm to consumers in itself, regardless of whether any subsequent misuse of it can be proven. Allegations of a “concrete and imminent threat of future harm” are enough to establish an injury and standing in the early stages of a breach suit, she said.

Seems like a no-brainer to me. Personal information is just that: personal. Organizations that collect and store personal information must take every step they can to protect it. Failure to do so harms their users, exposing them to increased risk of identity theft, fraud, surveillance, and abuse. It’s reasonable to expect that firms not be insulated from litigation for failing to protect user data.

Apple Challenges FBI Decryption Demand

Incredible post from Apple, signed by Tim Cook:

The government is asking Apple to hack our own users and undermine decades of security advancements that protect our customers — including tens of millions of American citizens — from sophisticated hackers and cybercriminals. The same engineers who built strong encryption into the iPhone to protect our users would, ironically, be ordered to weaken those protections and make our users less safe.

We can find no precedent for an American company being forced to expose its customers to a greater risk of attack. For years, cryptologists and national security experts have been warning against weakening encryption. Doing so would hurt only the well-meaning and law-abiding citizens who rely on companies like Apple to protect their data. Criminals and bad actors will still encrypt, using tools that are readily available to them.

I only wish there was a place to co-sign. Companies must do all they can to safeguard the privacy of their users, preferably such that only users can unlock and access their personal information. It’s in the interest of the government to ensure that private data remain private. Forcing Apple to crack its own encryption sets a dangerous precedent likely to be exploited by cybercriminals for decades to come. Shame on the FBI.

theory.so is No More

Until last week, I had two newish blogs. This one, theory.pm, was to be my Perl blog. The other one, theory.so, was my database blog. I thought it would be a good idea to have separate blogs for separate audiences, but it turns out I don’t post enough to make much difference. And now, as of last week, I let the theory.so domain expire. Control of the .so domain was turned over to Somalia a few months ago, and domain renewal fees went way up. Since I had so few posts over there (14 since August, 2013), I decided it was a good time to just merge it with theory.pm and be done with it.

So my apologies if a bunch of my old posts just showed up in your RSS readers. (You all still use RSS readers, right?). This is a one-time merging of the two blogs, so should not happen again.

Well…maybe. Now I have a total of 25 posts on theory.pm (since July 2013), which is still pretty paltry. I’m thinking it’s silly to have this thing separate from my original blog, Just a Theory, so I might eventually merge that blog, too. Not sure what domain I’ll use for it. Maybe I’ll go back to justatheory.com. Or maybe I’ll use one of the other domains I registered, like the recently added theory.one. Or maybe theory dot something else.

Not that you care. Good on you for reading this far. I would have stopped before now. You’re a better person than I.

Update: 2018-05-23: I merged everything back into Just a Theory last week.

The Watch is You

“iPhone and Apple Watch”

Multiple factors. Photo: Apple.

Back when Apple introduced Touch ID, I had an idea for a blog post, never written, entitled “Touch ID is Step Zero in Apple’s Authentication Plan.” As an ardent user of online services (over 500 passwords in 1Password!), the challenge of passwords frequently frustrates me. Passwords stink. People don’t like them, don’t like the crazy and often pointless complexities piled on them by naïve developers. Worse, many sites employ useless techniques, such as secret images and challenge questions, utterly failing to understand the distinctions between the various factors of authentication.

Touch ID, I thought, was a solid step toward solving these problems. Initially, it would simplify the act of identifying yourself to your iPhone. Long-term, I hoped, it would extend to other applications and online accounts. As late as last month, I Tweeted my desire to have Touch ID on the MacBook line so I could finally stop mis-typing my password to access my desktop.

Turns out I wasn’t thinking big enough. The next step in Apple’s identity plan wasn’t online logins (though some apps take advantage of it).

It was Apple Pay.

An under-appreciated benefit of Apple Pay is its implementation of multi-factor authentication. The first factor is your PIN — something you know — which you must put into your iPhone when you turn it on. Then, at purchase, you use Touch ID, authenticating with a second factor — something you are. This greatly reduces the chances of identity theft: someone would have to steal your iPhone and both circumvent the PIN and somehow fake your fingerprint in order to use it. Both exploits are notoriously difficult to pull off. An Apple Pay transaction almost certainly cannot be hacked or spoofed.

Crucially, the Apple Watch also offers Apple Pay and requires two factors of authentication. The first is the iPhone with which the Watch is paired — something you have. The second is a passcode input when you put the Watch on — something you know — and you’ll stay “logged in” as long as the Watch remains on your wrist. This is not quite as invulnerable as Touch ID on presentation, but still a powerful indicator of the identity of the customer.

Which brings us back to the issue of authentication. Well, not authentication so much as identity. If the Watch is an effectively low risk means of identifying a credit card owner, why not use it for identification in general? Consider these recent developments:

Let’s take these developments to their logical conclusions. Before long, you’ll be able to use the Watch to:

  • Open your hotel room or rental car without even checking in
  • Control lights when you walk into a room
  • Adjust the car seat and mirrors to your preferred positions
  • Identify yourself when picking up packages at the post office
  • Access and use public transportation
  • And yes, unlock your computer or phone (thanks Glenn)

In the end, the Watch isn’t a gadget. It isn’t (just) jewelry. It’s more than a password or wallet replacement, more than a controller for the devices around you. The Watch is your identification, an ever-present token that represents your presence in the universe.

Effectively, the Watch is you.

This post originally appeared on Medium.

Please Test Pod::Simple 3.29_3

Pod Book

I’ve just pushed Pod-Simple 3.29_3 to CPAN. Karl Williamson did a lot of hacking on this release, finally adding support for EBCDIC. But as part of that work, and in coordination with Pod::Simple’s original author, Sean Burke, as well as pod-people, we have switched the default encoding from Latin-1 to CP-1252.

On the surface, that might sound like a big change, but in truth, it’s pretty straight-forward. CP-1252 is effectively a superset of Latin-1, repurposing 30 or so unused control characters from Latin-1. Those characters are pretty common on Windows (the home of the CP family of encodings), especially in pastes from Word. It’s nice to be able to pick those up essentially for free.

Still, Karl’s done more than that. He also updated the encoding detection to do a better job at detecting UTF-8. This is the real default. Pod::Simple only falls back on CP1252 if there are no obvious UTF-8 byte sequences in your Pod.
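To make the fallback order concrete, here’s a rough sketch of the idea in plain Perl: try strict UTF-8 first, and only decode as CP1252 when that fails. The guess_and_decode function is a hypothetical helper for illustration. It is not Pod::Simple’s code, and it is simpler than the real detection, since it checks the whole string rather than hunting for telltale byte sequences.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper for illustration; this is not Pod::Simple's code.
# Returns the guessed encoding name and the decoded text.
sub guess_and_decode {
    my ($bytes) = @_;

    # Strict UTF-8 decoding dies on malformed input when FB_CROAK is set.
    # Decode a copy, because passing a CHECK argument may modify the source.
    my $copy = $bytes;
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return ('UTF-8', $text) if defined $text;

    # Not valid UTF-8, so fall back to CP1252.
    return ('CP1252', decode('cp1252', $bytes));
}

# \xE9 followed by a space is not a valid UTF-8 sequence.
my ($encoding) = guess_and_decode("caf\xE9 \x93quoted\x94");
print "$encoding\n";    # prints "CP1252"
```

Note that pure ASCII input is reported as UTF-8 here, which is harmless because ASCII decodes identically under both encodings.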

Overall these changes should be a great improvement. Better encoding support is always a good idea. But it is a pretty significant change, including a change to the Pod spec. Hence the test release. Please make sure it works well with your code by installing it today:

cpan D/DW/DWHEELER/Pod-Simple-3.29_3.tar.gz
cpanm DWHEELER/Pod-Simple-3.29_3.tar.gz

Oh, and one last thing: If Pod::Simple fails to properly recognize the encoding in your Pod file, you can always use the =encoding command early in your Pod file to make it explicit:

=encoding CP1254

Build Modern Perl RPMs with rpmcpan

iovation + Perl = Love

We’ve been using the CentOS Perl RPMs at iovation to run all of our Perl applications. This has been somewhat painful, because the version of Perl, 5.10.1, is quite old — it shipped in August 2009. In fact, it consists mostly of bug fixes against Perl 5.10.0, which shipped in December 2007! Many of the modules provided by CentOS core and EPEL are quite old, as well, and we had built up quite the collection of customized module RPMs managed by a massive spaghetti-coded Jenkins job. When we recently ran into a Unicode issue that would best have been addressed by running a more modern Perl — rather than a hinky workaround — I finally sat down and knocked out a way to get a solid set of Modern Perl and related CPAN RPMs.

I gave it the rather boring name rpmcpan, and now you can use it, too. Turns out, DevOps doesn’t myopically insist on using core RPMs in the name of some abstract idea about stability. Rather, we just need a way to easily deploy our stuff as RPMs. If the same applies to your organization, you can get Modern Perl RPMs, too.

Here’s how we do it. We have a new Jenkins job that runs both nightly and whenever the rpmcpan Git repository updates. It uses the MetaCPAN API to build the latest versions of everything we need. Here’s how to get it to build the latest version of Perl, 5.20.1:

./bin/rpmcpan --version 5.20.1

That will get you a nice, modern Perl RPM, named perl520, completely encapsulated in /usr/local/perl520. Want 5.18 instead? Just change the version:

./bin/rpmcpan --version 5.18.2

That will give you perl518. But that’s not all. You want to build CPAN distributions against that version. Easy. Just edit the dists.json file. Its contents are a JSON object where the keys name CPAN distributions (not modules), and the values are objects that customize how our RPMs get built. Most of the time, the objects can be empty:

    "Try-Tiny": {}

This results in an RPM named perl520-Try-Tiny (or perl518-Try-Tiny, etc.). Sometimes you might need additional information to customize the CPAN spec file generated to build the distribution. For example, since this is Linux, we need to exclude a Win32 dependency in the Encode-Locale distribution:

    "Encode-Locale": { "exclude_requires": ["Win32::Console"] }

Other distributions might require additional RPMs or environment variables, like DBD-Pg, which requires the PostgreSQL RPMs:

    "DBD-Pg": {
        "build_requires": ["postgresql93-devel", "postgresql93"],
        "environment": { "POSTGRES_HOME": "/usr/pgsql-9.3" }
    }

See the README for a complete list of customization options. Or just get started with our dists.json file, which so far builds the bare minimum we need for one of our Perl apps. Add new distributions? Send a pull request! We’ll be doing so as we integrate more of our Perl apps with a Modern Perl and leave the sad RPM past behind.
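To illustrate how the keys in dists.json map to RPM names, here’s a small sketch that applies the naming pattern described above (perl520-Try-Tiny and friends). The rpm_names function is a hypothetical helper, not part of rpmcpan, and it assumes the prefix is simply “perl” plus the major and minor version numbers.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper, not part of rpmcpan: derive RPM names from a Perl
# version and the keys of a dists.json-style document.
sub rpm_names {
    my ($perl_version, $json) = @_;
    my ($major, $minor) = split /\./, $perl_version;    # 5.20.1 -> (5, 20)
    my $prefix = "perl$major$minor";                     # perl520
    my $dists  = decode_json($json);
    return map { "$prefix-$_" } sort keys %$dists;
}

my $json = <<'JSON';
{
    "Try-Tiny": {},
    "Encode-Locale": { "exclude_requires": ["Win32::Console"] }
}
JSON

# Prints perl520-Encode-Locale, then perl520-Try-Tiny.
print "$_\n" for rpm_names('5.20.1', $json);
```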

Sqitch on FLOSS Weekly

Yours truly was featured in this week’s episode of FLOSS Weekly, talking about Sqitch. I feel pretty good about this interview, despite continually banging on my legs, the table, and the mic. It’s interesting to try to communicate what Sqitch is about purely by talking.

If it’s enough to get you interested in giving it a try, try installing it and working through one of the tutorials.

Localize Your Perl Apps with this One Weird Trick

Nota Bene: This is a republication of a post that originally appeared in the 2013 Perl Advent Calendar.

These days, gettext is far and away the most widely-used localization (l10n) and internationalization (i18n) library for open-source software. So far, it has not been widely used in the Perl community, even though it’s the most flexible, capable, and easy-to-use solution, thanks to Locale::TextDomain.1 How easy? Let’s get started!

Module Internationale

First, just use Locale::TextDomain. Say you’re creating an awesome new module, Awesome::Module. The CPAN distribution will be named Awesome-Module, so that’s the “domain” to use for its localizations. Just let Locale::TextDomain know:

use Locale::TextDomain 'Awesome-Module';

Locale::TextDomain will later use this string to look for the appropriate translation catalogs. But don’t worry about that just yet. Instead, start using it to translate user-visible strings in your code. With the assistance of Locale::TextDomain’s comprehensive documentation, you’ll find it second nature to internationalize your modules in no time. For example, simple strings are denoted with __:

say __ 'Greetings puny human!';

If you need to specify variables, use __x:

say __x(
   'Thank you {sir}, may I have another?',
   sir => $username,
);

Need to manage plurals? Use __n:

say __n(
    'I will not buy this record, it is scratched.',
    'I will not buy these records, they are scratched.',
    $num_records,
);

If $num_records is 1, the first phrase will be used. Otherwise the second.

Sometimes you gotta do both, mix variables and plurals. __nx has got you covered there:

say __nx(
    'One item has been grokked.',
    '{count} items have been grokked.',
    $num_items,
    count => $num_items,
);

Congratulations! Your module is now internationalized. Wasn’t that easy? Make a habit of using these functions in all the modules in your distribution, always with the Awesome-Module domain, and you’ll be set.
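If you’re curious what the variable and plural handling boils down to when no translation is available, here’s a rough sketch in plain Perl. The expand and choose_and_expand functions are hypothetical stand-ins, not Locale::TextDomain’s implementation; the real functions also consult the translation catalog, whose plural rules vary by language.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for illustration only; Locale::TextDomain's real
# functions also look up translations in the catalog for the current locale.

# Replace {name} placeholders with the matching named argument, leaving
# unknown placeholders untouched.
sub expand {
    my ($msg, %vars) = @_;
    $msg =~ s/\{([^}]+)\}/exists $vars{$1} ? $vars{$1} : "{$1}"/ge;
    return $msg;
}

# Choose the singular or plural message by count, then expand placeholders.
sub choose_and_expand {
    my ($singular, $plural, $count, %vars) = @_;
    return expand($count == 1 ? $singular : $plural, %vars);
}

print choose_and_expand(
    'One item has been grokked.',
    '{count} items have been grokked.',
    3,
    count => 3,
), "\n";    # prints "3 items have been grokked."
```

The count appears twice in the call for the same reason it does with __nx: once to select the message, and once as a named value to interpolate.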

Encode da Code

Locale::TextDomain is great, but it dates from a time when Perl character encoding was, shall we say, sub-optimal. It therefore took it upon itself to try to do the right thing, which is to detect the locale from the runtime environment and automatically encode as appropriate. Which might work okay if all you ever do is print localized messages, and never anything else.

If, on the other hand, you will be manipulating localized strings in your code, or emitting unlocalized text (such as that provided by the user or read from a database), then it’s probably best to coerce Locale::TextDomain to return Perl strings, rather than encoded bytes. There’s no formal interface for this in Locale::TextDomain, so we have to hack it a bit: set the $OUTPUT_CHARSET environment variable to “UTF-8” and then bind a filter. Don’t know what that means? Me neither. Just put this code somewhere in your distribution where it will always run early, before anything gets localized:

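use Locale::Messages qw(bind_textdomain_filter);
use Encode;
BEGIN {
    # Ask for UTF-8 bytes, then decode them into Perl strings.
    $ENV{OUTPUT_CHARSET} = 'UTF-8';
    bind_textdomain_filter 'Awesome-Module' => \&Encode::decode_utf8;
}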

You only have to do this once per domain. So even if you use Locale::TextDomain with the Awesome-Module domain in a bunch of your modules, the presence of this code in a single early-loading module ensures that strings will always be returned as Perl strings by the localization functions.
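Why bother? Here’s a tiny sketch of the difference between encoded bytes and Perl strings, using only the core Encode module. The byte_and_char_length function is a hypothetical helper for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: report the length of a UTF-8 byte string before and
# after decoding it to a Perl character string.
sub byte_and_char_length {
    my ($bytes) = @_;
    return (length($bytes), length(decode_utf8($bytes)));
}

# The French word "tres" with a grave accent on the e, encoded as UTF-8:
# the accented letter takes two bytes.
my ($nbytes, $nchars) = byte_and_char_length("tr\xC3\xA8s");
print "$nbytes bytes, $nchars characters\n";    # prints "5 bytes, 4 characters"
```

Operations such as length, substr, and regular expression matching only behave as a reader would expect on the decoded form, which is why it pays to have the localization functions return Perl strings.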

Environmental Safety

So what about output? There’s one more bit of boilerplate you’ll need to throw in. Or rather, put this into the main package that uses your modules to begin with, such as the command-line script the user invokes to run an application.

First, on the shebang line, follow Tom Christiansen’s advice and put -CAS in it (or set the $PERL_UNICODE environment variable to AS). Then use the POSIX setlocale function to set the appropriate locale for the runtime environment. How? Like this:

#!/usr/bin/perl -CAS

use v5.12;
use warnings;
use utf8;
use POSIX qw(setlocale);

# Set the locale at compile time, before any localized module loads.
BEGIN {
    if ($^O eq 'MSWin32') {
        require Win32::Locale;
        setlocale POSIX::LC_ALL, Win32::Locale::get_locale();
    } else {
        setlocale POSIX::LC_ALL, '';
    }
}

use Awesome::Module;

Locale::TextDomain will notice the locale and select the appropriate translation catalog at runtime.

Is that All There Is?

Now what? Well, you could do nothing. Ship your code and those internationalized phrases will be handled just like any other string in your code.

But what’s the point of that? The real goal is to get these things translated. There are two parts to that process:

  1. Parsing the internationalized strings from your modules and creating language-specific translation catalogs, or “PO files”, for translators to edit. These catalogs should be maintained in your source code repository.

  2. Compiling the PO files into binary files, or “MO files”, and distributing them with your modules. These files should not be maintained in your source code repository.

Until a year ago, there was no Perl-native way to manage these processes. Locale::TextDomain ships with a sample Makefile demonstrating the appropriate use of the GNU gettext command-line tools, but that seemed a steep price for a Perl hacker to pay.

A better fit for the Perl hacker’s brain, I thought, is Dist::Zilla. So I wrote Dist::Zilla::LocaleTextDomain to encapsulate the use of the gettext utilities. Here’s how it works.

First, configure Dist::Zilla to compile localization catalogs for distribution by adding these lines to your dist.ini file:
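A minimal sketch of that configuration, assuming the plugin is enabled under the LocaleTextDomain name used throughout this post and paired with Dist::Zilla’s standard ShareDir plugin, so that the compiled catalogs in the share directory get installed:

```
[ShareDir]
[LocaleTextDomain]
```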


There are configuration attributes for the LocaleTextDomain plugin, such as where to find the PO files and where to put the compiled MO files. In case you didn’t use your distribution name as your localization domain in your modules, for example:

use Locale::TextDomain 'com.example.perl-libawesome';

Then you’d set the textdomain attribute so that the LocaleTextDomain plugin can find the translation catalogs:

textdomain = com.example.perl-libawesome

Check out the configuration docs for details on all available attributes.

At this point, the plugin doesn’t do much, because there are no translation catalogs yet. You might see this line from dzil build, though:

[LocaleTextDomain] Skipping language compilation: directory po does not exist

Let’s give it something to do!

Locale Motion

To add a French translation file, use the msg-init command2:

% dzil msg-init fr
Created po/fr.po.

The msg-init command uses the GNU gettext utilities to scan your Perl source code and initialize the French catalog, po/fr.po. This file is now ready for translation! Commit it into your source code repository so your agile-minded French-speaking friends can find it. Use msg-init to create as many language files as you like:

% dzil msg-init de ja.JIS en_US.UTF-8 en_UK.UTF-8
Created po/de.po.
Created po/ja.po.
Created po/en_US.po.
Created po/en_UK.po.

Each language has its own PO file. You can even have region-specific catalogs, such as the en_US and en_UK variants here. Each time a catalog is updated, the changes should be committed to the repository, like code. This allows the latest translations to always be available for compilation and distribution. The output from dzil build now looks something like:

po/fr.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/ja.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_US.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_UK.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.

The resulting MO files will be in the shared directory of your distribution:

% find Awesome-Module-0.01/share -type f

From here Module::Build or ExtUtils::MakeMaker will install these MO files with the rest of your distribution, right where Locale::TextDomain can find them at runtime. The PO files, on the other hand, won’t be used at all, so you might as well exclude them from the distribution. Add this line to your MANIFEST.SKIP to prevent the po directory and its contents from being included in the distribution:
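MANIFEST.SKIP entries are regular expressions matched against file paths, so a pattern along these lines should do it (a sketch, assuming the catalogs live in the default po directory at the top of the distribution):

```
^po/
```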


Mergers and Acquisitions

Of course no code base is static. In all likelihood, you’ll change your code and end up adding, editing, and removing localizable strings as a result. You’ll need to periodically merge these changes into all of your translation catalogs so that your translators can make the corresponding updates. That’s what the msg-merge command is for:

% dzil msg-merge
extracting gettext strings
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

This command re-scans your Perl code and updates all of the language files. Old messages will be commented out and new ones added. Commit the changes and give your translators a holler so they can keep the awesome going.

Template Scan

The msg-init and msg-merge commands don’t actually scan your source code. Sort of lied about that. Sorry. What they actually do is merge a template file into the appropriate catalog files. If this template file does not already exist, a temporary one will be created and discarded when the initialization or merging is done.

But projects commonly maintain a permanent template file, stored in the source code repository along with the translation catalogs. For this purpose, we have the msg-scan command. Use it to create or update the template, or POT file:

% dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot

From here on in, the resulting .pot file will be used by msg-init and msg-merge instead of scanning your code all over again. But keep in mind that, if you do maintain a POT file, future merges will be a two-step process: First run msg-scan to update the POT file, then msg-merge to merge its changes into the PO files:

% dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot
% dzil msg-merge
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po
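
If you do keep a POT file, the two steps are easy to wrap in a small shell function so nobody forgets the order. This is only a sketch: update_catalogs is a made-up name, not something the plugin provides.

```shell
# Hypothetical helper (not part of Dist::Zilla::LocaleTextDomain):
# refresh the POT template first, then merge it into the PO catalogs.
# The && skips the merge if msg-scan fails.
update_catalogs() {
    dzil msg-scan && dzil msg-merge
}
```

Run update_catalogs from the project root, then commit the updated files in po/ as usual.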

Lost in Translation

One more thing, a note for translators. They can, of course, also use msg-scan and msg-merge to update the catalogs they’re working on. But how do they test their translations? Easy: use the msg-compile command to compile a single catalog:

% dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 195 translated messages.

The resulting compiled catalog will be saved to the LocaleData subdirectory of the current directory, so it’s easily available to your app for testing. Just be sure to tell Perl to include the current directory in the search path, and set the $LANGUAGE environment variable for your language. For example, here’s how I test the Sqitch French catalog:

% dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 148 translated messages, 36 fuzzy translations, 27 untranslated messages.
% LANGUAGE=fr perl -Ilib -CAS -I. bin/sqitch foo
"foo" n'est pas une commande valide

Just be sure to delete the LocaleData directory when you’re done — or at least don’t commit it to the repository.
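
Translators who repeat this compile-and-test cycle often could roll the steps above into a shell function along these lines. It’s a sketch, not part of the plugin: try_translation is a hypothetical name, and it assumes you run it from the project root, where msg-compile writes LocaleData.

```shell
# Hypothetical helper: compile one catalog, run a command with
# LANGUAGE set to that language, then remove the generated
# LocaleData directory so it never gets committed.
#
# Usage: try_translation fr perl -Ilib -CAS -I. bin/sqitch foo
try_translation() {
    lang=$1
    shift
    dzil msg-compile "po/$lang.po" || return 1
    rc=0
    LANGUAGE=$lang "$@" || rc=$?   # LANGUAGE applies to this one command only
    rm -rf LocaleData
    return "$rc"
}
```

The function returns the exit status of the command it ran, so a failing test run is still visible after the cleanup.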


This may seem like a lot of steps, and it is. But once you have the basics in place (configuring the Dist::Zilla::LocaleTextDomain plugin, setting up the “textdomain filter”, and setting the locale in the application), there are just a few habits to get into:

  • Use the functions __, __x, __n, and __nx to internationalize user-visible strings
  • Run msg-scan and msg-merge to keep the catalogs up-to-date
  • Keep your translators in the loop

The Dist::Zilla::LocaleTextDomain plugin will do the rest.

  1. What about Locale::Maketext, you ask? It has not, alas, withstood the test of time. For details, see Nikolai Prokoschenko’s epic 2009 polemic, “On the state of i18n in Perl.” See also Steffen Winkler’s presentation, Internationalisierungs-Framework auswählen (and the English translation by Aristotle Pagaltzis), from German Perl Workshop 2010.

  2. The msg-init command, like all of the dzil msg-* commands, uses the GNU gettext utilities under the hood. You’ll need a reasonably modern version in your path, or else it won’t work.

Sqitch Goes Vertical

I released Sqitch v0.996 today. Despite the minor version increase, this is a pretty big release. I’m busy knocking out all the stuff I want to get done for 1.0, but the version space is running out, so just a minor version jump from v0.995 to v0.996. But a lot changed. A couple of the biggies:

Goodbye Mouse and Moose, Hello Moo

If you’re not a Perl programmer, you probably aren’t familiar with Moose or its derivatives Mouse and Moo. Briefly, it’s an object system. Great interface and features, but freaking *huge*—and slow. Mouse is a lighter version, and when we (mostly) switched to it last year, it yielded a 20-30% speed improvement.

Still wasn’t great, though. So on a day off recently, I switched to Moo, which implements most of Moose but without a lot of the baggage. At first, there wasn’t much difference in performance, but as I profiled it (Devel::NYTProf is indispensable for profiling Perl apps, BTW), I was able to root out all trace of Moose or Mouse, including in CPAN modules Sqitch depends on. The result is around a 40% speedup over what we had before. Honestly, it feels like a new app, it’s so fast. I’m really happy with how it turned out, and to have shed some of the baggage from the code base.

The downside is that package maintainers will need to do some work to get the new dependencies built. Have a look at the RPM spec changes I made to get our internal Sqitch RPMs to build v0.996.

MySQL Password Handling

The handling of MySQL passwords has also been improved. Sqitch now uses the $MYSQL_PWD environment variable if a password is provided in a target. This should simplify authentication when running MySQL change scripts through the mysql client.

Furthermore, if MySQL::Config is installed, Sqitch will look for passwords in the client and mysql sections of your MySQL configuration files (~/.my.cnf, /etc/my.cnf). This should already happen automatically when executing scripts, but Sqitch now tries to replicate that behavior when connecting to the database via DBI.

Spotting the $MYSQL_PWD commit, Ștefan Suciu updated the Firebird engine to use $ISC_PASSWORD when running scripts. Awesome.
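
For anyone unfamiliar with the mechanism, here’s roughly what handing a password to a client through the environment looks like from a shell. This is an illustration, not Sqitch’s actual code: run_with_password is a made-up helper, and the user, database, and script names in the usage comment are placeholders.

```shell
# Hypothetical helper: run a command with MYSQL_PWD set only for that
# command. The function body is a subshell, so neither the variable
# nor the password leaks back into the calling shell.
#
# Usage (placeholder names):
#   run_with_password 's3cret' mysql --user app appdb < deploy.sql
run_with_password() (
    MYSQL_PWD=$1
    export MYSQL_PWD
    shift
    "$@"
)
```

The password never appears in the client’s own argument list, which is the point of using the environment variable rather than a command-line option.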

Vertically Integrated

And finally, another big change: I added support for Vertica, a very nice commercial column-store database that features partitioning and sharding, among other OLAP-style functionality. It was originally forked from PostgreSQL, so it was fairly straightforward to port, though I did have to borrow a bit from the Oracle and SQLite engines, too. This port was essential for work, as we’re starting to use Vertica more and more, and need ways to manage changes.

If you’re using Vertica, peruse the tutorial to get a feel for what it’s all about. If you want to install it, you can get it from CPAN:

cpan install App::Sqitch DBD::ODBC

Or, if you’re on Homebrew:

brew tap theory/sqitch
brew install sqitch_vertica

Be warned that there’s a minor bug in v0.996, though. Apply this diff to fix it:

 @@ -16,7 +16,7 @@ our $VERSION = '0.996';
 sub key    { 'vertica' }
 sub name   { 'Vertica' }
-sub driver { 'DBD::Pg 2.0' }
+sub driver { 'DBD::ODBC 1.43' }
 sub default_client { 'vsql' }
 has '+destination' => (

That fix will be in the next release, of course, as will support for Vertica 6.

What Next?

I need to focus on some other work stuff for a few weeks, but then I expect to come back to Sqitch again. I’d like to get 1.0 shipped before the end of the year. To that end, next up I will be rationalizing configuration hierarchies to make engine selection and deploy-time configuration more sensible. I hope to get that done by early October.
