Just a Theory

Trans rights are human rights

Posts about Internationalization

Localize Your Perl Apps with this One Weird Trick

Nota Bene: This is a republication of a post that originally appeared in the 2013 Perl Advent Calendar.


These days, gettext is far and away the most widely-used localization (l10n) and internationalization (i18n) library for open-source software. So far, it has not been widely used in the Perl community, even though it’s the most flexible, capable, and easy-to use solution, thanks to Locale::TextDomain.1 How easy? Let’s get started!

Module Internationale

First, just use Locale::TextDomain. Say you’re creating an awesome new module, Awesome::Module. These CPAN distribution will be named Awesome-Module, so that’s the “domain” to use for its localizations. Just let Locale::TextDomain know:

use Locale::TextDomain 'Awesome-Module';

Locale::TextDomain will later use this string to look for the appropriate translation catalogs. But don’t worry about that just yet. Instead, start using it to translate user-visible strings in your code. With the assistance of the Locale::TextDomain’s comprehensive documentation, you’ll find it second nature to internationalize your modules in no time. For example, simple strings are denoted with __:

say __ 'Greetings puny human!';

If you need to specify variables, use __x:

say __x(
   'Thank you {sir}, may I have another?',
   sir => $username,
);

Need to manage plurals? Use __n:

say __n(
    'I will not buy this record, it is scratched.',
    'I will not buy these records, they are scratched.',
    $num_records
);

If $num_records is 1, the first phrase will be used. Otherwise the second.

Sometimes you gotta do both, mix variables and plurals. __nx has got you covered there:

say __nx(
    'One item has been grokked.',
    '{count} items have been grokked.',
    $num_items,
    count => $num_items,
);

Congratulations! Your module is now internationalized. Wasn’t that easy? Make a habit of using these functions in all the modules in your distribution, always with the Awesome-Module domain, and you’ll be set.

Encode da Code

Locale::TextDomain is great, but it dates from a time when Perl character encoding was, shall we say, sub-optimal. It therefore took it upon itself to try to do the right thing, which is to to detect the locale from the runtime environment and automatically encode as appropriate. Which might work okay if all you ever do is print localized messages — and never anything else.

If, on the other hand, you will be manipulating localized strings in your code, or emitting unlocalized text (such as that provided by the user or read from a database), then it’s probably best to coerce Locale::TextDomain to return Perl strings, rather than encoded bytes. There’s no formal interface for this in Locale::TextDomain, so we have to hack it a bit: set the $OUTPUT_CHARSET environment variable to “UTF-8” and then bind a filter. Don’t know what that means? Me neither. Just put this code somewhere in your distribution where it will always run early, before anything gets localized:

use Locale::Messages qw(bind_textdomain_filter);
use Encode;
BEGIN {
    $ENV{OUTPUT_CHARSET} = 'UTF-8';
    bind_textdomain_filter 'Awesome-Module' => \&Encode::decode_utf8, Encode::FB_DEFAULT;
}

You only have to do this once per domain. So even if you use Locale::TextDomain with the Awesome-Module domain in a bunch of your modules, the presence of this code in a single early-loading module ensures that strings will always be returned as Perl strings by the localization functions.

Environmental Safety

So what about output? There’s one more bit of boilerplate you’ll need to throw in. Or rather, put this into the main package that uses your modules to begin with, such as the command-line script the user invokes to run an application.

First, on the shebang line, follow Tom Christiansen’s advice and put -CAS in it (or set the $PERL_UNICODE environment variable to AS). Then use the POSIX setlocale function to the appropriate locale for the runtime environment. How? Like this:

#!/usr/bin/perl -CAS

use v5.12;
use warnings;
use utf8;
use POSIX qw(setlocale);
BEGIN {
    if ($^O eq 'MSWin32') {
        require Win32::Locale;
        setlocale POSIX::LC_ALL, Win32::Locale::get_locale();
    } else {
        setlocale POSIX::LC_ALL, '';
    }
}

use Awesome::Module;

Locale::TextDomain will notice the locale and select the appropriate translation catalog at runtime.

Is that All There Is?

Now what? Well, you could do nothing. Ship your code and those internationalized phrases will be handled just like any other string in your code.

But what’s the point of that? The real goal is to get these things translated. There are two parts to that process:

  1. Parsing the internationalized strings from your modules and creating language-specific translation catalogs, or “PO files”, for translators to edit. These catalogs should be maintained in your source code repository.

  2. Compiling the PO files into binary files, or “MO files”, and distributing them with your modules. These files should not be maintained in your source code repository.

Until a year ago, there was no Perl-native way to manage these processes. Locale::TextDomain ships with a sample Makefile demonstrating the appropriate use of the GNU gettext command-line tools, but that seemed a steep price for a Perl hacker to pay.

A better fit for the Perl hacker’s brain, I thought, is Dist::Zilla. So I wrote Dist::Zilla::LocaleTextDomain to encapsulate the use of the gettext utiltiies. Here’s how it works.

First, configuring Dist::Zilla to compile localization catalogs for distribution: add these lines to your dist.ini file:

[ShareDir]
[LocaleTextDomain]

There are configuration attributes for the LocaleTextDomain plugin, such as where to find the PO files and where to put the compiled MO files. In case you didn’t use your distribution name as your localization domain in your modules, for example:

use Locale::TextDomain 'com.example.perl-libawesome';

Then you’d set the textdomain attribute so that the LocaleTextDomain plugin can find the translation catalogs:

[LocaleTextDomain]
textdomain = com.example.perl-libawesome

Check out the configuration docs for details on all available attributes.

At this point, the plugin doesn’t do much, because there are no translation catalogs yet. You might see this line from dzil build, though:

[LocaleTextDomain] Skipping language compilation: directory po does not exist

Let’s give it something to do!

Locale Motion

To add a French translation file, use the msg-init command:2

% dzil msg-init fr
Created po/fr.po.

The msg-init command uses the GNU gettext utilities to scan your Perl source code and initialize the French catalog, po/fr.po. This file is now ready translation! Commit it into your source code repository so your agile-minded French-speaking friends can find it. Use msg-init to create as many language files as you like:

% dzil msg-init de ja.JIS en_US.UTF-8 en_UK.UTF-8
Created po/de.po.
Created po/ja.po.
Created po/en_US.po.
Created po/en_UK.po.

Each language has its on PO file. You can even have region-specific catalogs, such as the en_US and en_UK variants here. Each time a catalog is updated, the changes should be committed to the repository, like code. This allows the latest translations to always be available for compilation and distribution. The output from dzil build now looks something like:

po/fr.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/ja.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_US.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_UK.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.

The resulting MO files will be in the shared directory of your distribution:

% find Awesome-Module-0.01/share -type f
Awesome-Module-0.01/share/LocaleData/de/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/en_UK/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/en_US/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/ja/LC_MESSAGES/Awesome-Module.mo

From here Module::Build or ExtUtils::MakeMaker will install these MO files with the rest of your distribution, right where Locale::TextDomain can find them at runtime. The PO files, on the other hand, won’t be used at all, so you might as well exclude them from the distribution. Add this line to your MANIFEST.SKIP to prevent the po directory and its contents from being included in the distribution:

^po/

Mergers and Acquisitions

Of course no code base is static. In all likelihood, you’ll change your code — and end up adding, editing, and removing localizable strings as a result. You’ll need to periodically merge these changes into all of your translation catalogs so that your translators can make the corresponding updates. That’s what the the msg-merge command is for:

% dzil msg-merge
extracting gettext strings
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

This command re-scans your Perl code and updates all of the language files. Old messages will be commented-out and new ones added. Commit the changes and give your translators a holler so they can keep the awesome going.

Template Scan

The msg-init and msg-merge commands don’t actually scan your source code. Sort of lied about that. Sorry. What they actually do is merge a template file into the appropriate catalog files. If this template file does not already exist, a temporary one will be created and discarded when the initialization or merging is done.

But projects commonly maintain a permanent template file, stored in the source code repository along with the translation catalogs. For this purpose, we have the msg-scan command. Use it to create or update the template, or POT file:

% dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot

From here on in, the resulting .pot file will be used by msg-init and msg-merge instead of scanning your code all over again. But keep in mind that, if you do maintain a POT file, future merges will be a two-step process: First run msg-scan to update the POT file, then msg-merge to merge its changes into the PO files:

% dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot
% dzil msg-merge
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

Lost in Translation

One more thing, a note for translators. They can, of course, also use msg-scan and msg-merge to update the catalogs they’re working on. But how do they test their translations? Easy: use the msg-compile command to compile a single catalog:

% dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 195 translated messages.

The resulting compiled catalog will be saved to the LocaleData subdirectory of the current directory, so it’s easily available to your app for testing. Just be sure to tell Perl to include the current directory in the search path, and set the $LC_ALL environment variable for your language. For example, here’s how I test the Sqitch French catalog:

% dzil msg-compile po/fr.po              
[LocaleTextDomain] po/fr.po: 148 translated messages, 36 fuzzy translations, 27 untranslated messages.
% LC_ALL=fr perl -Ilib -CAS -I. bin/sqitch foo
"foo" n'est pas une commande valide

Just be sure to delete the LocaleData directory when you’re done — or at least don’t commit it to the repository.

TL;DR

This may seem like a lot of steps, and it is. But once you have the basics in place — Configuring the Dist::Zilla::LocaleTextDomain plugin, setting up the “textdomain filter”, setting and the locale in the application — there are just a few habits to get into:

  • Use the functions __, __x, __n, and __nx to internationalize user-visible strings
  • Run msg-scan and msg-merge to keep the catalogs up-to-date
  • Keep your translators in the loop.

The Dist::Zilla::LocaleTextDomain plugin will do the rest.

Update 2019-01-31: Added required malformed data configuration argument to the call to bind_textdomain_filter required by this change in Encode v2.99.


  1. What about Locale::Maketext, you ask? It has not, alas, withsthood the test of time. For details, see Nikolai Prokoschenko’s epic 2009 polemic, “On the state of i18n in Perl.” See also Steffen Winkler’s presentation, Internationalisierungs-Framework auswählen (and the English translation by Aristotle Pagaltzis), from German Perl Workshop 2010↩︎

  2. The msg-init function — like all of the dzil msg-* commands — uses the GNU gettext utilities under the hood. You’ll need a reasonably modern version in your path, or else it won’t work. ↩︎

Dist::Zilla::LocaleTextDomain for Translators

Here’s a followup on my post about localizing Perl modules with Locale::TextDomain. Dist::Zilla::LocaleTextDomain was great for developers, less so for translators. A Sqitch translator asked how to test the translation file he was working on. My only reply was to compile the whole module, then install it and test it. Ugh.

Today, I released Dist::Zilla::LocaleTextDomain v0.85 with a new command, msg-compile. This command allows translators to easily compile just the file they’re working on and nothing else. For pure Perl modules in particular, it’s pretty easy to test then. By default, the compiled catalog goes into ./LocaleData, where convincing the module to find it is simple. For example, I updated the test sqitch app to take advantage of this. Now, to test, say, the French translation file, all the translator has to do is:

> dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 155 translated messages, 24 fuzzy translations, 16 untranslated messages.

> LANGUAGE=fr ./t/sqitch foo
"foo" n'est pas une commande valide

I hope this simplifies things for translators. See the notes for translators for a few more words on the subject.

Looking for the comments? Try the old layout.

Localize Your Perl modules with Locale::TextDomain and Dist::Zilla

I’ve just released Dist::Zilla::LocaleTextDomain v0.80 to the CPAN. This module adds support for managing Locale::TextDomain-based localization and internationalization in your CPAN libraries. I wanted to make it as simple as possible for CPAN developers to do localization and to support translators in their projects, and Dist::Zilla seemed like the perfect place to do it, since it has hooks to generate the necessary binary files for distribution.

Starting out with Locale::TextDomain was decidedly non-intuitive for me, as a Perl hacker, likely because of its gettext underpinnings. Now that I’ve got a grip on it and created the Dist::Zilla support, I think it’s pretty straight-forward. To demonstrate, I wrote the following brief tutorial, which constitutes the main documentation for the Dist::Zilla::LocaleTextDomain distribution. I hope it makes it easier for you to get started localizing your Perl libraries.

Localize Your Perl modules with Locale::TextDomain and Dist::Zilla

Locale::TextDomain provides a nice interface for localizing your Perl applications. The tools for managing translations, however, is a bit arcane. Fortunately, you can just use this plugin and get all the tools you need to scan your Perl libraries for localizable strings, create a language template, and initialize translation files and keep them up-to-date. All this is assuming that your system has the gettext utilities installed.

The Details

I put off learning how to use Locale::TextDomain for quite a while because, while the gettext tools are great for translators, the tools for the developer were a little more opaque, especially for Perlers used to Locale::Maketext. But I put in the effort while hacking Sqitch. As I had hoped, using it in my code was easy. Using it for my distribution was harder, so I decided to write Dist::Zilla::LocaleTextDomain to make life simpler for developers who manage their distributions with Dist::Zilla.

What follows is a quick tutorial on using Locale::TextDomain in your code and managing it with Dist::Zilla::LocaleTextDomain.

This is my domain

First thing to do is to start using Locale::TextDomain in your code. Load it into each module with the name of your distribution, as set by the name attribute in your dist.ini file. For example, if your dist.ini looks something like this:

name    = My-GreatApp
author  = Homer Simpson <homer@example.com>
license = Perl_5

Then, in you Perl libraries, load Locale::TextDomain like this:

use Locale::TextDomain qw(My-GreatApp);

Locale::TextDomain uses this value to find localization catalogs, so naturally Dist::Zilla::LocaleTextDomain will use it to put those catalogs in the right place.

Okay, so it’s loaded, how do you use it? The documentation of the Locale::TextDomain exported functions is quite comprehensive, and I think you’ll find it pretty simple once you get used to it. For example, simple strings are denoted with __:

say __ 'Hello';

If you need to specify variables, use __x:

say __x(
    'You selected the color {color}',
    color => $color
);

Need to deal with plurals? Use __n

say __n(
    'One file has been deleted',
    'All files have been deleted',
    $num_files,
);

And then you can mix variables with plurals with __nx:

say __nx(
    'One file has been deleted.',
    '{count} files have been deleted.',
    $num_files,
    count => $num_files,
);

Pretty simple, right? Get to know these functions, and just make it a habit to use them in user-visible messages in your code. Even if you never expect to translate those messages, just by doing this you make it easier for someone else to come along and start translating for you.

The setup

Now you’re localizing your code. Great! What’s next? Officially, nothing. If you never do anything else, your code will always emit the messages as written. You can ship it and things will work just as if you had never done any localization.

But what’s the fun in that? Let’s set things up so that translation catalogs will be built and distributed once they’re written. Add these lines to your dist.ini:

[ShareDir]
[LocaleTextDomain]

There are actually quite a few attributes you can set here to tell the plugin where to find language files and where to put them. For example, if you used a domain different from your distribution name, e.g.,

use Locale::TextDomain 'com.example.My-GreatApp';

Then you would need to set the textdomain attribute so that the LocaleTextDomain does the right thing with the language files:

[LocaleTextDomain]
textdomain = com.example.My-GreatApp

Consult the LocaleTextDomain configuration docs for details on all available attributes.

(Special note until this Locale::TextDomain patch is merged: set the share_dir attribute to lib instead of the default value, share. If you use Module::Build, you will also need a subclass to do the right thing with the catalog files; see “Installation” in Dist::Zilla::Plugin::LocaleTextDomain for details.)

What does this do including the plugin do? Mostly nothing. You might see this line from dzil build, though:

[LocaleTextDomain] Skipping language compilation: directory po does not exist

Now at least you know it was looking for something to compile for distribution. Let’s give it something to find.

Initialize languages

To add translation files, use the msg-init command:

> dzil msg-init de
Created po/de.po.

At this point, the gettext utilities will need to be installed and visible in your path, or else you’ll get errors.

This command scans all of the Perl modules gathered by Dist::Zilla and initializes a German translation file, named po/de.po. This file is now ready for your German-speaking public to start translating. Check it into your source code repository so they can find it. Create as many language files as you like:

> dzil msg-init fr ja.JIS en_US.UTF-8
Created po/fr.po.
Created po/ja.po.
Created po/en_US.po.

As you can see, each language results in the generation of the appropriate file in the po directory, sans encoding (i.e., no .UTF-8 in the en_US file name).

Now let your translators go wild with all the languages they speak, as well as the regional dialects. (Don’t forget to colour your code with en_UK translations!)

Once you have translations and they’re committed to your repository, when you build your distribution, the language files will automatically be compiled into binary catalogs. You’ll see this line output from dzil build:

[LocaleTextDomain] Compiling language files in po
po/fr.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/ja.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_US.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.

You’ll then find the catalogs in the shared directory of your distribution:

> find My-GreatApp-0.01/share -type f
My-GreatApp-0.01/share/LocaleData/de/LC_MESSAGES/App-Sqitch.mo
My-GreatApp-0.01/share/LocaleData/en_US/LC_MESSAGES/App-Sqitch.mo
My-GreatApp-0.01/share/LocaleData/ja/LC_MESSAGES/App-Sqitch.mo

These binary catalogs will be installed as part of the distribution just where Locale::TextDomain can find them.

Here’s an optional tweak: add this line to your MANIFEST.SKIP:

^po/

This prevents the po directory and its contents from being included in the distribution. Sure, you can include them if you like, but they’re not required for the running of your app; the generated binary catalog files are all you need. Might as well leave out the translation files.

Mergers and acquisitions

You’ve got translation files and helpful translators given them a workover. What happens when you change your code, add new messages, or modify existing ones? The translation files need to periodically be updated with those changes, so that your translators can deal with them. We got you covered with the msg-merge command:

> dzil msg-merge
extracting gettext strings
Merging gettext strings into po/de.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

This will scan your module files again and update all of the translation files with any changes. Old messages will be commented-out and new ones added. Just commit the changes to your repository and notify the translation army that they’ve got more work to do.

If for some reason you need to update only a subset of language files, you can simply list them on the command-line:

> dzil msg-merge po/de.po po/en_US.po
Merging gettext strings into po/de.po
Merging gettext strings into po/en_US.po

What’s the scan, man

Both the msg-init and msg-merge commands depend on a translation template file to create and merge language files. Thus far, this has been invisible: they will create a temporary template file to do their work, and then delete it when they’re done.

However, it’s common to also store the template file in your repository and to manage it directly, rather than implicitly. If you’d like to do this, the msg-scan command will scan the Perl module files gathered by Dist::Zilla and make it for you:

> dzil msg-scan
gettext strings into po/My-GreatApp.pot

The resulting .pot file will then be used by msg-init and msg-merge rather than scanning your code all over again. This actually then makes msg-merge a two-step process: You need to update the template before merging. Updating the template is done by exactly the same command, msg-scan:

> dzil msg-scan
extracting gettext strings into po/My-GreatApp.pot
> dzil msg-merge
Merging gettext strings into po/de.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

Ship It!

And that’s all there is to it. Go forth and localize and internationalize your Perl apps!

Acknowledgements

My thanks to Ricardo Signes for invaluable help plugging in to Dist::Zilla, to Guido Flohr for providing feedback on this tutorial and being open to my pull requests, to David Golden for I/O capturing help, and to Jérôme Quelin for his patience as I wrote code to do the same thing as Dist::Zilla::Plugin::LocaleMsgfmt without ever noticing that it already existed.

Looking for the comments? Try the old layout.

Sqitch Symbolism

It has been a while since I last blogged about Sqitch. The silence is in part due to the fact that I’ve moved from full-time Sqitch development to actually putting it to use building databases at work. This is exciting, because it needs the real-world experience to grow up.

That’s not to say that nothing has happened with Sqitch. I’ve just released v0.931 which includes a bunch of improvement since I wrote about v0.90. First a couple of the minor things:

  • Sqitch now checks dependencies before reverting, and dies if they would be broken by the revert. This change, introduced in v0.91, required that the dependencies be moved to their own table, so if you’ve been messing with an earlier version of Sqitch, you’ll have to rebuild the database. Sorry about that.
  • I fixed a bunch of Windows-related issues, including finding the current user’s full name, correctly setting the locale for displaying dates and times, executing shell commands, and passing tests. The awesome ActiveState PPM Index has been invaluable in tracking these issues down.
  • Added the bundle command. All it does is copy your project configuration file, plan, and deploy, revert, and test scripts to a directory you identify. The purpose is to be able to export the project into a directory structure suitable for distribution in a tarball, RPM, or whatever. That my not sound incredibly useful, since copying files is no big deal. However, the long-term plan is to add VCS support to Sqitch, which would entail fetching scripts from various places in VCS history. At that point, it will be essential to use bundle to do the export, so that scripts are properly exported from the VCS history. That said, I’m actually using it already to build RPMs. Useful already!

Symbolic References

And now the more immediately useful changes. First, I added new symbolic tags, @FIRST and @LAST. These represent the first and last changes currently deployed to a database, respectively. These complement the existing @ROOT and @HEAD symbolic tags, which represent the first and last changes listed in the plan. The distinction is important: The change plan vs actual deployments to a database.

The addition of @FIRST and @LAST may not sounds like much, but there’s more.

I also added forward and reverse change reference modifiers ^ and ~. The basic idea was stolen from Git Revisions, though the semantics vary. For Sqitch changes, ^ appended to a name or tag means “the change before this change,” while ~ means “the change after this change”. I find ^ most useful when doing development, where I’m constantly deploying and reverting a change as I work. Here’s how I do that revert:

sqitch revert --to @LAST^

That means “revert to the change before the last change”, or simply “revert the last change”. If I want to revert two changes, I use two ^s:

sqitch revert --to @LAST^^

To go back any further, I need to use an integer with the ^. Here’s how to revert the last four changes deployed to the database:

sqitch revert --to @LAST^4

The cool thing about this is that I don’t have to remember the name of the change to revert, as was previously required. And of course, if I just wanted to deploy two changes since the last deployment, I would use ~~:

sqitch deploy --to @LAST~~

Nice, right? One thing to bear in mind, as I was reminded while giving a [Sqitch presentation][slides] to PDXPUG: Changes are deployed in a sequence. You can think of them as a linked list. So this command:

sqitch revert @LAST^^

Does not mean to revert the second-to-last change, leaving the two after it. It will revert the last change and the penultimate change. This is why I actually encourage the use of the --to option, to emphasize that you’re deploying or reverting all changes to the named point, rather than deploying or reverting the named point in isolation. Sqitch simply doesn’t do that.

Internationalize Me

One more change. With today’s release of v0.931, there is now proper internationalization support in Sqitch. The code has been localized for a long time, but there was no infrastructure for internationalizing. Now there is, and I’ve stubbed out files for translating Sqitch messages into French and German. Adding others is easy.

If you’re interested in translating Sqitch’s messages (only 163 of them, should be quick!), just fork Sqitch, juice up your favorite gettext editor, and start editing. Let me know if you need a language file generated; I’ve built the tools to do it easily with dzil, but haven’t released them yet. Look for a post about that later in the week.

Presentation

Oh, and that PDXPUG presentation? Here are the slides (also for download and on Slideshare). Enjoy!

Looking for the comments? Try the old layout.