A couple months ago, RJBS and I collaborated on adding a new feature to Pod: sane URL links. For, well, ever, the case has been that to link to URLs or any other scheme: links in Pod, You had to do something like this:

For more information, consult the pgTAP documentation:
L<http://pgtap.projects.postgresql.org/documentation.html>

The reasons why you couldn’t include text in the link to server as the link text has never been really well spelled-out. Sean Burke, the most recent author of the Pod spec, had only said that the support wasn’t there "for various reasons."

Meanwhile, I accidentally discovered that Pod::Simple has in fact supported such formats for a long time. At some point Sean added it, but didn’t update the spec. Maybe he thought it was fragile. I have no idea. But since the support was already there, and most of the other Pod tools already support it or want to, it was a simple change to make to the spec, and it was released in Perl 5.11.3 and Pod::Simple 3.11. It’s now officially a part of the spec. The above Pod can now be written as:

For more information, consult the L<pgTAP
documentation|http://pgtap.projects.postgresql.org/documentation.html>.

So much better! And to show it off, I’ve just updated all the links in SVN::Notify and released a new version. Check it out on CPAN Search. See how the links such as to "HookStart.exe" and "Windows Subversion + Apache + TortoiseSVN + SVN::Notify HOWTO" are nice links? They no longer use the URL for the link text. Contrast with the previous version.

And as of yesterday, the last piece to allow this went into place. Andy gave me maintenance of Test::Pod, and I immediately released a new version to allow the new syntax. So update your t/pod.t file to require Test::Pod 1.41, update your links, and celebrate the arrival of sane links in Pod documentation.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

SQL Hack: The Something-est From Each Entity

This is a pattern that I have dealt with many times, but never figured out how to adequately handle. Say that you have imported a mailbox into your database, and you want a list of the latest messages between each pair of recipients (sender and receiver — I’m ignoring multiple receivers for the moment). The data might look like this:

BEGIN;

CREATE TABLE messages (
   sender   TEXT        NOT NULL,
   receiver TEXT        NOT NULL,
   sent_at  TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
   body     TEXT        NOT NULL DEFAULT ''
);

INSERT INTO messages ( sender, receiver, body )
VALUES ('Theory', 'Strongrrl', 'Hi There.' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Strongrrl', 'Theory', 'Hi yourself.' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Anna', 'Theory', 'What''s for dinner?' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Theory', 'Anna', 'Brussels Sprouts.' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Anna', 'Theory', 'Oh man!' );

COMMIT;

So the goal is to show the most recent message between Theory and Strongrrl and the most recent message between Theory and Anna, without regard to who is the sender and who is the receiver. After running into this many times, today I consulted my colleagues, showing them this dead simple (and wrong!) query to demonstrate what I wanted:

SELECT sender, recipient, sent_at, body
  FROM messages
 GROUP BY sender, recipient
HAVING sent_at = max(sent_at);

That’s wrong because one can’t have columns in the SELECT expression that are not either aggregate expressions or included in theGROUP BY expression. It’s a violation of the standard (and prone to errors, I suspect). Andrew immediately said, “Classic case for DISTINCT ON”. This lovely little expression is a PostgreSQL extension not included in the SQL standard. It’s implementation looks like this:

SELECT DISTINCT ON (
          CASE WHEN receiver > sender
              THEN receiver || sender
              ELSE sender   || receiver
          END
       ) sender, receiver, sent_at, body
  FROM messages
 ORDER BY CASE WHEN receiver > sender
              THEN receiver || sender
              ELSE sender   || receiver
          END, sent_at DESC;

This query is saying, “fetch the rows where the sender and the receiver are distinct, and order by sent_at DESC. THE CASE statement to get a uniform value for the combination of sender and receiver is a bit unfortunate, but it does the trick:

  sender   | receiver |            sent_at            |     body     
-----------+----------+-------------------------------+--------------
 Anna      | Theory   | 2010-01-12 05:00:07.026711+00 | Oh man!
 Strongrrl | Theory   | 2010-01-12 05:00:07.02589+00  | Hi yourself.

Great, exactly the data I wanted. And the CASE statement can actually be indexed to speed up filtering. But I wondered if it would be possible to get the same results without the DISTINCT ON. In other words, can this be done with standard SQL? If you’re using PostgreSQL 8.4, the answer is “yes.” All you have to do is exploit window functions and a subquery. It looks like this:

SELECT sender, receiver, sent_at, body
  FROM (
    SELECT sender, receiver, sent_at, body,
           row_number() OVER ( PARTITION BY 
               CASE WHEN receiver > sender
                   THEN receiver || sender
                   ELSE sender   || receiver
               END
               ORDER BY sent_at DESC
           ) AS rnum
      FROM messages
  ) AS t
 WHERE rnum = 1;

Same nasty CASE statement as before (no way around it with this database design, alas), but this is fully conforming SQL. It’s also the first time I’ve ever used window functions. If you just focus on the row_number() OVER () expression, it’s simply partitioning the table according to the same value as in the DISTINCT ON value, but it’s ordering it by sent_at directly. The result is a row number, where the first is 1 for the most recent message for each combination of recipients. Then we just filter for that in the WHERE clause.

Not exactly intuitive (I’m really only understanding it now as I explain write it out), but quite straight-forward once you accept the expressivity in this particular OVER expression. It might be easier to understand if we remove some of the cruft. If instead we wanted the most recent message from each sender (regardless of the recipient), we’d write:

SELECT sender, receiver, sent_at, body
  FROM (
    SELECT sender, receiver, sent_at, body,
           row_number() OVER (
               PARTITION BY sender ORDER BY sent_at DESC
           ) AS rnum
      FROM messages
  ) AS t
 WHERE rnum = 1;

And that yields:

  sender   | receiver |            sent_at            |     body     
-----------+----------+-------------------------------+--------------
 Anna      | Theory   | 2010-01-12 05:00:07.026711+00 | Oh man!
 Strongrrl | Theory   | 2010-01-12 05:00:07.02589+00  | Hi yourself.
 Theory    | Anna     | 2010-01-12 05:00:07.24982+00  | Brussels Sprouts.

Furthermore, we can use a common table expression to eliminate the subquery. This query is functionally identical to the subquery example (returning to uniqueness for sender and receiver), just with the WITH clause coming before the SELECT clause, setting things up for it:

WITH t AS (
    SELECT sender, receiver, sent_at, body,
           row_number() OVER (PARTITION BY CASE
               WHEN receiver > sender
                   THEN receiver || sender
                   ELSE sender   || receiver
                   END
               ORDER BY sent_at DESC
           ) AS rnum
      FROM messages
) SELECT sender, receiver, sent_at, body
    FROM t
   WHERE rnum = 1;

So it’s kind of like putting the subquery first, only it’s not a subquery, it’s more like a temporary view. Nice, eh? Either way, the results are the same as before:

  sender   | receiver |            sent_at            |     body     
-----------+----------+-------------------------------+--------------
 Anna      | Theory   | 2010-01-12 05:00:07.026711+00 | Oh man!
 Strongrrl | Theory   | 2010-01-12 05:00:07.02589+00  | Hi yourself.

I hereby dub this “The Entity’s Something-est” pattern (I’m certain someone else has already come up with a good name for it, but this will do). I can see it working any place requiring the highest, lowest, latest, earliest, or something else-est item from each of a list of entities. Perhaps the latest headline from every news source:

WITH t AS (
    SELECT source, headline, dateline, row_number() OVER (
               PARTITION BY source ORDER BY dateline DESC
           ) AS rnum
      FROM news
) SELECT source, headline, dateline
    FROM t
   WHERE rnum = 1;

Or perhaps the lowest score for for each basketball team over the course of a season:

WITH t AS (
    SELECT team, date, score, row_number() OVER (
               PARTITION BY team ORDER BY score
           ) AS rnum
      FROM games
) SELECT team, date, score
    FROM t
   WHERE rnum = 1;

Easy! How have you handled a situation like this in your database hacking?

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

RFC: PostgreSQL Add-on Network

I’ve posted a plan to implement PGAN, a CPAN for PostgreSQL extensions. I’ve tried to closely follow the CPAN philosophy to come up with a plan that requires a minimum-work implementation that builds on the existing PostgreSQL tools and the examples of the CPAN and JSAN. My hope is that it’s full of JFDI! I would be very grateful for feedback and suggestions.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Enforcing a Set of Values

Enumerate Me

I love enums. They’re a terrific way to quickly create self-documenting data types that represent a set of values, and the nice thing is that the underlying values are stored as integers, making them very space- and performance-efficient. A typical example might be a workflow approval process for publishing magazine articles. You create it like so:

CREATE TYPE article_states AS ENUM (
    'draft', 'copy', 'approved', 'published'
);

Nice: we now have a simple data type that’s self-documenting. An an important feature of enums is that the ordering of values is the same as the declared labels. For a workflow such as this, it makes a lot of sense, because the workflow states are inherently ordered: “draft” comes before “copy” and so on.

Unfortunately, enums aren’t a panacea. I would use them all over the place if I could, but, alas, the value-set data types I tend to need tend not to have inherently ordered values other than the collation order of the text. For example, say that we need a table describing people’s faces. Using an enum to manage eye colors might look something like this:

CREATE TYPE eye_color AS ENUM ( 'blue', 'green', 'brown' );

CREATE TABLE faces (
    face_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    eye_color eye_color NOT NULL
);

Nice, huh? So let’s insert a few values and see what it looks like:

INSERT INTO faces (name, eye_color)
VALUES ('David', 'blue' ),
       ('Julie', 'green' ),
       ('Anna', 'blue' ),
       ('Noriko', 'brown' )
;

So let’s look at the data ordered by the enum:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
--------+-----------
 David  | blue
 Anna   | blue
 Julie  | green
 Noriko | brown

Hrm. That’s not good. I forgot to put “green” after “brown” when I created the enum. Oh, and I forgot the color “hazel”:

% INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );
ERROR:  invalid input value for enum eye_color: "hazel"

Well, nice to know that it’s enforced, and that message is really helpful. But the real problem is that we run into the inherent ordering of enum labels, and now we need to adjust the enum to meet our needs. Here’s how to do it:

ALTER TABLE faces RENAME eye_color TO eye_color_tmp;
ALTER TABLE faces ALTER eye_color_tmp TYPE TEXT;
DROP TYPE eye_color;
CREATE TYPE eye_color AS ENUM ( ‘blue’, ‘brown’, ‘green’, ‘hazel’ );
ALTER TABLE faces ADD eye_color eye_color;
UPDATE faces SET eye_color = eye_color_tmp::eye_color;
ALTER TABLE faces ALTER eye_color SET NOT NULL;
ALTER TABLE faces DROP column eye_color_tmp;

Yikes! I have to rename the column, change its type to TEXT, drop the enum, create a new enum, and then copy all of the data into the new column before finally dropping the old column. If I have a lot of data, this will not be very efficient, requiring that every single row be rewritten. Still, it does work:

% INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );
% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
--------+-----------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green
 Kat    | hazel

The upshot is that enums are terrific if you have a very well-defined set of values that are inherently ordered (or where order is not important) and that are extremely unlikely to change. Perhaps someday PostgreSQL will have a more robust ALTER TYPE that allows enums to be more efficiently reorganized, but even then it seems likely that re-ordering values will require a table rewrite.

Lookup to Me

Another approach to handling a type as a set of values is to take advantage of the relational model and create store the values in a table. Going with the faces example, it looks like this:

CREATE TABLE eye_colors (
    eye_color TEXT PRIMARY KEY
);

INSERT INTO  eye_colors VALUES( 'blue' ), ('green'), ('brown' );

CREATE TABLE faces (
    face_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    eye_color TEXT REFERENCES eye_colors(eye_color)
);

We can use this table much as we did before:

INSERT INTO faces (name, eye_color)
VALUES ('David', 'blue' ),
       ('Julie', 'green' ),
       ('Anna', 'blue' ),
       ('Noriko', 'brown' )
;

And of course we can get the rows back properly ordered by eye_color, unlike the original enum example:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
----+------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green

Cool! But there are a couple of downsides. One is that you’re adding a bit of I/O overhead to every update. Most likely you won’t have very many values in the eye_colors table, so given PostgreSQL’s caching, this isn’t a big deal. A bigger deal is error handling:

INSERT INTO eye_colors VALUES ('hazel');
ERROR:  insert or update on table "faces" violates foreign key constraint "faces_eye_color_fkey"

That’s not an incredibly useful error message. One might ask, without knowing the schema, what has an eye color has to do with a foreign key constraint? At least looking at the tables can tell you a bit more:

% \dt
          List of relations
 Schema |    Name    | Type  | Owner 
----+------+----+----
 public | eye_colors | table | david
 public | faces      | table | david

A quick look at the eye_colors table will tell you what’s going on, and you can figure out that you just need to add a new row:

INSERT INTO eye_colors VALUES ('hazel');
INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );

So it is self-documenting, but unlike enums it doesn’t do a great job of it. Plus if you have a bunch of set-constrained value types, you can end up with a whole slew of lookup tables. This can make it harder to sort the important tables that contain actual business data from those that are just lookup tables, because there is nothing inherent in them to tell the difference. You could put them into a separate schema, of course, but still, it’s not exactly intuitive.

Given these downsides, I’m not a big fan of using lookup tables for managing what is in fact a simple list of allowed values for a particular column unless those values change frequently. So what else can we do?

Constrain Me

A third approach is to use a table constraint, like so:

CREATE TABLE faces (
    face_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    eye_color TEXT NOT NULL,
    CONSTRAINT valid_eye_colors CHECK (
        eye_color IN ( 'blue', 'green', 'brown' )
    )
);

No lookup table, no inherent ENUM ordering. And in regular usage it works just like the lookup table example. The usual INSERT and SELECT once again yields:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
----+------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green

The error message, however, is a bit more helpful:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
ERROR:  new row for relation "faces" violates check constraint "valid_eye_colors"

A check constraint violation on eye_color is much more informative than a foreign key constraint violation. The downside to a check constraint, however, is that it’s not as self-documenting. You have to look at the entire table in order to find the constraint:

% \d faces
                             Table "public.faces"
  Column   |  Type   |                        Modifiers                        
------+-----+-----------------------------
 face_id   | integer | not null default nextval('faces_face_id_seq'::regclass)
 name      | text    | not null default ''::text
 eye_color | text    | not null
Indexes:
    "faces_pkey" PRIMARY KEY, btree (face_id)
Check constraints:
    "valid_eye_colors" CHECK (eye_color = ANY (ARRAY['blue', 'green', 'brown']))

There it is at the bottom. Kind of tucked away there, eh? At least now we can change it. Here’s how:

ALTER TABLE faces DROP CONSTRAINT valid_eye_colors;
ALTER TABLE faces ADD CONSTRAINT valid_eye_colors CHECK (
    eye_color IN ( 'blue', 'green', 'brown', 'hazel' )
);

Not as straight-forward as updating the lookup table, and much less efficient (because PostgreSQL must validate that existing rows don’t violate the constraint before committing the constraint). But it’s pretty simple and at least doesn’t require the entire table be UPDATEd as with enums. For occasional changes to the value list, a table scan is not a bad tradeoff. And of course, once that’s done, it just works:

INSERT INTO eye_colors VALUES ('hazel');
INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );

So this is almost perfect for our needs. Only poor documentation persists as an issue.

This is My Domain

To solve that problem, switch to domains. A domain is simply a custom data type that inherits behavior from another data type and to which one or more constraints can be added. It’s pretty simple to switch from the table constraint to a domain:

CREATE DOMAIN eye_color AS TEXT
CONSTRAINT valid_eye_colors CHECK (
    VALUE IN ( 'blue', 'green', 'brown' )
);

CREATE TABLE faces (
    face_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    eye_color eye_color NOT NULL
);

Nice table declaration, eh? Very clean. Looks exactly like the enum example, in fact. And it works as well as the table constraint:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
----+------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green

A constraint violation is a bit more useful than with the table constraint:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
ERROR:  value for domain eye_color violates check constraint "valid_eye_colors"

This points directly to the domain. It’d be nice if it mentioned the violating value the way the enum error did, but at least we can look at the domain out like so:

\dD eye_color
                                                        List of domains
 Schema |   Name    | Type | Modifier |                                         Check                                          
----+------+---+-----+--------------------------------------------
 public | eye_color | text |          | CHECK (VALUE = ANY (ARRAY['blue', 'green', 'brown', 'hazel']))

None of the superfluous stuff about the entire table to deal with, just the constraint, thank you very much. Changing it is just as easy as changing the table constraint:

ALTER DOMAIN eye_color DROP CONSTRAINT valid_eye_colors;
ALTER DOMAIN eye_color ADD CONSTRAINT valid_eye_colors CHECK (
    VALUE IN ( 'blue', 'green', 'brown', 'hazel' )
);

Yep, you can alter domains just as you can alter tables. And of course now it will work:

INSERT INTO eye_colors VALUES ('hazel');
INSERT INTO faces (name, eye_color) VALUES ('Kat', 'hazel' );

And as usual the data is well-ordered when we need it to be:

% SELECT name, eye_color FROM faces ORDER BY eye_color;
  name  | eye_color 
----+------
 David  | blue
 Anna   | blue
 Noriko | brown
 Julie  | green
 Kat    | hazel

And as an added bonus, if you happened to need an eye color in another table, you can just use the same domain and get all the proper semantics. Sweet!

Color Me Happy

Someday I’d love to see support for a PostgreSQL feature like enums, but allowing an arbitrary list of strings that are ordered by the contents of the text rather than the order in which they were declared, and that’s efficient to update. Maybe it could use integers for the underlying storage, too, and allow values to be modified without a table rewrite. Such would be the ideal for this use case. Hell, I’d find it much more useful than enums.

But domains get us pretty close to that without too much effort, so maybe it’s not that important. I’ve tried all of the above approaches and discussed it quite a lot with my colleagues before settling on domains, and I’m quite pleased with it. The only caveat I’d have is that it’s not to be used lightly. If the value set is likely to change fairly often (at least once a week, say), then you’d be better off with the lookup table.

In short, I recommend:

  • For an inherently ordered set of values that’s extremely unlikely to ever change, use an enum.
  • For a set of values that won’t often change and has no inherent ordering, use a domain.
  • For a set of values that changes often, use a lookup table.

What do you use to constrain a column to a defined set of unordered values?

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Quest for PostgreSQL Project Hosting

The pgTAP project is currently hosted by pgFoundry. This is an old version of GForge, and from what I understand, highly modified for the PostgreSQL project. That’s fine, except that it apparently makes it impossible for anyone to find the tuits to upgrade it to newer versions.

And it needs upgrading. One annoying thing I noticed is that the URLs for release files include an integer in them. For example, the URL to download pgTAP 0.23 is http://pgfoundry.org/frs/download.php/2511/pgtap-0.23.tar.bz2. See the “25111” there? It appears to be a primary key value or something, but is completely irrelevant for a release URL. I would much prefer that the URL be something like http://pgfoundry.org/frs/download.php/pgtap-0.23.tar.bz2 or, even better, http://pgfoundry.org/projects/pgtap/frs/pgtap-0.23.tar.bz2. But such is not the case now.

Another issue is hosting. I’ve registered pgtap.org to use for hosting the pgTAP Web site, but there is no support for pointing a hostname at a pgFoundry/GForge site.

These issues could of course be worked out if someone had the tuits to take them on, but apparently there is no one. So I’m looking to move.

The question is, where to? I could get a paid GitHub account (the pgTAP source is already on GitHub) and be able to have a pgTAP site on pgtap.org from there, so that’s a plus. And I can do file releases, too, in which case the URL format would be something like http://cloud.github.com/downloads/theory/pgtap/pgtap-0.23.tar.bz2, which isn’t ideal, but is a hell of a lot better than a URL with a sequence number in it. I could put them on the hosted site, too, in which case they’d have whatever URL I wanted them to have.

There are only two downsides I can think of to moving to GitHub:

  1. No mail list support. The pgTAP mail list has next to no traffic so far, so I’m not sure this is a big deal. I could also set up a list elsewhere, like Librelist, if I really needed one. I’d prefer to have @pgtap.org mail lists, but it’s not a big deal.

  2. I would lose whatever community presence I gain from hosting on pgFoundry. I know that when I release a Perl module to CPAN that it will be visible to lots of people in the Perl community, and automatically searchable via search.cpan.org and other tools. A CPAN release is a release to the Perl community.

    There is nothing like this for PostgreSQL. pgFoundry is the closest thing, and, frankly, nowhere near as good (pgFoundry’s search rankings have always stunk). So if I were to remove my projects from pgFoundry, how could I make them visible to the community? Is there any other central repository of or searchable list of third-party PostgreSQL offerings?

So I’m looking for advice. Does having an email list matter? If I can get pgTAP announcements included in the PostgreSQL Weekly News, is that enough community visibility? Do you know of a nice project hosting site that offers hosting, mail lists, download mirroring and custom domain handling?

I’ll follow up with a summary of what I’ve found in a later post.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Make the Pragmas Stop!

I’ve been following the development of a few things in the Perl community lately, and it’s leaving me very frustrated. For years now, I’ve written modules that start with the same incantation:

package My::Module;

use strict;
our $VERSION = '0.01';

Pretty simple: declare the module name and version, and turn on strictures to make sure I’m not doing anything stupid. More recently I’ve added use warnings; as a best practice. And even more recently, I’ve started adding use utf8;, too, because I like to write my code in UTF-8. And I like to turn on all of the Perl 5.10 features. It’s mildly annoying to have the same incantation at the start of every module, but I could deal with it:

package My::Module;

use strict;
use warnings;
use feature ':5.10';
use utf8;

our $VERSION = '0.01';

Until now that is. Last year, chromatic started something with his Modern::Perl module. It was a decent idea for newbies to help them get started with Perl by having to have only one declaration at the tops of their modules:

package My::Module;

use Modern::Perl;
our $VERSION = '0.01';

Alas, it wasn’t really designed for me, but for more casual users of Perl, so that they don’t have to think about the pragmas they need to use. The fact that it doesn’t include the utf8 pragma also made it a non-starter for me. Or did it? Someone recently suggested that the utf8 pragma has problems (I can’t find the Perl Monks thread at the moment). Others report that the encoding pragma has issues, too. So what’s the right thing to do with regard to assuming everything is UTF8 in my program and its inputs (unless I say otherwise)? I’m not at all sure.

Not only that, but Modern::Perl has lead to an explosion of other pragma-like modules on CPAN that promise best pragma practices. There’s common::sense, which loads utf8 but only some of of the features of strict, warnings, and feature. uni::perl looks almost exactly the same. There’s also Damian Conway’s Toolkit, which allows you to write your own pragma-like loader module. There’s even Acme::Very::Modern::Perl, which is meant to be a joke, but is it really?

If I want to simplify the incantation at the top of every file, what do I use?

And now it’s getting worse. In addition to feature, Perl 5.11 introduces the legacy pragma, which allows one to get back behaviors from older Perls. For example, to get back the old Unicode semantics, you’d use legacy ‘unicode8bit’;. I mean, WTF?

I’ve had it. Please make the pragma explosion stop! Make it so that the best practices known at the time of the release of any given version of Perl can automatically imported if I just write:

package My::Module '0.01';
use 5.12;

That’s it. Nothing more. Whatever has been deemed the best practice at the time 5.12 is released will simply be used. If the best practices change in 5.14, I can switch to use 5.14; and get them, or just leave it at use 5.12 and keep what was the best practices in 5.12 (yay future-proofing!).

What should the best practices be? My list would include:

  • strict
  • warnings
  • features — all of them
  • UTF-8 — all input and output to the scope, as well as the source code

Maybe you disagree with that list. Maybe I’d disagree with what Perl 5 Porters settles on. But then you can I can read what’s included and just add or removed pragmas as necessary. But surely there’s a core list of likely candidates that should be included the vast majority of the time, including for all novices.

In personal communication, chromatic tells me, with regard to Modern::Perl, “Experienced Perl programmers know the right incantations to get the behavior they want. Novices don’t, and I think we can provide them much better defaults without several lines of incantations.” I’m fine with the second assertion, but disagree with the first. I’ve been hacking Perl for almost 15 years, and I no longer have any fucking idea what incantation is best to use in my modules. Do help the novices, and make the power tools available to experienced hackers, but please make life easier for the experienced hackers, too.

I think that declaring the semantics of a particular version of Perl is where the Perl 5 Porters are headed. I just hope that includes handling all of the likely pragmas too, so that I don’t have to.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Japan Photos, Current Status

5,000 Tweets

Much to my amusement, Twitter seems to think that I’ve tweeted over 5,000 times. I don’t see how this can possibly be, but there are days when I tweet a log I guess. Still, 5,000 in a little under three years seems…excessive. I realize that it counts DMs and retweets in that number (not many retweets yet, of course), but still, I question their algorithm. Since one can’t access one’s entire history of tweets (yet?), I can’t count for myself, either, alas.

Kiyomizu Temple Stage

In other news, I’ve been slowly making may way through my photos from the family trip to Japan. Yesterday, I posted photos from Kiyomizu-dera, Gion, and Teramachi, taken our first day in Kyoto. What a fabulously beautiful place! We walked around for hours, and it was a photographic feast. It was all I could do to trim things down to just 63 photos. I have many hundreds more to process still, from Kyoto, Nara, and Tokyo. I’ll get them up over the next week.

As for my current status? I realize I’ve been blogging less, but I’ve been no less busy hacking Perl and PostgreSQL stuff. In addition to ushering out a new version of Pod::Simple, I’ve been updating the code for my Perl doc browser project (example), adding features to pgTAP, and, erm, planning other stuff. Things will likely be slowish through the holidays, but there should be some interesting stuff trickling out over the coming weeks. Till then, this status post will have to do.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Powered by KinoSearch