Just a Theory

By David E. Wheeler

Posts about Perl

Testing Catalyst Template::Declare Views

Now that we have our default Catalyst tests passing, let’s have a look at testing the views we’ve created. You can follow along via the Part 6 tag tag in the GitHub repository. Start by looking at the default test script for our HTML view, t/view_HTML.t. It should look something like this:

use strict;
use warnings;
use Test::More tests => 3;
# use Test::XPath;

BEGIN {
    use_ok 'MyApp::View::HTML' or die;
    use_ok 'MyApp' or die;
}

ok my $view = MyApp->view('HTML'), 'Get HTML view object';

# ok my $output = $view->render(undef, 'hello', { user => 'Theory' }),
#     'Render the "hello" template';

# Test output using Test::XPath or similar.
# my $tx = Test::XPath->new( xml => $output, is_html => 1);
# $tx->ok('/html', 'Should have root html element');
# $tx->is('/html/head/title', 'Hello, Theory', 'Title should be correct');

Yeah, this looks a bit different that the view test created for Template Toolkit or Mason views. That’s because Catalyst::View::TD ships with its own test script template. One of the advantage is that it shows off testing the view without having to instantiate the entire app or send mock HTTP requests. These are unit tests, after all: we want to make sure that the view templates do what they want, not test an entire request process. The latter is more appropriate for integration tests, which I’ll cover later.

So let’s have a look at this test script. The first commented-out statement is:

# ok my $output = $view->render(undef, 'hello', { user => 'Theory' }),
#     'Render the "hello" template';

What this is showing us is that one can use the view’s render() method to execute a view without a context object, thus saving the expense of initializing the application. And if you have templates that don’t rely on it, I highly recommend this approach for keeping your tests fast. Even if the use of the the context object is fairly minimal, you can use Test::MockObject to mock up a context object like so:

use Test::MockObject;
my $c = Test::MockObject->new;
$c->mock(uri_for => sub { $_[1] });
$c->mock(config  => sub { { name => 'MyApp' } });
$c->mock(debug   => sub { });
$c->mock(log     => sub { });

ok my $output = $view->render($c, 'hello', { user => 'Theory' }),
    'Render the "hello" template';

Then you can use the mock() method to mock more methods as your template uses them.

Alas, our app has already passed the point where that seems worthwhile. So far we have just one template, books/list, and it requires that there also be a database statement handle available. Sure we could create a database connection and prepare a statement handle. But that would start to require a fair bit more code to set up. So let’s just instantiate the application object and be done with it. Change the test plan to 5:

use Test::More tests => 5;

Then change the test body after the BEGIN block to:

# Instantiate the context object and the view.
ok my $c = MyApp->new, 'Create context object';
ok my $view = $c->view('HTML'), 'Get HTML view object';

# Create a statement handle for books/list.
my $sth = $c->conn->run(sub { $_->prepare(q{
    SELECT isbn, title, rating, authors FROM books_with_authors
}) });
$sth->execute;

# Render books/list.
ok my $output = $view->render($c, 'books/list', {
    title => 'Book List',
    books => $sth,
}), 'Render the "books/list" template';

This allows us to get a full test of the view.

% prove --lib --verbose t/view_HTML.t
t/view_HTML.t .. 
1..5
ok 1 - use MyApp::View::HTML;
ok 2 - use MyApp;
ok 3 - Create context object
ok 4 - Get HTML view object
Explicit blessing to '' (assuming package main) at /usr/local/lib/perl5/site_perl/5.10.1/Catalyst.pm line 1281.
Explicit blessing to '' (assuming package main) at /usr/local/lib/perl5/site_perl/5.10.1/Catalyst.pm line 1281.
Explicit blessing to '' (assuming package main) at /usr/local/lib/perl5/site_perl/5.10.1/Catalyst.pm line 1281.
Explicit blessing to '' (assuming package main) at /usr/local/lib/perl5/site_perl/5.10.1/Catalyst.pm line 1281.
ok 5 - Render the "books/list" template
ok
All tests successful.
Files=1, Tests=5,  1 wallclock secs ( 0.02 usr  0.00 sys +  0.69 cusr  0.06 csys =  0.77 CPU)
Result: PASS

Hrm. Those warnings are rather annoying. Looking at Catalyst.pm I see that they come from the uri_for() method. I expect that they somehow result from a lack of state in the context object. That’s not really important for our unit tests, so let’s just mock that one method to do something reasonable. Add this code after instantiating the context object but before rendering the view:

use Test::MockObject::Extends;
my $mocker = Test::MockObject::Extends->new($c);
$mocker->mock( uri_for => sub { $_[1]} );

And now we get:

% prove --lib --verbose t/view_HTML.t
t/view_HTML.t .. 
1..5
ok 1 - use MyApp::View::HTML;
ok 2 - use MyApp;
ok 3 - Create context object
ok 4 - Get HTML view object
ok 5 - Render the "books/list" template
ok
All tests successful.
Files=1, Tests=5,  1 wallclock secs ( 0.02 usr  0.01 sys +  0.77 cusr  0.07 csys =  0.87 CPU)
Result: PASS

Ah, much better! And thanks to our mock, we also have a much better idea of what will be returned from uri_for(), which will be important for later tests.

Now that we have things properly mocked up and the objects created such that we can actually get the template to render, it’s time to test the output from the template. For HTML and XML format, I like the Test::XPath module. In fact, it’s for this very use that I wrote Test::XPath. It’s great because it allows me to effectively test the correctness of the template output. Here’s the basic outline:

# Test output using Test::XPath.
my $tx = Test::XPath->new( xml => $output, is_html => 1);
test_basics($tx, 'Book List');

# Call this function for every request to make sure that they all
# have the same basic structure.
sub test_basics {
    my ($tx, $title) = @_;

    # Some basic sanity-checking.
    $tx->is( 'count(/html)',      1, 'Should have 1 html element' );
    $tx->is( 'count(/html/head)', 1, 'Should have 1 head element' );
    $tx->is( 'count(/html/body)', 1, 'Should have 1 body element' );

    # Check the head element.
    $tx->is(
        '/html/head/title',
        $title,
        'Title should be corect'
    );
    $tx->is(
        '/html/head/link[@type="text/css"][@rel="stylesheet"]/@href',
        '/static/css/main.css',
        'Should load the CSS',
    );
}

I’ve set up the test_basics() function to test the things that should be mostly the same for every request. This will mainly cover the output of the wrapper, and includes things like making sure that there is just one <html> tag, one <head> tag, and one <body> tag; and that the title and CSS-related elements are output properly. Running this (with the test plan set to no_plan as I develop), I get:

% prove --lib t/view_HTML.tt
t/view_HTML.t .. 2/? 
#   Failed test 'Should load the CSS'
#   at t/view_HTML.t line 52.
#          got: ''
#     expected: '/static/css/main.css'
# Looks like you failed 1 test of 10.
t/view_HTML.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/10 subtests 

Test Summary Report
-------------------
t/view_HTML.t (Wstat: 256 Tests: 10 Failed: 1)
  Failed test:  10
  Non-zero exit status: 1
Files=1, Tests=10,  1 wallclock secs ( 0.02 usr  0.01 sys +  0.79 cusr  0.08 csys =  0.90 CPU)
Result: FAIL

Hrm. Let’s stick a diag $output in there and see what we get. Now the output includes this bit:

# <html>
#  <head>
#   <title>Book List</title>
#   <link rel="stylesheet" href="/static/css/main.css" />
#  </head>

Ah! the <link> element for the stylesheet is missing the type attribute. So let’s add it. Edit lib/MyApp/Templates/HTML.pm and change the proper bit of the wrapper template to:

link {
    rel is 'stylesheet';
    type is 'text/css';
    href is $c->uri_for('/static/css/main.css' );
};

Note the addition of the type attribute. Now when we run the tests (removing the diag), we get:

% prove --lib t/view_HTML.t
t/view_HTML.t .. ok    
All tests successful.
Files=1, Tests=10,  1 wallclock secs ( 0.02 usr  0.00 sys +  0.78 cusr  0.07 csys =  0.87 CPU)
Result: PASS

Ah, much better! A lot more testing should go in there to make sure that the wrapper is doing things right. I’ve committed such testing, so check it out.

Now we need to test the output specific to the books/list template. Below the call to test_basics(), add this code:

$tx->ok('/html/body/div[@id="bodyblock"]/div[@id="content"]/table', sub {
    $_->is('count(./tr)', 6, 'Should have seven rows' );
    $_->ok('./tr[1]', sub {
        $_->is('count(./th)', 3, 'Should have three table headers');
        $_->is('./th[1]', 'Title', '... first is "Title"');
        $_->is('./th[2]', 'Rating', '... second is "Rating"');
        $_->is('./th[3]', 'Authors', '... third is "Authors"');
    }, 'Should have first table row')
}, 'Should have a table');

Notice the nested block there? Test::XPath supports passing blocks to its ok() method, so that you can naturally scope your tests to blocks of XML and HTML. Neat, huh? If you don’t like the use of $_, the test object is also passed as the sole argument to such blocks.

Anyway, these tests makes sure that the table is where it should be, has the proper number of rows, and that the first row has three headers with their proper values. The test outputs:

% prove --lib t/view_HTML.tt
t/view_HTML.t .. 1/? 
#   Failed test '... third is "Authors"'
#   at t/view_HTML.t line 42.
#          got: 'Author'
#     expected: 'Authors'
# Looks like you failed 1 test of 28.
t/view_HTML.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/28 subtests 

Test Summary Report
-------------------
t/view_HTML.t (Wstat: 256 Tests: 28 Failed: 1)
  Failed test:  28
  Non-zero exit status: 1
Files=1, Tests=28,  1 wallclock secs ( 0.03 usr  0.01 sys +  0.79 cusr  0.08 csys =  0.91 CPU)
Result: FAIL

Whoops! Looks like I forgot to change the header when I changed the template to output a list of authors last week. So edit lib/MyApp/Templates/HTML/Books.pm and change the template to output “Authors” instead of “Author”:

row {
    th { 'Title'   };
    th { 'Rating'  };
    th { 'Authors' };
};

And now all tests pass again:

% prove --lib t/view_HTML.t
t/view_HTML.t .. ok    
All tests successful.
Files=1, Tests=28,  1 wallclock secs ( 0.02 usr  0.01 sys +  0.78 cusr  0.09 csys =  0.90 CPU)
Result: PASS

Great. So let’s finish testing the rest of the output. Ah, but wait! We have on ORDER BY clause on the query, so the order in which the books will be output is undefined. So let’s add an ORDER BY clause. Change the creation of the statement handle in the test file to:

my $sth = $c->conn->run(sub { $_->prepare(q{
    SELECT isbn, title, rating, authors
        FROM books_with_authors
        ORDER BY title
}) });

And now you can start to see why I use the q{} operator for SQL queries. You should also note that the inputs for the view test are now different than those from the controller, which still has no ORDER BY clause. It’s likely that we’ll want to go back and change that later, but I bring it up here to highlight the difference from integration tests – and to emphasize that we’ll need to write those integration tests at some point!

But back to the view unit tests. We can now test the contents of the table by adding code after the test for ./tr[1]. Here’s what the test for the next row looks like:

$_->ok('./tr[2]', sub {
    $_->is('count(./td)', 3, 'Should have three cells');
    $_->is(
        './td[1]',
        'CCSP SNRS Exam Certification Guide',
        '... first is "CCSP SNRS Exam Certification Guide"'
    );
    $_->is('./td[2]', 5, '... second is "5"');
    $_->is(
        './td[3]',
        'Bastien, Nasseh, Degu',
        '... third is "Bastien, Nasseh, Degu"',
    );
}, 'Should have second table row');

The other rows can be similarly tested; have a look at the commit to see all the new tests.

This reminds me, however, that we never created an order for the list of authors. So it’s possible that this test could fail, as the order of the author last names is undefined. We should go back and fix that, probably by listing the authors as they are actually listed on the cover of the book. But in the meantime, our test of this view is done.

Next up, I think I’ll hit controller tests. So come on back!

Looking for the comments? Try the old layout.

Testing the Tutorial App

Yet another entry in my ongoing attempt to rewrite the Catalyst tutorial in my own coding style.

So far, I’ve been following the original tutorial pretty closely. But now I want to skip ahead a bit to chapter 8: testing. I skip because, really, we should be writing tests from the very beginning. They shouldn’t be an afterthought stuck in the penultimate chapter of a tutorial. So let’s write some tests. You can follow along in the Part 5 tag in the GitHub repository.

Oops, A Missing Dependency

Oh, wait! I forgot to tell the build system that we now depend on Catalyst::View::TD and DBIx::Connector. So add these two lines to Makefile.PL:

requires 'Catalyst::View::TD' => '0.11';
requires 'DBIx::Connector' => '0.30';

Okay, now we can write some tests.

STFU

Well, no, actually, let’s start by running the tests we have:

perl Makefile.PL
make test

You should see some output after this — lots of stuff, actually — ending something like this:

[debug] Loaded Path actions:
.-------------------------------------+--------------------------------------.
| Path                                | Private                              |
+-------------------------------------+--------------------------------------+
| /                                   | /index                               |
| /                                   | /default                             |
| /books                              | /books/index                         |
| /books/list                         | /books/list                          |
'-------------------------------------+--------------------------------------'

[info] MyApp powered by Catalyst 5.80013
t/view_HTML.t ......... ok   
All tests successful.
Files=5, Tests=8,  3 wallclock secs ( 0.04 usr  0.02 sys +  2.19 cusr  0.25 csys =  2.50 CPU)
Result: PASS

I don’t know about you, but having all that debugging crap just drives me nuts while I’m running tests. It’s helpful while doing development, but mainly just gets in the way of the tests. So let’s get rid of them. Open up lib/MyApp.pm and change the use Catalyst statement to:

use Catalyst (qw(
    ConfigLoader
    Static::Simple
    StackTrace
), $ENV{HARNESS_ACTIVE} ? () : '-Debug');

Essentially, we’re just turning on the debugging output only if the test harness is not active. Now when we run the tests, we get:

t/01app.t ............. ok   
t/02pod.t ............. skipped: set TEST_POD to enable this test
t/03podcoverage.t ..... skipped: set TEST_POD to enable this test
t/controller_Books.t .. ok   
t/view_HTML.t ......... ok   
All tests successful.
Files=5, Tests=8,  3 wallclock secs ( 0.04 usr  0.02 sys +  2.15 cusr  0.23 csys =  2.44 CPU)
Result: PASS

Much better. Now I can actually see other stuff, such as the fact that I’m skipping POD tests. Personally, I like to make sure that POD tests run all the time, as I’m likely to forget to set the environment variable. So let’s edit t/02pod.t and t/03podcoverage.t and delete this line from each:

plan skip_all => 'set TEST_POD to enable this test' unless $ENV{TEST_POD};

So what does that get us?

t/01app.t ............. ok   
t/02pod.t ............. ok     
t/03podcoverage.t ..... 1/6 
#   Failed test 'Pod coverage on MyApp::Controller::Books'
#   at /usr/local/lib/perl5/site_perl/5.10.1/Test/Pod/Coverage.pm line 126.
# Coverage for MyApp::Controller::Books is 50.0%, with 1 naked subroutine:
#   list

#   Failed test 'Pod coverage on MyApp::Controller::Root'
#   at /usr/local/lib/perl5/site_perl/5.10.1/Test/Pod/Coverage.pm line 126.
# Coverage for MyApp::Controller::Root is 66.7%, with 1 naked subroutine:
#   default
# Looks like you failed 2 tests of 6.
t/03podcoverage.t ..... Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/6 subtests 
t/controller_Books.t .. ok   
t/view_HTML.t ......... ok   

Test Summary Report
-------------------
t/03podcoverage.t   (Wstat: 512 Tests: 6 Failed: 2)
  Failed tests:  2-3
  Non-zero exit status: 2
Files=5, Tests=25,  3 wallclock secs ( 0.05 usr  0.02 sys +  2.82 cusr  0.29 csys =  3.18 CPU)
Result: FAIL
Failed 1/5 test programs. 2/25 subtests failed.

Well that figures, doesn’t it? We added the list action to MyApp::Controller Books but never documented it. And for some reason, Catalyst creates the default action in MyApp::Controller::Root with no documentation. Such a shame. So let’s document those methods. Add this to t/lib/MyApp/Controller/Root.pm:

=head2 default

The default action. Just returns a 404/NOT FOUND error. Might want to update
later with a template to format the error like the rest of our site.

=cut

While there, I notice that the index action has a doc header, but nothing to actually describe what it does. Let’s fix that, too:

The default Catalyst action, which just displays the welcome message. This is
the "Yay it worked!" page. Consider changing to a real home page for our app.

Great. Now open t/lib/MyApp/Controller/Books.pm and document the list action:

=head2 list

Looks up all of the books in the system and executes a template to display
them in a nice table. The data includes the title, rating, and authors of each
book

=cut

Oh hey, look at that. There’s an index method that doesn’t do anything. And it has a POD header and no docs, too. So let’s document it:

The default method for the books controller. Currently just says that it
matches the request; we'll likely want to change it to something more
reasonable down the line.

Okay, so how do the tests look now?

t/01app.t ............. ok   
t/02pod.t ............. ok     
t/03podcoverage.t ..... ok   
t/controller_Books.t .. ok   
t/view_HTML.t ......... ok   
All tests successful.
Files=5, Tests=25,  3 wallclock secs ( 0.05 usr  0.02 sys +  2.82 cusr  0.31 csys =  3.20 CPU)
Result: PASS

Excellent! Now, the truth is that we didn’t document our templates, either. Test::Pod doesn’t cotton on to that fact because they’re not installed like normal subroutines in the test classes. So it’s up to us to document them ourselves. (Note to self: Consider adding a module to test that all Template::Declare classes have docs for all of their templates.) I’ll wait here while you do that.

All done? Great! I had actually planned to start testing the view next, but I think this is enough for today. Stay tuned for more testing goodness.

Looking for the comments? Try the old layout.

Tutorial on GitHub

Following a very good suggestion from Pedro Melo, I’ve created a Git repository for this tutorial and put it on GitHub. I replayed each step, making each into its own commit, and tagged the state of the code for each entry:

So as I continue to make modifications, I’ll keep this repository up-to-date, and tag things as of each blog entry. This will make it easy for you to follow along; you can simply clone the repository and git pull for each post.

More soon.

Looking for the comments? Try the old layout.

My Catalyst Tutorial: Add Authors to the View

Another post in my ongoing series of posts on using Catalyst with Template::Declare and DBIx::Connector. This will be the last post covering material from chapter 3, I promise. This is a fun one, though, because we continue to use this really nice DSL called “SQL,” which I think is more expressive than an ORM would be.

To whit, the next task is to add the missing list of authors to the book list. The thing is, the more I work with databases, the more I’m inclined to think about them not only as the “M” in “MVC”, but also the “V”. I’ll show you what I mean.

A Quick Fix

But first, a diversion. In the second post in this series, I created an SQL statement to insert book authors, but I made a mistake: the values for surnames and given names were reversed. Oops. Furthermore, I included explicit author IDs, even though the id column uses a sequence for it’s default value. So first we need to fix these issues. Change the INSERT INTO authors statement in sql/001-books.sql to:

INSERT INTO authors (surname, given_name)
VALUES ('Bastien',      'Greg'),
       ('Nasseh',       'Sara'),
       ('Degu',         'Christian'),
       ('Stevens',      'Richard'),
       ('Comer',        'Douglas'),
       ('Christiansen', 'Tom'),
       ('Torkington',   'Nathan'),
       ('Zeldman',      'Jeffrey')
;

This time, we’re letting the sequence populate the id column. Fortunately, it starts from 1 just like we did, so we don’t need to update the values in the INSERT INTO book_author statement. Now let’s fix the database:

DELETE FROM book_author;
DELETE FROM authors;

Then run the above SQL query to restore the authors with their proper names, and then run the INSERT INTO book_author statement. That will get us back in business.

Constructing our Query

Now it’s time for the fun. The original SQL query we wrote to get the list of books was:

SELECT isbn, title, rating FROM books;

Nothing unusual there. But to get at the authors, we need to join to book_author and from there to authors. Our first cut looks like this:

SELECT b.isbn, b.title, b.rating, a.surname
  FROM books       b
  JOIN book_author ba ON b.isbn       = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id;

Which yields this data:

       isbn        |               title                | rating |   surname    
-------------------+------------------------------------+--------+--------------
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Bastien
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Nasseh
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Degu
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5 | Stevens
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4 | Comer
 978-1-56592-243-3 | Perl Cookbook                      |      5 | Christiansen
 978-1-56592-243-3 | Perl Cookbook                      |      5 | Torkington
 978-0-7357-1201-0 | Designing with Web Standards       |      5 | Zeldman

Good start, but note how we now have three rows for “CCSP SNRS Exam Certification Guideâ€? and two for “Perl Cookbookâ€?. We could of course modify our Perl code to look at the ISBN in each row and combine as appropriate, but it’s better to get the database to do that work, since it’s designed for that sort of thing. So let’s use an aggregate function to combine the values over multiple rows into a single row. All we have to do is use the column that changes (surname) in an aggregate function and tell PostgreSQL to use the other columns to group rows into one. PostgreSQL 8.4 introduces a really nice aggregate function, array_agg(), for pulling a series of strings together into an array. Let’s put it to use:

SELECT b.isbn, b.title, b.rating, array_agg(a.surname) as authors
  FROM books       b
  JOIN book_author ba ON b.isbn     = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;

Now the output is:

       isbn        |               title                | rating |          authors         
-------------------+------------------------------------+--------+--------------------------
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5 | {Stevens}
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4 | {Comer}
 978-1-56592-243-3 | Perl Cookbook                      |      5 | {Christiansen,Torkington}
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | {Bastien,Nasseh,Degu}
 978-0-7357-1201-0 | Designing with Web Standards       |      5 | {Zeldman}

Much better. We now have a single row for each book, and the authors are all grouped into a single column. Cool. But we can go one step further. Although we could use Perl to turn the array of author surnames into a comma-delimited string, there’s a PostgreSQL function for that, too: array_to_string(). Check it out:

SELECT b.isbn, b.title, b.rating,
       array_to_string(array_agg(a.surname), ', ') as authors
  FROM books       b
  JOIN book_author ba ON b.isbn     = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;

Now the rows will be:

       isbn        |               title                | rating |          authors          
-------------------+------------------------------------+--------+--------------------------
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5 | Stevens
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4 | Comer
 978-1-56592-243-3 | Perl Cookbook                      |      5 | Christiansen, Torkington
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5 | Bastien, Nasseh, Degu
 978-0-7357-1201-0 | Designing with Web Standards       |      5 | Zeldman

Create a Database View

Cool! All the formatting work is done! But since it’s likely what we’ll often need to fetch book titles along with their authors, let’s create an SQL view for this query. That way, we don’t have to write the same SQL in different places in the application: we can just use the view. So create a new file, sql/002-books_with_authors.sql, and add this SQL:

CREATE VIEW books_with_authors AS
SELECT b.isbn, b.title, b.rating,
       array_to_string(array_agg(a.surname), ', ') as authors
  FROM books       b
  JOIN book_author ba ON b.isbn     = ba.isbn
  JOIN authors     a  ON ba.author_id = a.id
 GROUP BY b.isbn, b.title, b.rating;

Now install this view in the database:

psql -U postgres -d myapp -f sql/002-books_with_authors.sql

And now we can make use of the view any time we want and get the results of the full query. It’s time to do that in our controller. Edit lib/MyApp/Controller/Books.pm and change this line in the list action:

my $sth = $_->prepare('SELECT isbn, title, rating FROM books');

To:

my $sth = $_->prepare(q{
    SELECT isbn, title, rating, authors FROM books_with_authors
});

The use of the q{} operator is a style I use for SQL queries in Perl code; you can use whatever style you like. Since this is a very short SQL statement (thanks to the view), it’s not really necessary to have it on multiple lines, but I like to be fairly consistent about this sort of thing.

The last thing we need to do is a a very simple change to the list template in lib/MyApp/Templates/HTML/Books.pm. In previous posts, I was referring to the non-existent “author” key in the each hash reference fetched from the database. In the new view, however, I’ve named that column “authors”. So change this line:

cell { $book->{author} };

To

cell { $book->{authors} };

And that’s it. Restart the server and reload http://localhost:3000/books/list and you should now see all of the books listed with their authors.

Notes

I think you can appreciate why, to a certain degree, I’m starting to think of the database as handling both the “M” and the “V” in “MVC”. It’s no mistake that the database object we created is known as a “view”. It was written in such a way that it not only expressed the relationship between books and authors in a compact but clear way, but it formatted the appropriate data for publishing on the site—all in a single, efficient query. All the Template::Declare view does is wrap it all up in the appropriate HTML.

PostgreSQL isn’t the only database to support feature such as this, by the way. All of the databases I’ve used support views, and many offer useful aggregate functions, as well. Among the MySQL aggregates, for example, is group_concat(), which sort of combines the array_to_string(array_agg()) PostgreSQL syntax into a single function. And I’ve personally written a custom aggregate for SQLite in Perl. So although I use PostgreSQL for these examples and make use of its functionality, you can do much the same thing in most other databases.

Either way, I find this to be a lot less work than using an ORM or other abstraction layer between my app and the database. Frankly, SQL provides just the right level of abstraction.

Looking for the comments? Try the old layout.

Create a Template::Declare Wrapper

Next in my ongoing series of posts on using Catalyst with Template::Declare and DBIx::Connector, we pick up again in chapter 3 to create a wrapper for the view. I added the wrapper support to Template::Declare over a year ago, and while the idea is sound, the interface makes it feel like it’s bolted on. See if you agree with me.

Returning to the MyApp project, open lib/MyApp/Templates/HTML.pm and implement a wrapper like so:

use Sub::Exporter -setup => { exports => [qw(wrapper) ] };

create_wrapper wrapper => sub {
    my ($code, $c, $args) = @_;
    xml_decl { 'xml', version => '1.0' };
    outs_raw '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
            . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">';
    html {
        head {
            title { $args->{title} || 'My Catalyst App!' };
            link {
                rel is 'stylesheet';
                href is $c->uri_for('/static/css/main.css' );
            };

        };

        body {
            div {
                id is 'header';
                # Your logo can go here.
                img {
                    src is $c->uri_for('/static/images/btn_88x31_powered.png');
                };
                # Page title.
                h1 { $args->{title} || $c->config->{name} };
            }; # end header.

            div {
                id is 'bodyblock';
                div {
                    id is 'menu';
                    h3 { 'Navigation' };
                    ul {
                        li {
                            a {
                                href is $c->uri_for('/books/list');
                                'Home';
                            };
                        };
                        li {
                            a {
                                href is $c->uri_for('/');
                                title is 'Catalyst Welcome Page';
                                'Welcome';
                            };
                        };
                    };
                }; # end menu

                div {
                    id is 'content';
                    # Status and error messages.
                    if (my $msg = $args->{status_msg}) {
                        span { class is 'message'; $msg };
                    }
                    if (my $err = $args->{error_msg}) {
                        span { class is 'error'; $err };
                    }

                    # Output the template contents.
                    $code->($args);
                }; # end content

            }; # end bodyblock
        };
    };
};

This looks like more work than it is because of the copious use of whitespace I’ve used. Personally, I find the pure Perl syntax easier to read than the mix of HTML and TT syntax in the Template Toolkit wrapper. Anyway, this is a nearly direct port of the Template Toolkit wrapper from the tutorial. Template::Declare wrappers expect at least one argument: a code reference that will output the content of the main template. You can see it used here near the end of the code, with the line $code->($args).

Unfortunately, Template::Declare doesn’t make such wrappers easily available to templates. So we have to add the Sub::Exporter line at the top to export the wrapper function it creates, named wrapper.

Next, open up lib/MyApp/Templates/HTML/Books.pm and edit the list template to take advantage of the wrapper. The new code looks like this:

use MyApp::Templates::HTML 'wrapper';

template list => sub {
    my ($self, $args) = @_;
    wrapper {
        table {
            row {
                th { 'Title'  };
                th { 'Rating' };
                th { 'Author' };
            };
            my $sth = $args->{books};
            while (my $book = $sth->fetchrow_hashref) {
                row {
                    cell { $book->{title}  };
                    cell { $book->{rating} };
                    cell { $book->{author} };
                };
            };
        };
    } $self->c, $args;
};

First we import the wrapper function from MyApp::Templates::HTML, and then we simply use it to wrap the contents of our template. Note that the context object and template arguments must be passed on to the wrapper; they’re not provided to the wrapper by Template::Declare. That’s something else I’d like to adjust.

In the meantime, contrary to the tutorial, I don’t think the template should set the title of the page. It seems to me that’s more the responsibility of the controller. So while this template could easily add a title key to the $args hash before passing it on to the wrapper, I recommend editing the list action in MyApp::Controller::Books instead:

sub list : Local {
    my ($self, $c) = @_;
    my $stash = $c->stash;
    $stash->{title} = 'Book List';
    $stash->{books} = $c->conn->run(fixup => sub {
        my $sth = $_->prepare('SELECT isbn, title, rating FROM books');
        $sth->execute;
        $sth;
    });
}

So, with the wrapper in place, let’s create the stylesheet the wrapper uses:

$ mkdir root/static/css

Then open root/static/css/main.css and add the following content:

#header {
    text-align: center;
}
#header h1 {
    margin: 0;
}
#header img {
    float: right;
}
#footer {
    text-align: center;
    font-style: italic;
    padding-top: 20px;
}
#menu {
    font-weight: bold;
    background-color: #ddd;
    float: left;
    padding: 0 0 50% 5px;
}
#menu ul {
    margin: 0;
    padding: 0;
    list-style: none;
    font-weight: normal;
    background-color: #ddd;
    width: 100px;
}
#content {
    margin-left: 120px;
}
.message {
    color: #390;
}
.error {
    color: #f00;
}

Now restart the app as usual and reload the books list at http://localhost:3000/books/list. You should now see a nicely formatted page with the navigation and header stuff, as well as the book list. You can change the CSS and the wrapper to modify the overall look of your site, and then use the wrapper in all of your page request templates to get the same look and feel across your site.

While this works, I’m not satisfied with the overall interface for Template::Declare wrappers. The need to explicitly export them and pass arguments is annoying. Maybe the Jifty guys have some other approach that works better. But if not, I’ll likely go back to the drawing board on wrappers and see how they can be made better.

Next up: More database fun!

Looking for the comments? Try the old layout.

Catalyst with DBIx::Connector and Template::Declare

Following up on my post yesterday introducing Catalyst::View::TD, today I’d like to continue with the next step in chapter 3 of the Catalyst tutorial. The twist here is that I’m going to use PostgreSQL for the database back-end and start introducing some database best practices. I’m also going to make use of my DBIx::Connector module to interact with the database.

Create the Database

Picking up with the database creation section of the tutorial, the first change I’d like to make is to use a natural key for the books table. All books have unique identifiers, known as ISBNs, so creating a surrogate key (the typical id column in ORM-managed applications) is redundant. One of the nice things about PostgreSQL is that it ships with a contributed library, isn, which validates ISBN and other international identifiers. So we use this contrib module (usually included in package-installed PostgreSQL servers) for the primary key for books. If you need to install it from source, it’s pretty easy:

cd postgresql-8.4.1/contrib/isn
make
make install

Ideally I’d use a natural key for the authors table too, but despite some attempts to create universal identifiers for authors, nothing has really caught on as far as I know. So I’ll just stick to a surrogate key for now.

First step: create the database and install isn if it’s not already included in the template database:

createdb -U postgres myapp
psql -U postgres -d myapp -f /usr/local/pgsql/share/contrib/isn.sql

The isn.sql file may be somewhere else on your system. Now let’s create the database. Create sql/001-books.sql in the MyApp directory and paste this into it:

BEGIN;

CREATE TABLE books (
    isbn   ISBN13   PRIMARY KEY,
    title  TEXT     NOT NULL DEFAULT '',
    rating SMALLINT NOT NULL DEFAULT 0 CHECK (rating BETWEEN 0 AND 5)
);

CREATE TABLE authors (
    id         BIGSERIAL PRIMARY KEY,
    surname    TEXT NOT NULL DEFAULT '',
    given_name TEXT NOT NULL DEFAULT ''
);

CREATE TABLE book_author (
    isbn       ISBN13 REFERENCES books(isbn),
    author_id  BIGINT REFERENCES authors(id),
    PRIMARY KEY (isbn, author_id)
);

INSERT INTO books
VALUES ('1587201534',        'CCSP SNRS Exam Certification Guide', 5),
       ('978-0201633467',    'TCP/IP Illustrated, Volume 1',       5),
       ('978-0130183804',    'Internetworking with TCP/IP Vol.1',  4),
       ('978-1-56592-243-3', 'Perl Cookbook',                      5),
       ('978-0735712010',    'Designing with Web Standards',       5)
;

INSERT INTO authors
VALUES (1, 'Greg',      'Bastien'),
       (2, 'Sara',      'Nasseh'),
       (3, 'Christian', 'Degu'),
       (4, 'Richard',   'Stevens'),
       (5, 'Douglas',   'Comer'),
       (6, 'Tom',       'Christiansen'),
       (7, 'Nathan',    'Torkington'),
       (8, 'Jeffrey',   'Zeldman')
;

INSERT INTO book_author
VALUES ('1587201534',        1),
       ('1587201534',        2),
       ('1587201534',        3),
       ('978-0201633467',    4),
       ('978-0130183804',    5),
       ('978-1-56592-243-3', 6),
       ('978-1-56592-243-3', 7),
       ('978-0735712010',    8)
;

COMMIT;

Yeah, I Googled for the ISBNs for those books. I found the ISBN-13 number for most of them, but it handles the old ISBN-10 format, too, automatically upgrading it to ISBN-13. I also added a CHECK constraint for the rating column, to be sure that the value is always BETWEEN 0 AND 5. I also like to include default values where it’s sensible to do so, and that syntax for inserting multiple rows at once is pretty nice to have.

Go ahead and run this against your database:

psql -U postgres -d myapp -f sql/001-books.sql

Now if you connect to the server, you should be able to query things like so:

$ psql -U postgres myapp
psql (8.4.1)
Type "help" for help.

myapp=# select * from books;
       isbn        |               title                | rating 
-------------------+------------------------------------+--------
 978-1-58720-153-0 | CCSP SNRS Exam Certification Guide |      5
 978-0-201-63346-7 | TCP/IP Illustrated, Volume 1       |      5
 978-0-13-018380-4 | Internetworking with TCP/IP Vol.1  |      4
 978-1-56592-243-3 | Perl Cookbook                      |      5
 978-0-7357-1201-0 | Designing with Web Standards       |      5
(5 rows)

Setup the Database Connection

Great! The database is set. Now we need a way for the app to talk to it. I’ve not yet decided how I’m going to integrate DBIx::Connector into a Catalyst model class; maybe I’ll figure it out as I write these posts. But since my mantra is “the database is the model,â€? for now I won’t bother with a model at all. Instead, I’ll create a simple accessor in MyApp so we can easily get at the database connection wherever we need it. To do that, add these lines to lib/MyApp.pm:

use Moose;
use DBIx::Connector;
use Exception::Class::DBI;

has conn => (is => 'ro', lazy => 1, default => sub {
    DBIx::Connector->new( 'dbi:Pg:dbname=myapp', 'postgres', '', {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
        pg_enable_utf8 => 1,
    });
});

We load Moose to get the has keyword, the officially sanctioned interface for defining attributes in Catalyst classes. Then I use that keyword to create the conn attribute. This attribute is read-only and has a DBIx::Connector object for its default value. The nice thing about this is that the DBIx::Connector object won’t be instantiated until it’s actually needed, and then it will be kept forever. We never have to do anything else to use it.

Oh, and I like to make sure that text data coming back from PostgreSQL is properly encoded as UTF-8, and I like to use Exception::Class::DBI to turn DBI errors into exception objects.

Now it’s time to update our controller and template to fetch actual data from the database. Edit lib/MyApp/Controller/Books.pm and change the list method to:

sub list : Local {
    my ($self, $c) = @_;
    $c->stash->{books} = $c->conn->run(fixup => sub {
        my $sth = $_->prepare('SELECT isbn, title, rating FROM books');
        $sth->execute;
        $sth;
    });
}

All we’re doing here is creating a statement handle for the query, executing the query, and storing the statement handle in the stash. Now we need to update the template to use the statement handle. Open up lib/MyApp/Templates/HTML/Books.pm and change the list template to:

template list => sub {
    my ($self, $args) = @_;
    table {
        row {
            th { 'Title'  };
            th { 'Rating' };
            th { 'Author' };
        };
        my $sth = $args->{books};
        while (my $book = $sth->fetchrow_hashref) {
            row {
                cell { $book->{title}  };
                cell { $book->{rating} };
                cell { $book->{author} };
            };
        };
    };
};

All we do is fetch each row from the statement handle and output it. The only thing that’s changed is the use of the statement handle as an iterator rather than an array reference.

And now we’re set! Restart your server with script/myapp_server.pl and point your browser at http://localhost:3000/books/list. Now you should see the book titles and ratings, though the authors still aren’t present. We’ll fix that in a later post.

Takeaway

The takeaway from this post: Use PostgreSQL’s support for custom data types to create validated natural keys for your data, and use a stable, persistent database connection to talk directly to the database. No need for an ORM here, as the DBI provides a very Perlish access to a very capable DSL for models called SQL.

More soon.

Looking for the comments? Try the old layout.

Create Catalyst Views with Template::Declare

Following up on last week’s release of Template::Declare 0.41, this week I’m pleased to announce the release of a new Catalyst view class, Catalyst::View::TD.

Yes, I’m aware of Catalyst::View::Template::Declare. As I mentioned last week, it doesn’t make very good use of Template::Declare. I don’t blame jrock for that, though; Template::Declare had very poor documentation before 0.41. But now that it is properly documented and I have a pretty solid grasp of how it works, I wanted to create a new Catalyst View that could take proper advantage of its features.

If you’re a Catalyst developer, chances are that you currently use Template Toolkit or Mason for your templating needs. So why should you consider Catalyst::View::TD for your next project? How about:

  • Feature-parity with Catalyst::View::TT, the view class for Template Toolkit
  • Includes a myapp_create.pl helper for creating new template classes.
  • Intuitive, easy-to-use HTML and XML templating in Perl
  • All templates loaded at server startup time (great for forking servers like mod_perl)
  • Template paths that correspond to Controller URIs.

If you weren’t convinced by the first three points, that forth one is the killer. It’s the reason I wrote a new view. But here’s an even better reason: I’m going to show you exactly how to use it, right here in this blog post.

A Simple Hello

I’m borrowing from chapter 3 of the Catalyst tutorial. First, create a new app:

$ catalyst.pl MyApp
cd MyApp

Then update the list of plugins in MyApp.pm:

use Catalyst qw/
    -Debug
    ConfigLoader
    Static::Simple
    StackTrace
/;

Now create a controller:

$ script/myapp_create.pl controller Books

Then edit it and add this controller (see chapter 3 if you need explanation about what this does):

sub list : Local {
    my ($self, $c) = @_;
    $c->stash->{books} = [];
    $c->stash->{template} = '/books/list';
}

And now, create a view and a new template class:

$ script/myapp_create.pl view HTML TD
$ script/myapp_create.pl TDClass HTML::Books

Open lib/MyApp/Templates/HTML/Books.pm and add the list template:

my ($self, $args) = @_;
table {
    row {
        th { 'Title'  };
        th { 'Rating' };
        th { 'Author' };
    };
    for my $book (@{ $args->{books} }) {
        row {
            cell { $book->{title}  };
            cell { $book->{rating} };
            cell { $book->{author} };
        };
    }
};

Then point your browser to http://localhost:3000/books/list. If you have everything working so far, you should see a web page that displays nothing other than our column headers for “Title”, “Rating”, and “Author(s)” — we won’t see any books until we get the database and model working below.

A Few Comments

The first thing I want to draw your attention to in this example is that list template. Isn’t it a thing of beauty? It’s so easy for Perl hackers to read. Compare it to the TT example from the tutorial (with the comments removed, just to be fair):

<tr><th>Title</th><th>Rating</th><th>Author(s)</th></tr>
[% FOREACH book IN books -%]
    <tr>
    <td>[% book.title %]</td>
    <td>[% book.rating %]</td>
    <td></td>
    </tr>
[% END -%]
</table>

I mean, which would you rather have to maintain? And this is an extremely simple example. The comparison only becomes more stark when the HTML becomes more complex.

The other thing I want to point out is the name of the template class we created, MyApp::Template::HTML::Books and its template, list. They correspond perfectly with the controller, MyApp::Controller::Books, and its action list. See the parity there? The URI for the action is /books/list, and the template path, by coincidence is also /books/list. Nice, huh? Thanks to this parity, you can even remove the template specification in the controller, since by default Catalyst will render a template with the same name as the action:

sub list : Local {
    my ($self, $c) = @_;
    $c->stash->{books} = [];
}

This is the primary way in which Catalyst::View::TD differs from its predecessor. Whereas the latter would load all of the modules under the view’s namespace and shove all of their templates into root path, the former imports templates under paths that correspond to their class names. Hence the match with controller names.

Stay Tuned

It was kind of fun to subvert the Catalyst tutorial for my nefarious purposes. Maybe I’ll keep it up with more blog posts in the coming weeks that continues to do so. Not only will it let me show off how nice Template::Declare templates can be, but it will let me continue my rant against ORMs as well. Stay tuned.

Looking for the comments? Try the old layout.

Pg::Priv Hits CPAN (Thanks Etsy!)

Earlier this year, I was working on an administrative utility for Etsy that validates PostgreSQL database permissions. Of course, in order to verify that permissions were correct or needed updating, I had to have a way to examine PostgreSQL ACLs, which are arrays made of of strings that look like this:

my $acl = [
    'miriam=arwdDxt/miriam',
    '=r/miriam',
    'admin=arw/miriam',
];

So following the documentation, I wrote a module that iterates over an ACL, parses each privilege string, and returns an object describing it. Using it is pretty easy. If you wanted to see what the permissions looked like on all the tables in a database, you could do it like so:

#!/usr/bin/perl -w
use strict;
use warnings;
use DBI;
use Pg::Priv;

my $dbname = shift or die "Usage: $0 dbname\n";

my $dbh = DBI->connect("dbi:Pg:dbname=$dbname", 'postgres', '');
my $sth = $dbh->prepare(
    q{SELECT relname, relacl FROM pg_class WHERE relkind = 'r'}
);

$sth->execute;
print "Permissions for $dbname:\n";
while (my $row = $sth->fetchrow_hashref) {
    print "  Table $row->{relname}:\n";
    for my $priv ( Pg::Priv->parse_acl( $row->{relacl} ) ) {
        print '    ', $priv->by, ' granted to ', $priv->to, ': ',
            join( ', ', $priv->labels ), $/;
    }
}

And here’s what the output looks like:

Permissions for bric:
  Table media__output_channel:
    postgres granted to postgres: UPDATE, SELECT, INSERT, TRUNCATE, REFERENCE, DELETE, TRIGGER
    postgres granted to bric: UPDATE, SELECT, INSERT, DELETE
  Table media_uri:
    postgres granted to postgres: UPDATE, SELECT, INSERT, TRUNCATE, REFERENCE, DELETE, TRIGGER
    postgres granted to bric: UPDATE, SELECT, INSERT, DELETE
  Table media_fields:
    postgres granted to postgres: UPDATE, SELECT, INSERT, TRUNCATE, REFERENCE, DELETE, TRIGGER

There are a bunch of utility methods to make it pretty simple to examine PostgreSQL privileges.

And now, I’m pleased to announce the release yesterday of Pg::Priv. My thanks to Etsy for agreeing to the release, and particularly to Chad Dickerson for championing it. This module is a little thing compared to some things I’ve seen open-sourced by major players, but even the simplest utilities can save folks mountains of time. I hope you find Pg::Priv useful.

Looking for the comments? Try the old layout.

More about…

Template::Declare Explained

Today, Sartak uploaded a new version of Template::Declare. Why should you care? Well, in addition to the nice templating syntax, the new version features complete documentation. For everything.

This came about because I was trying to understand Template::Declare, with its occasional mentioning of “mixins” and “paths” and “roots,” and I just wasn’t getting it out of the existing documentation. Much of my confusion stemmed from how Catalyst::View::Template::Declare used Template::Declare. So I started peppering Jesse with questions and offering to fill in some gaps in the docs, and he was foolish enough to give me a commit bit.

I was particularly interested in the import_templates and alias methods. There was no documentation, and though there were tests, the two methods were so similar that I could barely tell the difference. I also wasn’t sure what the point was, though I had ideas. So I asked a bunch of questions and, through the discussion, I started to put the pieces together. I wrote more tests, and started refactoring things. I’d write some code, rename things, move them around, combine things, and then see who screamed. Jesse and Sartak were happy to run the Jifty test suite and even, I think, some Best Practical internal stuff to see what I broke. And then I’d think I got things just right and they would punch holes in it again.

But it finally came together, I understood what the methods were trying to do, and I documented the shit out of it. Then Sartak would copy-edit my docs, verifying my interpretations, and help me to understand where I got things wrong.

The new version features a glossary (useful for users of other templating packages) and extended examples that demonstrate XUL output, postprocessing, inheritance, and wrappers. And, most importantly, an explanation of aliasing (think delegation) and mixins (using the new name for import_templates: mix). I greatly appreciate the time the BPS team took to answer my noobish questions. And their patience as I ripped things apart and built them up again. The result is that, in addition to being better documented, the new version’s alias method creates build much better-performing and less memory-intensive aliases.

So why was I doing all this? Well, Catalyst::View::Template::Declare never seemed quite right to me. And in my discussions with the Jifty guys, it seemed clear that its use of Template::Declare was trying to alias kinda sorta, but not really. So as I tried to understand aliasing, I realized that a new view class was needed for catalyst. So I endeavored to really understand the features of Template::Declare so that I could do it right.

More news on that soon.

The upshot is that you have pretty nice control over mixing and aliasing Template::Declare templates into paths. For example, if you have this template class:

package MyApp::Templates::Util;
use base 'Template::Declare';
use Template::Declare::Tags;

template header => sub {
    my ($self, $args) = @_;
    head { title {  $args->{title} } };
};

template footer => sub {
    div {
        id is 'fineprint';
        p { 'Site contents licensed under a Creative Commons License.' }
    };
};

You can mix those templates into your primary template class like so:

package MyApp::Templates::Main;
use base 'Template::Declare';
use Template::Declare::Tags;
use MyApp::Template::Util;
mix MyApp::Template::Util under '/util';

template content => sub {
    show '/util/header';
    body {
        h1 { 'Hello world' };
        show '/util/footer';
    };
};

See how I’ve used the mixed in header and footer templates by referring to them under the /util path? This gives the invocation of the other templates the feel of calling Mason components or invoking Template Toolkit templates. You can use these templates like so:

Template::Declare->init( dispatch_to => ['MyApp::Templates::Main'] );
print Template::Declare->show('/content');

So MyApp::Templates::Main’s templates are in the “/” directory, so to speak, while the MyApp::Templates::Util’s templates are in the “/utils” subdirectory. Pretty cool, eh?

So with this understanding in place, I had a much better feel for Template::Declare, and could better think of it in normal templating terms. Now I’m this much closer to my ideal Catalyst development environment. More soon.

Looking for the comments? Try the old layout.

Pod::Simple 3.09 Hits the CPAN

I spent some time over the last few days helping Allison fix bugs and close tickets for a new version of Pod::Simple. I’m not sure how I convinced Allison to suddenly dedicate her day to fixing Pod::Simple bugs and putting out a new release. She must’ve had some studies or Parrot spec work she wanted to get out of or something.

Either way, it’s got some useful fixes and improvements:

  • The XHTML formatter now supports tables of contents (via the poorly-named-but-consistent-with-the-HTML-formatter index parameter).

  • You can now reformat verbatim blocks via the strip_verbatim_indent parameter/method. Because you have to indent verbatim blocks (code examples) with one or more spaces, you end up with those spaces remaining in output. Just have a look at an example on search.cpan.org. See how the code in the Synopsis is indented? That’s because it’s indented in the POD. But maybe you don’t want it to be indented in your final output. If not, you can strip out leading spaces via strip_verbatim_indent. Pass in the text to strip out:

    $parser->strip_verbatim_indent('  ');

    Or a code reference that figures out what to strip out. I’m fond of stripping based on the indentation of the first line, like so:

    $new->strip_verbatim_indent(sub {
        my $lines = shift;
        (my $indent = $lines->[0]) =~ s/\S.*//;
        return $indent;
    });
  • You can now use the nocase parameter to Pod::Simple::PullParser to tell the parser to ignore the case of POD blocks when searching for author, title, version, and description information. This is a hack that Graham has used for a while on search.cpan.org, in part because I nagged him about my modules, which don’t use uppercase =head1 text. Thanks Graham!

  • Fixed entity encoding in the XHTML formatter. It was failing to encode entities everywhere except code spans and verbatim blocks. Oops. It also now properly encodes E<sol> and E<verbar>, as well as numeric entities.

  • Multiparagraph items now work properly in the XHTML formatter, as do text items (definition lists).

  • A POD tag found inside a complex POD tag (e.g., C<<< C<foo> >>>) is now properly parsed as text and entities instead of a tag embedded in a tag (e.g., <foo>). This is in compliance with perlpod.

This last item is the only change I think might lead to problems. I fixed it in response to a bug report from Schwern. The relevant bit from the perlpod spec is:

A more readable, and perhaps more “plain” way is to use an alternate set of delimiters that doesn’t require a single “>” to be escaped. With the Pod formatters that are standard starting with perl5.5.660, doubled angle brackets (“<<” and “>>”) may be used if and only if there is whitespace right after the opening delimiter and whitespace right before the closing delimiter! For example, the following will do the trick:

C<< $a <=> $b >>

In fact, you can use as many repeated angle‐brackets as you like so long as you have the same number of them in the opening and closing delimiters, and make sure that whitespace immediately follows the last ’<’ of the opening delimiter, and immediately precedes the first “>” of the closing delimiter. (The whitespace is ignored.) So the following will also work:

C<<< $a <=> $b >>>
C<<<<  $a <=> $b     >>>>

And they all mean exactly the same as this:

C<$a E<lt>=E<gt> $b>

Although all of the examples use C<< >>, it seems pretty clear that it applies to all of the span tags (B<< >>, I<< >>, F<< >>, etc.). So I made the change so that tags embedded in these “complex” tags, as comments in Pod::Simple call them, are not treated as tags. That is, all < and > characters are encoded.

Unfortunately, despite what the perlpod spec says (at least in my reading), Sean had quite a few pathological examples in the tests that expected POD tags embedded in complex POD tags to work. Here’s an example:

L<<< Perl B<Error E<77>essages>|perldiag >>>

Before I fixed the bug, that was expected to be output as this XML:

<L to="perldiag" type="pod">Perl <B>Error Messages</B></L>

After the bug fix, it’s:

<L content-implicit="yes" section="Perl B&#60;&#60;&#60; Error E&#60;77&#62;essages" type="pod">&#34;Perl B&#60;&#60;&#60; Error E&#60;77&#62;essages&#34;</L>

Well, there’s a lot more crap that Pod::Simple puts in there, but the important thing to note is that neither the B<> nor the E<> is evaluated as a POD tag inside the L<<< >>> tag. If that seems inconsistent at all, just remember that POD tags still work inside non-complex POD tags (that is, when there is just one set of angle brackets):

L<Perl B<Error E<77>essages>|perldiag>

I’m pretty sure that few users were relying on POD tags working inside complex POD tags anyway. At least I hope so. I’m currently working up a patch for blead that updates Pod::Simple in core, so it will be interesting to see if it breaks anyone’s POD. Here’s to hoping it doesn’t!

Looking for the comments? Try the old layout.

DBIx::Connector Updated

After much gnashing of teeth, heated arguments with @robkinon and @mst, lots of deep thesaurus spelunking, and three or four iterations, I finally came up with an an improved API for DBIx::Connector that I believe is straight-forward and easy to explain.

Following up on my post last week, I explored, oh I dunno, a hundred different terms for the various methods? I’ve never spent so much time on thesaurus.com in my life. Part of what added to the difficulty was that @mst seemed to think that there should actually be three modes for each block method: one that pings, one that doesn’t, and one that tries again if a block dies and the connection is down. So I went from six to nine methods with that assertion.

What I finally came up with was to name the three basic methods run(), txn_run(), and svp_run(), and these would neither ping nor retry in the event of failure. Then I added variations on these methods that would ping and that would try to fix failures. I called these “ping runs” and “fixup runs,” respectively. It was the latter term, “fixup,” that had been so hard for me to nail down, as “retry” seemed to say that the method was a retry, while “fixup” more accurately reflects that the method would try to fix up the connection in the event of a failure.

Once I’d implemented this interface, I now had nine methods:

  • run()
  • txn_run()
  • svp_run()
  • ping_run()
  • txn_ping_run()
  • svp_ping_run()
  • fixup_run()
  • txn_fixup_run()
  • svp_fixup_run()

This worked great. Then I went about documenting it. Jesus Christ what a pain! I realized that all these similarly-named methods would require a lot of explanation. I duly wrote up said explanation, and just wasn’t happy with it. It just felt to me like all the explanation made it too difficult to decide what methods to use and when. Such confusion would make the module less likely to be used – and certainly less likely to be used efficiently.

So I went back to the API drawing board and, reflecting on @robkinyon’s browbeating about decorating methods and @mst’s coming to that conclusion as well, I finally came up with just three methods:

  • run()
  • txn()
  • svp()

For any one of these, you can call it by passing a block, of course:

$conn->txn( sub { $_->do('SELECT some_function()') } );

In addition, you can now have any one of them run in one of three modes: the default (no ping), “ping”, or “fixup”:

$conn->txn( fixup => sub { $_->do('SELECT some_function()') } );

It’s much easier to explain the three methods in terms of how the block is transactionally encapsulated, as that’s the only difference between them. Once that’s understood, it’s pretty easy to explain how to change the “connection mode” of each by passing in a leading string. It even looks pretty nice. I’m really happy with this

One thing that increased the difficulty in coming up with this API was that @mst felt that by default the methods should neither ping nor try to fix up a failure. I was resistant to this because it’s not how Apache::DBI or connect_cached() work: they always ping. It turns out that DBIx::Class doesn’t cache connections at all. I thought it had. Rather, it creates a connection and simply hangs onto it as a scalar variable. It handles the connection for as long as it’s in scope, but includes no magic global caching. This reduces the action-at-a-distance issues common with caching while maintaining proper fork- and thread-safety.

At this point, I took a baseball bat to my desk.

Figuratively, anyway. I did at least unleash a mountain of curses upon @mst and various family relations. Because it took me a few minutes to see it: It turns out that DBIx::Class is right to do it this way. So I ripped out the global caching from DBIx::Connector, and suddenly it made much more sense not to ping by default – just as you wouldn’t ping if you created a DBI handle yourself.

DBIx::Connector is no longer a caching layer over the DBI. It’s now a proxy for a connection. That’s it. There is no magic, no implicit behavior, so it’s easier to use. And because it ensures fork- and thread-safety, you can instantiate a connector and hold onto it for whenever you need it, unlike using the DBI itself.

And one more thing: I also added a new method, with(). For those who always want to use the same connection mode, you can use this method to create a proxy object that has a different default mode. (Yes, a proxy for a proxy for a database handle. Whatever!) Use it like this:

$conn->with('fixup')->run( sub { ... } );

And if you always want to use the same mode, hold onto the proxy instead of the connection object:

my $proxy = DBIx::Connector->(@args)->with('fixup');

# later ...
$proxy->txn( sub { ... } ); # always in fixup mode

So while fixup mode is no longer the default, as Tim requested, but it can optionally be made the default, as DBIx::Class requires. The with() method will also be the place to add other global behavioral modifications, such as DBIx::Class’s auto_savepoint feature.

So for those of you who were interested in the first iteration of this module, my apologies for changing things so dramatically in this release (ripping out the global caching, deprecating methods, adding a new block method API, etc.). But I think that, for all the pain I went through to come up with the new API – all the arguing on IRC, all the thesaurus spelunking – that this is a really good API, easy to explain and understand, and easy to use. And I don’t expect to change it again. I might improve exceptions (use objects instead of strings?) add block method exception handling (perhaps adding a catch keyword?), but the basics are finally nailed down and here to stay.

Thanks to @mst, @robkinyon, and @ribasushi, in particular, for bearing with me and continuing to hammer on me when I was being dense.

Looking for the comments? Try the old layout.

Suggest Method Names for DBIx::Connector

Thanks to feedback from Tim Bunce and Peter Rabbitson in a DBIx::Class bug report, I’ve been reworking DBIx::Connector’s block-handling methods. Tim’s objection is that the the feature of do() and txn_do() that executes the code reference a second time in the event of a connection failure can be dangerous. That is, it can lead to action-at-a-distance bugs that are hard to find and fix. Tim suggested renaming the methods do_with_retry() and txn_do_with_retry() in order to make explicit what’s going on, and to have non-retry versions of the methods.

I’ve made this change in the repository. But I wasn’t happy with the method names; even though they’re unambiguous, they are also overly long and not very friendly. I want people to use the retrying methods, but felt that the long names make the non-retrying preferable to users. While I was at it, I also wanted to get rid of do(), since it quickly became clear that it could cause some confusion with the DBI’s do() method.

I’ve been thesaurus spelunking for the last few days, and have come up with a few options, but would love to hear other suggestions. I like using run instead of do to avoid confusion with the DBI, but otherwise I’m not really happy with what I’ve come up with. There are basically five different methods (using Tim’s suggestions for the moment):

run( sub {} )
Just run a block of code.
txn_run( sub {} )
Run a block of code inside a transaction.
run_with_retry( sub {} )
Run a block of code without pinging the database, and re-run the code if it throws an exception and the database turned out to be disconnected.
txn_run_with_rerun( sub {} )
Like run_with_retry(), but run the block inside a transaction.
svp_run( sub {} )
Run a block of code inside a savepoint (no retry for savepoints).

Here are some of the names I’ve come up with so far:

Run block Run in txn Run in savepoint Run with retry Run in txn with retry Retry Mnemonic
run txn_run svp_run runup txn_runup Run assuming the db is up, retry if not.
run txn_run svp_run run_up txn_run_up Same as above.
run txn_run svp_run rerun txn_rerun Run assuming the db is up, rerun if not.
run txn_run svp_run run::retry txn_run::retry :: means “with”

That last one is a cute hack suggested by Rob Kinyon on IRC. As you can see, I’m pretty consistent with the non-retrying method names; it’s the methods that retry that I’m not satisfied with. A approach I’ve avoided is to use an adverb for the non-retry methods, mainly because there is no retry possible for the savepoint methods, so it seemed silly to have svp_run_safely() to complement do_safely() and txn_do_safely().

Brilliant suggestions warmly appreciated.

Looking for the comments? Try the old layout.

Database Handle and Transaction Management with DBIx::Connector

As part of my ongoing effort to wrestle Catalyst into working the way that I think it should work, I’ve just uploaded DBIx::Connector to the CPAN. See, I was using Catalyst::Model::DBI, but it turned out that I wanted to use the database handle in places other than the Catalyst parts of my app. I was bitching about this to mst on #catalyst, and he said that Catalyst::Model::DBI was actually a fork of DBIx::Class’s handle caching, and quite out of date. I said, “But this already exists. It’s called connect_cached().” I believe his response was, “OH FUCK OFF!”

So I started digging into what Catalyst::Model::DBI and DBIx::Class do to cache their database handles, and how it differs from connect_cached(). It turns out that they were pretty smart, in terms of checking to see if the process had forked or a new thread had been spawned, and if so, deactivating the old handle and then returning a new one. Otherwise, things are just cached. This approach works well in Web environments, including under mod_perl; in forking applications, like POE apps; and in plain Perl scripts. Matt said he’d always wanted to pull that functionality out of DBIx::Class and then make DBIx::Class depend on the external implementation. That way everyone could take advantage of the functionality, including people like me who don’t want to use an ORM.

So I did it. Maybe it was crazy (mmmmm…yak meat), but I can now use the same database interface in the Catalyst and POE parts of my application without worry:

my $dbh = DBIx::Connector->connect(
    'dbi:Pg:dbname=circle', 'postgres', '', {
        PrintError     => 0,
        RaiseError     => 0,
        AutoCommit     => 1,
        HandleError    => Exception::Class::DBI->handler,
        pg_enable_utf8 => 1,
    },
);

$dbh->do($sql);

But it’s not just database handle caching that I’ve included in DBIx::Connector; no, I’ve also stolen some of the transaction management stuff from DBIx::Class. All you have to do is grab the connector object which encapsulates the database handle, and take advantage of its txn_do() method:

my $conn = DBIx::Connector->new(@args);
$conn->txn_do(sub {
    my $dbh = shift;
    $dbh->do($_) for @queries;
});

The transaction is scoped to the code reference passed to txn_do(). Not only that, it avoids the overhead of calling ping() on the database handle unless something goes wrong. Most of the time, nothing goes wrong, the database is there, so you can proceed accordingly. If it is gone, however, txn_do() will re-connect and execute the code reference again. The cool think is that you will never notice that the connection was dropped – unless it’s still gone after the second execution of the code reference.

And finally, thanks to some pushback from mst, ribasushi, and others, I added savepoint support. It’s a little different than that provided by DBIx::Class; instead of relying on a magical auto_savepoint attribute that subtly changes the behavior of txn_do(), you just use the svp_do() method from within txn_do(). The scoping of subtransactions is thus nicely explicit:

$conn->txn_do(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (1)');
    eval {
        $conn->svp_do(sub {
            shift->do('INSERT INTO table1 VALUES (2)');
            die 'OMGWTF?';
        });
    };
    warn "Savepoint failed\n" if $@;
    $dbh->do('INSERT INTO table1 VALUES (3)');
});

This transaction will insert the values 1 and 3, but not 2. If you call svp_do() outside of txn_do(), it will call txn_do() for you, with the savepoint scoped to the entire transaction:

$conn->svp_do(sub {
    my $dbh = shift;
    $dbh->do('INSERT INTO table1 VALUES (4)');
    $conn->svp_do(sub {
        shift->do('INSERT INTO table1 VALUES (5)');
    });
});

This transaction will insert both 3 and 4. And note that you can nest savepoints as deeply as you like. All this is dependent on whether the database supports savepoints; so far, PostgreSQL, MySQL (InnoDB), Oracle, MSSQL, and SQLite do. If you know of others, fork the repository, commit changes to a branch, and send me a pull request!

Overall I’m very happy with this module, and I’ll probably use it in all my Perl database projects from here on in. Perhaps later I’ll build a model class on it (something like Catalyst::Model::DBI, only better!), but next up, I plan to finish documenting Template::Declare and writing some views with it. More on that soon.

"Keep DBI's connect_cached From Horking
Transactions"

Looking for the comments? Try the old layout.

Keep DBI's connect_cached From Horking Transactions

I’ve been on a bit of a Perl hacking tear lately. In addition to knocking out Test::XPath last week, I’ve been experimenting with TAP::Harness sources, Template::Declare, Catalyst views, a new Module::Build subclass for building database-backed applications, and, last but not least, an IRC logging bot. Oh, and that application I’m working on for PGX with Quinn Weaver. So much is crowding my mind these days that I’m having trouble sleeping. Tonight I’m up late hacking to try to get some of this stuff out of my head.

But before I crash, I wanted to share a useful DBI hack. I use connect_cached in a lot of my applications, because it’s a nice way to reuse database handles without having to figure out my own caching algorithm. The code I have to use it looks like this:

sub dbh {
    my $self = shift;
    DBI->connect_cached( @{ $self->_dbi }{qw(dsn username password)}, {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
    });
}

Very simple. I just call the dbh() method whenever I need to talk to the database, and I’m set. Except for one problem: transactions.

Say I have a method that grabs the handle, starts a transaction with the begin_work method, and then inserts a row. Then another method grabs the handle from dbh() on the assumption that it’s in the same transaction, and does its own work. Only, it’s not the same transaction, because, unfortunately, DBI sets the attributes passed to connect_cached every single time it’s called!. So even though that second method may think it’s in the middle of a transaction, it’s really not, because when connect_cached sets AutoCommit back to 1, the transaction gets committed.

Oops.

This really fucks with my tests, where I’m often fetching the database handle to start a transaction, running some tests, and then wanting to rollback the transaction when I’m done. It’s irritating as all hell to discover that data has been inserted into the database. And the DBI, alas, just gives me this warning:

rollback ineffective with AutoCommit enabled at t/botinst.t line 67.

I’m likely not to notice until I get a duplicate key error the next time I run the tests.

As I was dealing with this today, my memory started poking at me, telling me that I’ve dealt with this before. And sure enough, a quick Google shows that Tim Bunce and I had an extensive conversation on this very topic – over four years ago. If you’re patient enough to dig through that thread, you’ll note that this issue is due to some architectural difficulties in the DBI, to be worked out in DBI 2.

Over the last four years, I’ve implemented a couple of solutions to this problem, all involving my code tracking the transaction state and modifying the AutoCommit attribute in the appropriate places. It’s all rather fragile. But as I dug through the thread, I discovered a much cleaner fix, using a little-known and so-far undocumented feature of the DBI: callbacks. This is actually I feature I half-way implemented in the DBI years ago, getting it just far enough that Tim was willing to finish it. And it’s just sitting there, waiting to be used.

So here’s the trick: Specify a callback for the connect_cached() method that’s used only when an existing file handle is retrieved from the cache. A bunch of stuff is passed to the callback, but the important one is the fifth argument, the attributes. All it has to do is delete the AutoCommit attribute. Since this callback is called before the DBI looks at the attributes to set them on the handle, the callback effectively prevents the DBI from horking up your transactions.

Here’s the modified code:

my $cb = {
    'connect_cached.reused' => sub { delete $_[4]->{AutoCommit} },
};

sub dbh {
    my $self = shift;
    DBI->connect_cached( @{ $self->_dbi }{qw(dsn username password)}, {
        PrintError     => 0,
        RaiseError     => 0,
        HandleError    => Exception::Class::DBI->handler,
        AutoCommit     => 1,
        Callbacks      => $cb,
    });
}

Callbacks are passed as a hash reference, with the keys being the names of the DBI methods that should trigger the callbacks, such as ping, data_sources, or connect_cached. The values are, of course, code references. When the DBI calls the code callbacks, it passes in stuff relevant to the method.

In the case of connect_cached, there are two additional special-case callbacks, connect_cached.new and connect_cached.reused, so that you can have different callbacks execute depending on whether connect_cached used a cached database handle or had to create a new one. Here, I’ve used connect_cached.reused, of course, and all I do is kill off the AutoCommit attribute before the DBI gets its greedy hands on it. Et voilà, problem solved!

And before you ask, no, you can’t simply omit AutoCommit from the attributes passed to connect_cached, because the DBI helpfully adds it for you.

So now this is here so I can find it again when it bites me next year, and I hope it helps you, too. Meanwhile, perhaps someone could take it upon themselves to document DBI’s callbacks? At this point, the closest thing to documentation is in the tests. (Hrm. I think I might have had a hand in writing them.) Check ‘em out.

Looking for the comments? Try the old layout.

More about…

Use Rubyish Blocks with Test::XPath

Thanks to the slick Devel::Declare-powered PerlX::MethodCallWithBlock created by gugod, the latest version of Test::XPath supports Ruby-style blocks. The Ruby version of assert_select, as I mentioned previously, looks like this:

assert_select "ol" { |elements|
  elements.each { |element|
    assert_select element, "li", 4
  }
}

I’ve switched to the brace syntax for greater parity with Perl. Test::XPath, meanwhile, looks like this:

my @css = qw(foo.css bar.css);
$tx->ok( '/html/head/style', sub {
    my $css = shift @css;
    shift->is( './@src', $css, "Style src should be $css");
}, 'Should have style' );

But as of Test::XPath 0.13, you can now just use PerlX::MethodCallWithBlock to pass blocks in the Rubyish way:

use PerlX::MethodCallWithBlock;
my @css = qw(foo.css bar.css);
$tx->ok( '/html/head/style', 'Should have style' ) {
    my $css = shift @css;
    shift->is( './@src', $css, "Style src should be $css");
};

Pretty slick, eh? It required a single-line change to the source code. I’m really happy with this sugar. Thanks for the great hack, gugod!

Looking for the comments? Try the old layout.

Test XML and HTML with XPath

When I was hacking Rails projects back in 2006-2007, there was a lot of stuff about Rails that drove me absolutely batshit (<cough>ActiveRecord</cough>), but there were also a (very) few things that I really liked. One of those things was the assert_select test method. There was a bunch of magic involved in sending a request to your Rails app and stuffing the body someplace hidden (hrm, that sounds kind of evil; intentional?), but then you could call assert_select to use CSS selectors to test the structure and content of the document (assuming, of course, that it was HTML or XML). For example, (to borrow from the Rails docs), if you wanted to test that a response contains two ordered lists, each with four list elements then you’d do something like this:

assert_select "ol" do |elements|
    elements.each do |element|
    assert_select element, "li", 4
    end
end

What it does is select all of the <ol> elements and pass them to the do block, where you can call assert_select on each of them. Nice, huh? You can also implicitly call assert_select on the entire array of passed elements, like so:

assert_select "ol" do
    assert_select "li", 8
end

Slick, right? I’ve always wanted to have something like this in Perl, but until last week, I didn’t really have an immediate need for it. But I’ve started on a Catalyst project with my partners at PGX, and of course I’m using a view to generate XHTML output. So I started asking around for advice on proper unit testing for Catalyst views. The answer I got was, basically, Test::WWW::Mechanize::Catalyst. But I found it insufficient:

$mech->get_ok("/");
$mech->html_lint_ok( "HTML should be valid" );
$mech->title_is( "Root", "On the root page" );
$mech->content_contains( "This is the root page", "Correct content" );

Okay, I can check the title of the document directly, which is kind of cool, but there’s no other way to examine the structure? Really? And to check the content, there’s just content_contains(), which concatenates all of the content without any tags! This is useful for certain very simple tests, but if you want to make sure that your document is properly structured, and the content is in all the right places, you’re SOL.

Furthermore, the html_link_ok() method didn’t like the Unicode characters output by my view:

#   Failed test 'HTML should be valid (http://localhost/)'
#   at t/view_TD.t line 30.
# HTML::Lint errors for http://localhost/
#  (4:3) Invalid character \x2019 should be written as &rsquo;
#  (18:5) Invalid character \xA9 should be written as &copy;
# 2 errors on the page

Of course, those characters aren’t invalid, they’re perfectly good UTF-8 characters. In some worlds, I suppose, they should be wrong, but I actually want them in my document.

So I switched to Test::XML, which uses a proper XML parser to validate a document:

ok my $res = request("http://localhost:3000/"), "Request home page";
ok $res->is_success, "Request should have succeeded";

is_well_formed_xml $res->content, "The HTML should be well-formed";

Cool, so now I know that my XHTML document is valid, it’s time to start examining the content and structure in more detail. Thinking fondly on assert_select, I went looking for a test module that uses XPath to test an XML document, and found Test::XML::XPath right in the Test::XML distribution, which looked to be just what I wanted. So I added it to my test script and added this line to test the content of the <title> tag:

is_xpath $res->content, "/html/head/title", "Welcome!";

I ran the test…and waited. It took around 20 seconds for that test to run, and then it failed!

#   Failed test at t/view_TD.t line 25.
#          got: ''
#     expected: 'Welcome!'
#   evaluating: /html/head/title
#      against: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
# <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
#  <head>
#   <title>Welcome!</title>
#  </head>
# </html>

No doubt the alert among my readership will spot the problem right away, but I was at a loss. Fortunately, Ovid was over for dinner last week, and he pointed out that it was due to the namespace. That is, the xmlns attribute of the <html> element requires that one register a namespace prefix to use in the XPath expression. He pointed me to his fork of XML::XPath, called Test::XHTML::XPath, in his Escape project. It mostly duplicates Test::XML::XPath, but contains this crucial line of code:

$xpc->registerNs( x => "http://www.w3.org/1999/xhtml" );

By registering the prefix “x” for the XHTML namespace, he’s able to write tests like this:

is_xpath $res->content, "/x:html/x:head/x:title", "Welcome!";

And that works. It seems that the XPath spec requires that one use prefixes when referring to elements within a namespace. Test::XML::XPath, alas, provides no way to register a namespace prefix.

Perhaps worse is the performance problem. I discovered that if I stripped out the DOCTYPE declaration from the XHTML before I passed it to is_xpath, the test was lightning fast. Here the issue is that XML::LibXML, used by Test::XML::XPath, is fetching the DTD from the w3.org Web site as the test runs. I can disable this by setting the no_network and recover_silently XML::LibXML options, but, again, Test::XML::XPath provides no way to do so.

Add to that the fact that Test::XML::XPath has no interface for recursive testing like assert_select and I was ready to write my own module. One could perhaps update Test::XML::XPath to be more flexible, but for the fact that it falls back on XML::XPath when it can’t find XML::LibXML, and XML::XPath, alas, behaves differently than XML::LibXML (it didn’t choke on my lack of a namespace prefix, for example). So if you ship an application that uses Test::XML::XPath, tests might fail on other systems where it would use a different XPath module than you used.

And so I have written a new test module.

Introducing Test::XPath, your Perl module for flexibly running XPath-powered tests on the content and structure of your XML and HTML documents. With this new module, the test for my Catalyst application becomes:

my $tx = Test::XPath->new( xml => $res->content, is_html => 1 );
$tx->is("/html/head/title", "Welcome", "Title should be correct" );

Notice how I didn’t need a namespace prefix there? That’s because the is_html parameter coaxes XML::LibXML into using its HTML parser instead of its XML parser. One of the side-effects of doing so is that the namespace appears to be assumed, so I can ignore it in my tests. The HTML parser doesn’t bother to fetch the DTD, either. For tests where you really need namespaces, you’d do this:

my $tx = Test::XPath->new(
    xml     => $res->content,
    xmlns   => { x => "http://www.w3.org/1999/xhtml" },
    options => { no_network => 1, recover_silently => 1 },
);
$tx->is("/x:html/x:head/x:title", "Welcome", "Title should be correct" );

Yep, you can specify XML namespace prefixes via the xmlns parameter, and pass options to XML::LibXML via the options parameter. Here I’ve shut off the network, so that XML::LibXML prevents network access, and told it to recover silently when it tries to fetch the DTD, but fails (because, you know, it can’t access the network). Not bad, eh?

Of course, the module provides the usual array of Test::More-like test methods, including ok(), is(), like() and cmp_ok(). They all work just like in Test::More, except that the first argument must be an XPath expressions. Some examples borrowed from the documentation:

$tx->ok( '//foo/bar', 'Should have bar element under foo element' );
$tx->ok( 'contains(//title, "Welcome")', 'Title should "Welcome"' );

$tx->is( '/html/head/title', 'Welcome', 'Title should be welcoming' );
$tx->isnt( '/html/head/link/@type', 'hello', 'Link type should not' );

$tx->like( '/html/head/title', qr/^Foobar Inc.: .+/, 'Title context' );
$tx->unlike( '/html/head/title', qr/Error/, 'Should be no error in title' );

$tx->cmp_ok( '/html/head/title', 'eq', 'Welcome' );
$tx->cmp_ok( '//story[1]/@id', '==', 1 );

But the real gem is the recursive testing feature of the ok() test method. By passing a code reference as the second argument, you can descend into various parts of your XML or HTML document to test things more deeply. ok() will pass if the XPath expression argument selects one or more nodes, and then it will call the code reference for each of those nodes, passing the Test::XPath object as the first argument. This is a bit different than assert_select, but I view the reduced magic as a good thing.

For example, if you wanted to test for the presence of <story> elements in your document, and to test that each such element had an incremented id attribute, you’d do something like this:

my $i = 0;
$tx->ok( '//assets/story', sub {
    shift->is('./@id', ++$i, "ID should be $i in story $i");
}, 'Should have story elements' );

For convenience, the XML::XPath object is also assigned to $_ for the duration of the call to the code reference. Either way, you can call ok() and pass code references anywhere in the hierarchy. For example, to ensure that an Atom feed has entries and that each entry has a title, a link, and a very specific author element with name, uri, and email subnodes, you can do this:

$tx->ok( '/feed/entry', sub {
    $_->ok( './title', 'Should have a title' );
    $_->ok( './author', sub {
        $_->is( './name',  'Larry Wall',       'Larry should be author' );
        $_->is( './uri',   'http://wall.org/', 'URI should be correct' );
        $_->is( './email', 'perl@example.com', 'Email should be right' );
    }, 'Should have author elements' );
}, 'Should have entry elements' );

There are a lot of core XPath functions you can use, too. For example, I’m going to write a test for every page returned by my application to make sure that I have the proper numbers of various tags:

$tx->is('count(/html)',     1, 'Should have 1 html element' );
$tx->is('count(/html/head') 1, 'Should have 1 head element' );
$tx->is('count(/html/body)  1, 'Should have 1 body element' );

I’m going to use this module to the hilt in all my tests for HTML and XML documents from here on in. The only thing I’m missing from assert_select is that it supports CSS 2 selectors, rather than XPath expressions, and the implementation offers quite a few other features including regular expression operators for matching attributes, pseudo-classes, and other fun stuff. Still, XPath gets me all that I need; the rest is just sugar, really. And with the ability to define custom XPath functions in Perl, I can live without the extra sugar.

Maybe you’ll find it useful, too.

Looking for the comments? Try the old layout.