Just a Theory

By David E. Wheeler

Posts about Unit Testing

Introducing MyTAP

I gave my OSCON tutorial (slides) last week. It went okay. I spent way too much time helping to get everyone set up with pgTAP, and then didn’t have time to have the attendees do the exercises, and I had to rush through 2.5 hours of material in 1.5 hours. Yikes! At least the video will be better when it’s released (more when that happens).

But as often happens, I was asked whether something like pgTAP exists for MySQL. But this time I was asked by MySQL Community Manager Giuseppe Maxia, who also said that he’d tried to create a test framework himself (a fellow Perl hacker!), but that it wasn’t as nice as pgTAP. Well, since I was at OSCON and tend to like to hack on side projects while at conferences, and since I hoped that Giuseppe will happily take it over once I’ve implemented the core, I started hacking on it myself. And today, I’m pleased to announce the release of MyTAP 0.01 (downloads).

Once you’ve downloaded it, install it against your MySQL server like so:

mysql -u root < mytap.sql

Here’s a very simple example script:

-- Start a transaction.
BEGIN;

-- Plan the tests.
SELECT tap.plan(1);

-- Run the tests.
SELECT tap.pass( 'My test passed, w00t!' );

-- Finish the tests and clean up.
CALL tap.finish();
ROLLBACK;

You can run this test from a .sql file using the mysql client like so:

mysql -u root --disable-pager --batch --raw --skip-column-names --unbuffered --database try --execute 'source test.sql'

But that’s a PITA and can only run one test at a time. Instead, put all of your tests into a directory, perhaps named tests, each with the suffix “.my”, and use my_prove (install TAP::Parser::SourceHandler::MyTAP from CPAN to get it) instead:

my_prove -u root --database try tests/

For MyTAP’s own tests, the output looks like this:

tests/eq.my ........ ok
tests/hastap.my .... ok
tests/matching.my .. ok
tests/moretap.my ... ok
tests/todotap.my ... ok
tests/utils.my ..... ok
All tests successful.
Files=6, Tests=137,  1 wallclock secs
(0.06 usr  0.03 sys +  0.01 cusr  0.02 csys =  0.12 CPU)
Result: PASS

Nice, eh? Of course there are quite a few more assertion functions. See the complete documentation for details.

Now, I did my best to keep the interface the same as pgTAP, but there are a few differences:

  • MySQL temporary tables are teh suck, so I had to use permanent tables to track test state. To make this more feasible, MyTAP is always installed in its own database, (named “tap” by default), and you must always schema-qualify your use of the MyTAP functions.
  • Another side-effect of permanent tables is that MyTAP must keep track of test outcomes without colliding with the state from tests running in multiple concurrent connections. So MyTAP uses connection_id() to keep track of state for a single test run. It also deletes the state when tests finish(), but if there’s a crash before then, data can be left in those tables. If the connection ID is ever re-used, this can lead to conflicts. This seems mostly avoidable by using InnoDB tables and transactions in the tests.
  • The word “is” is strictly reserved by MySQL, so the function that corresponds to pgTAP’s is() is eq() in MyTAP. Similarly, isnt() is called not_eq() in MyTAP.
  • There is no way to throw an exception in MySQL functions an procedures, so the code cheats by instead performing an illegal operation: selecting from a non-existent column, where the name of that column is the error message. Hinky, but should get the point across.

Other than these issues, things went fairly smoothly. I finished up the 0.01 version last night and released it today with most of the core functionality in place. And now I want to find others to take over, as I am not a MySQL hacker myself and thus unlikely ever to use it. If you’re interested, my recommendations for things to do next are:

So fork on GitHub or contact me if you’d like to be added as a collaborator (I’m looking at you, Giuseppe!).

Hope you find it useful.

Looking for the comments? Try the old layout.

pgTAP Best Practices Slides Available

Last month I gave two presentations at the PostgreSQL Conference West. The first was a variation on Unit Test Your Database!, which I’ve now given about six times (and will at least two more times, including tomorrow night for Portland PLUG and in two weeks at the JPUG 10th Anniversary Conference). The second was a new talk, a 90-minute tutorial, called “pgTAP Best Practices.” And here it is, published on Slideshare. Enjoy.

You can also download a 76MB PDF if that’s more your style.

Enjoy!

Looking for the comments? Try the old layout.

Use Rubyish Blocks with Test::XPath

Thanks to the slick Devel::Declare-powered PerlX::MethodCallWithBlock created by gugod, the latest version of Test::XPath supports Ruby-style blocks. The Ruby version of assert_select, as I mentioned previously, looks like this:

assert_select "ol" { |elements|
  elements.each { |element|
    assert_select element, "li", 4
  }
}

I’ve switched to the brace syntax for greater parity with Perl. Test::XPath, meanwhile, looks like this:

my @css = qw(foo.css bar.css);
$tx->ok( '/html/head/style', sub {
    my $css = shift @css;
    shift->is( './@src', $css, "Style src should be $css");
}, 'Should have style' );

But as of Test::XPath 0.13, you can now just use PerlX::MethodCallWithBlock to pass blocks in the Rubyish way:

use PerlX::MethodCallWithBlock;
my @css = qw(foo.css bar.css);
$tx->ok( '/html/head/style', 'Should have style' ) {
    my $css = shift @css;
    shift->is( './@src', $css, "Style src should be $css");
};

Pretty slick, eh? It required a single-line change to the source code. I’m really happy with this sugar. Thanks for the great hack, gugod!

Looking for the comments? Try the old layout.

Test XML and HTML with XPath

When I was hacking Rails projects back in 2006-2007, there was a lot of stuff about Rails that drove me absolutely batshit (<cough>ActiveRecord</cough>), but there were also a (very) few things that I really liked. One of those things was the assert_select test method. There was a bunch of magic involved in sending a request to your Rails app and stuffing the body someplace hidden (hrm, that sounds kind of evil; intentional?), but then you could call assert_select to use CSS selectors to test the structure and content of the document (assuming, of course, that it was HTML or XML). For example, (to borrow from the Rails docs), if you wanted to test that a response contains two ordered lists, each with four list elements then you’d do something like this:

assert_select "ol" do |elements|
    elements.each do |element|
    assert_select element, "li", 4
    end
end

What it does is select all of the <ol> elements and pass them to the do block, where you can call assert_select on each of them. Nice, huh? You can also implicitly call assert_select on the entire array of passed elements, like so:

assert_select "ol" do
    assert_select "li", 8
end

Slick, right? I’ve always wanted to have something like this in Perl, but until last week, I didn’t really have an immediate need for it. But I’ve started on a Catalyst project with my partners at PGX, and of course I’m using a view to generate XHTML output. So I started asking around for advice on proper unit testing for Catalyst views. The answer I got was, basically, Test::WWW::Mechanize::Catalyst. But I found it insufficient:

$mech->get_ok("/");
$mech->html_lint_ok( "HTML should be valid" );
$mech->title_is( "Root", "On the root page" );
$mech->content_contains( "This is the root page", "Correct content" );

Okay, I can check the title of the document directly, which is kind of cool, but there’s no other way to examine the structure? Really? And to check the content, there’s just content_contains(), which concatenates all of the content without any tags! This is useful for certain very simple tests, but if you want to make sure that your document is properly structured, and the content is in all the right places, you’re SOL.

Furthermore, the html_link_ok() method didn’t like the Unicode characters output by my view:

#   Failed test 'HTML should be valid (http://localhost/)'
#   at t/view_TD.t line 30.
# HTML::Lint errors for http://localhost/
#  (4:3) Invalid character \x2019 should be written as &rsquo;
#  (18:5) Invalid character \xA9 should be written as &copy;
# 2 errors on the page

Of course, those characters aren’t invalid, they’re perfectly good UTF-8 characters. In some worlds, I suppose, they should be wrong, but I actually want them in my document.

So I switched to Test::XML, which uses a proper XML parser to validate a document:

ok my $res = request("http://localhost:3000/"), "Request home page";
ok $res->is_success, "Request should have succeeded";

is_well_formed_xml $res->content, "The HTML should be well-formed";

Cool, so now I know that my XHTML document is valid, it’s time to start examining the content and structure in more detail. Thinking fondly on assert_select, I went looking for a test module that uses XPath to test an XML document, and found Test::XML::XPath right in the Test::XML distribution, which looked to be just what I wanted. So I added it to my test script and added this line to test the content of the <title> tag:

is_xpath $res->content, "/html/head/title", "Welcome!";

I ran the test…and waited. It took around 20 seconds for that test to run, and then it failed!

#   Failed test at t/view_TD.t line 25.
#          got: ''
#     expected: 'Welcome!'
#   evaluating: /html/head/title
#      against: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
# <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
#  <head>
#   <title>Welcome!</title>
#  </head>
# </html>

No doubt the alert among my readership will spot the problem right away, but I was at a loss. Fortunately, Ovid was over for dinner last week, and he pointed out that it was due to the namespace. That is, the xmlns attribute of the <html> element requires that one register a namespace prefix to use in the XPath expression. He pointed me to his fork of XML::XPath, called Test::XHTML::XPath, in his Escape project. It mostly duplicates Test::XML::XPath, but contains this crucial line of code:

$xpc->registerNs( x => "http://www.w3.org/1999/xhtml" );

By registering the prefix “x” for the XHTML namespace, he’s able to write tests like this:

is_xpath $res->content, "/x:html/x:head/x:title", "Welcome!";

And that works. It seems that the XPath spec requires that one use prefixes when referring to elements within a namespace. Test::XML::XPath, alas, provides no way to register a namespace prefix.

Perhaps worse is the performance problem. I discovered that if I stripped out the DOCTYPE declaration from the XHTML before I passed it to is_xpath, the test was lightning fast. Here the issue is that XML::LibXML, used by Test::XML::XPath, is fetching the DTD from the w3.org Web site as the test runs. I can disable this by setting the no_network and recover_silently XML::LibXML options, but, again, Test::XML::XPath provides no way to do so.

Add to that the fact that Test::XML::XPath has no interface for recursive testing like assert_select and I was ready to write my own module. One could perhaps update Test::XML::XPath to be more flexible, but for the fact that it falls back on XML::XPath when it can’t find XML::LibXML, and XML::XPath, alas, behaves differently than XML::LibXML (it didn’t choke on my lack of a namespace prefix, for example). So if you ship an application that uses Test::XML::XPath, tests might fail on other systems where it would use a different XPath module than you used.

And so I have written a new test module.

Introducing Test::XPath, your Perl module for flexibly running XPath-powered tests on the content and structure of your XML and HTML documents. With this new module, the test for my Catalyst application becomes:

my $tx = Test::XPath->new( xml => $res->content, is_html => 1 );
$tx->is("/html/head/title", "Welcome", "Title should be correct" );

Notice how I didn’t need a namespace prefix there? That’s because the is_html parameter coaxes XML::LibXML into using its HTML parser instead of its XML parser. One of the side-effects of doing so is that the namespace appears to be assumed, so I can ignore it in my tests. The HTML parser doesn’t bother to fetch the DTD, either. For tests where you really need namespaces, you’d do this:

my $tx = Test::XPath->new(
    xml     => $res->content,
    xmlns   => { x => "http://www.w3.org/1999/xhtml" },
    options => { no_network => 1, recover_silently => 1 },
);
$tx->is("/x:html/x:head/x:title", "Welcome", "Title should be correct" );

Yep, you can specify XML namespace prefixes via the xmlns parameter, and pass options to XML::LibXML via the options parameter. Here I’ve shut off the network, so that XML::LibXML prevents network access, and told it to recover silently when it tries to fetch the DTD, but fails (because, you know, it can’t access the network). Not bad, eh?

Of course, the module provides the usual array of Test::More-like test methods, including ok(), is(), like() and cmp_ok(). They all work just like in Test::More, except that the first argument must be an XPath expressions. Some examples borrowed from the documentation:

$tx->ok( '//foo/bar', 'Should have bar element under foo element' );
$tx->ok( 'contains(//title, "Welcome")', 'Title should "Welcome"' );

$tx->is( '/html/head/title', 'Welcome', 'Title should be welcoming' );
$tx->isnt( '/html/head/link/@type', 'hello', 'Link type should not' );

$tx->like( '/html/head/title', qr/^Foobar Inc.: .+/, 'Title context' );
$tx->unlike( '/html/head/title', qr/Error/, 'Should be no error in title' );

$tx->cmp_ok( '/html/head/title', 'eq', 'Welcome' );
$tx->cmp_ok( '//story[1]/@id', '==', 1 );

But the real gem is the recursive testing feature of the ok() test method. By passing a code reference as the second argument, you can descend into various parts of your XML or HTML document to test things more deeply. ok() will pass if the XPath expression argument selects one or more nodes, and then it will call the code reference for each of those nodes, passing the Test::XPath object as the first argument. This is a bit different than assert_select, but I view the reduced magic as a good thing.

For example, if you wanted to test for the presence of <story> elements in your document, and to test that each such element had an incremented id attribute, you’d do something like this:

my $i = 0;
$tx->ok( '//assets/story', sub {
    shift->is('./@id', ++$i, "ID should be $i in story $i");
}, 'Should have story elements' );

For convenience, the XML::XPath object is also assigned to $_ for the duration of the call to the code reference. Either way, you can call ok() and pass code references anywhere in the hierarchy. For example, to ensure that an Atom feed has entries and that each entry has a title, a link, and a very specific author element with name, uri, and email subnodes, you can do this:

$tx->ok( '/feed/entry', sub {
    $_->ok( './title', 'Should have a title' );
    $_->ok( './author', sub {
        $_->is( './name',  'Larry Wall',       'Larry should be author' );
        $_->is( './uri',   'http://wall.org/', 'URI should be correct' );
        $_->is( './email', 'perl@example.com', 'Email should be right' );
    }, 'Should have author elements' );
}, 'Should have entry elements' );

There are a lot of core XPath functions you can use, too. For example, I’m going to write a test for every page returned by my application to make sure that I have the proper numbers of various tags:

$tx->is('count(/html)',     1, 'Should have 1 html element' );
$tx->is('count(/html/head') 1, 'Should have 1 head element' );
$tx->is('count(/html/body)  1, 'Should have 1 body element' );

I’m going to use this module to the hilt in all my tests for HTML and XML documents from here on in. The only thing I’m missing from assert_select is that it supports CSS 2 selectors, rather than XPath expressions, and the implementation offers quite a few other features including regular expression operators for matching attributes, pseudo-classes, and other fun stuff. Still, XPath gets me all that I need; the rest is just sugar, really. And with the ability to define custom XPath functions in Perl, I can live without the extra sugar.

Maybe you’ll find it useful, too.

Looking for the comments? Try the old layout.

pgTAP 0.22: Test Your Results

I’m pleased to announce that, after much thinking, committing, and not an insignificant amount of hair-pulling, pgTAP 0.22 has finally landed. Download it here. Many, many thanks to all who commented on my previous posts, made suggestions, and helped me on IRC to figure out how to get all this stuff to work. The crazy thing is that it does, quite well, all the way back to PostgreSQL 8.0.

So here’s what I’ve come up with: ten simple functions. Sound like a lot? Well, it’s simpler than it might at first appear. Here’s a quick overview:

results_eq()

Compares two queries row-for-row. Pass in strings with SQL queries, strings with prepared statement names, or cursors. If the query you’re testing returns a single column, the expected results can be passed as an array. If a test fails, you get useful diagnostics:

# Failed test 146
#     Results differ beginning at row 3:
#         have: (1,Anna)
#         want: (22,Betty)

If a row is missing, the diagnostics will show it as a NULL:

# Failed test 147
#     Results differ beginning at row 5:
#         have: (1,Anna)
#         want: NULL
results_ne()

Just like results_eq(), except that it tests that the results of the two arguments are *not* equivalent. May not be very useful, but it’s cute.

set_eq()

Tests that two queries return the same results, without regard to the order of the results or duplicates. Pass in strings with SQL queries or strings with prepared statement names. As with results_eq() the expected results can be passed as an array if the test query returns a single column. Failure diagnostics look like this:

# Failed test 146
#     Extra records:
#         (87,Jackson)
#         (1,Jacob)
#     Missing records:
#         (44,Anna)
#         (86,Angelina)

If the failure is due to incompatible column types, the diagnostics will help you out there, too:

# Failed test 147
#     Columns differ between queries:
#         have: (integer,text)
#         want: (text,integer)
set_ne()

The inverse of set_eq(), the test passes if the results of the two queries are different, without regard to order or duplicate rows. No diagnostics on failure, though; if it fails, it’s because the results are the same.

set_has()

Tests that a query contains a subset of results without regard to order or duplicates. Useful if you need to ensure that a query returns at least some set of rows. Failure diagnostics are useful again:

# Failed test 122
#     Missing records:
#         (44,Anna)
#         (86,Angelina)
set_hasnt()

Tests that a query does not contain a subset of results, without regard to order or duplicates.

bag_eq()

Just like set_eq(), except that duplicates matter. So if the first query has duplicate rows, the second must have the same dupes. Diagnostics are equally useful.

bag_ne()

Just like set_ne(), except that duplicates matter.

bag_has()

Just like set_has(), except that duplicates matter.

bag_hasnt()

Just like set_hasnt(), except that duplicates matter.

Be sure to look at my previous post for usage examples. Since I wrote it, I’ve also added the ability to pass an array as the second argument to these functions. This is specifically for the case when the query you’re testing results a single column of results; the array just makes it easier to specify expected values in a common case:

SELECT results_eq(
    'SELECT * FROM active_user_ids()',
    ARRAY[ 2, 3, 4, 5]
);

Check the documentation for all the details on how to use these functions.

I’m really happy with these functions. It was definitely worth it to really think things through, look at prior art, and spend the time to try different approaches. In the process, I’ve found an approach that works in nearly all circumstances.

The one exception is in results_eq() and results_ne() on PostgreSQL 8.3 and down. The issue there is that there were no operators to compare two record objects before PostgreSQL 8.4. So for earlier versions, the code has to cast the records representing each row to text. This means that two rows can be different but appear to be the same to 8.3 and down. In practice this should be pretty rare, but I’m glad that record comparisons are more correct in 8.4

The only other issue is performance. Although you can write your tests in SQL, rather than strings containing SQL, the set and bag functions use the PL/pgSQL EXECUTE statement to execute each SQL statement and insert it into a temporary table. Then they select the data from the temporary tables once or twice to do the comparisons. That’s a lot more processing than simply running the query itself, and it slows down the performance significantly.

Similarly, the results functions use cursors and fetch each row one-at-a-time. The nice thing is that, in the event of a failure for results_eq() or a pass for results_ne(), the functions can stop fetching results before reaching the end of the queries. But either way, a fair bit of processing goes on.

I’m not sure which is slower, the set and bag functions or the results functions, but, short of adding new syntax to SQL (not an option), I could see no other way to adequately do the comparisons and emit useful diagnostics.

But those are minor caveats, I think. I’m pretty pleased with the function names and the interfaces I’ve created for them. Please download the latest and let me know what you think.

So what’s next? Well, there are a few more schema-testing functions I’d like to add, but after that, I’d like to declare pgTAP stable and start using it in new projects. I’m thinking about writing a test suite for database normalization, starting with testing that all tables have primary keys.

But that’s after my vacation. Back in two weeks.

Looking for the comments? Try the old layout.

pgTAP Set-Testing Update

I’ve been thinking more about testing SQL result sets and how to name functions that do such testing, and I’ve started to come to some conclusions. Some of the constraints I’m looking at:

  • Cursors are required for tests where the order of the results returned is important. It might be best for such functions to create the cursors themselves.

  • For comparisons where order isn’t important, the results of each SQL statement must be inserted into a temporary table and then the table used for the comparisons. Otherwise, each statement would be executed twice, as is required to calculate symmetric difference. By executing each once and storing the results in a temporary table, we get around this issue (and indeed, this one of the cases that Epic’s global() function addresses).

  • When order is not important, the most efficient way to compare result sets is with symmetric difference. However, said comparison is a set comparison, meaning that duplicate rows are ignored. So if set A has 3 rows and set B has four, but two of those four are identical, then sets A and B can still be equivalent.

I’m starting to think that I would have two basic result set testing functions, set_eq() and bag_eq(). The former would do a set comparison, while the latter would require that duplicate rows be present in both result sets. Unfortunately, that would mean that it would be difficult for set_eq to have a variation that tests ordered sets, as symmetric difference ignores relational ordering. And a simple bag_eq() function would require that the relations be ordered, as it would iterate over each row in each relation in turn and compare row to row. But as I pointed out to commenter “@dave0,” bags are not inherently ordered, so it would be imposing a requirement that’s not necessarily appropriate.

This is starting to drive me a bit nuts.

I think that there are ways to enforce an ordered comparison on a set and an unordered comparison on a bag, but both would be pretty inefficient. Maybe I should do it anyway, include the appropriate caveats in the documentation, and then improve when feasible in the future. In that case, what I’d be looking at is something like this:

set_eq( sql, sql )
Test for set equivalence of two SQL statements.
oset_eq( sql, sql )
Test for ordered set equivalence of two SQL statements.
bag_eq( sql, sql )
Test for row equivalence of two SQL statements.
obag_eq( sql, sql )
Test for ordered row equivalence of two SQL statements.

The preferred tests would be set_eq() and obag_eq(). If a single word is passed to any of these functions, it’s assumed to be a prepared statement. Cursors would be created internally for the functions that require ordered comparison. The non-ordered versions would create temporary tables to hold the values and then use those tables for the comparisons. bag_eq() would also construct cursors on the temporary tables to ensure that rows could be compared in the same order in which they were generated by the SQL statement.

Interface-wise, perhaps a boolean would be preferred to indicate whether or not to compare the rows in an ordered fashion? That would be:

set_eq( sql, sql, bool )
Test for set equivalence of two SQL statements. The sets must be in the same order if the boolean argument is true.
bag_eq( sql, sql, bool )
Test for row equivalence of two SQL statements. The bags must have their rows in the same order if the boolean argument is true.

I like that there are fewer functions this way, but is it harder to remember what the boolean is for? (It would not be required, and would default to false for both functions). Thoughts?

By the way, I would likely throw in a couple of other resultset-comparing functions:

set_includes( sql, sql )
Test that the set returned by the first statement includes the rows returned by the second statement.
set_excludes( sql, sql )
Test that the set returned by the first statement excludes the rows returned by the second statement.
bag_includes( sql, sql )
Test that the bag returned by the first statement includes the rows returned by the second statement, including duplicates.
bag_excludes( sql, sql )
Test that the bag returned by the first statement excludes the rows returned by the second statement, including duplicates.

Seem useful? Please leave a comment with your thoughts.

Looking for the comments? Try the old layout.

Need Help Naming Result Set Testing Functions

I’ve been thinking more since I posted about testing SQL result sets, and I think I’ve settled on two sets of functions for pgTAP: one that tests two SQL queries (though you will be encouraged to use a prepared statement), and one to test two cursors. I’m thinking of naming them:

  • query_gets()
  • cursor_gets()

I had been planning on *_returns() or *_yields(), but they didn’t feel right. “Returns” implies that I would be passing a query and a data structure (to me at least), and while I want to support that, too, it’s not what I was looking for right now. “Yield,” on the other hand, is more related to set-returning functions in my mind (even if PL/pgSQL doesn’t use that term). Anyway, I like the use of “gets” because it’s short and pretty unambiguous.

These function will compare query results as unordered sets, but I want variants that test ordered sets, as well. I’ve been struggling to come up with a decent name for these variants, but not liking any very well. The obvious ones are:

  • ordered_query_gets()
  • ordered_cursor_gets()

And:

  • sorted_query_gets()
  • sorted_cursor_gets()

But these are kind of long for functions that will be, I believe, used frequently. I could just add a character to get the same idea, in the spirit of sprintf:

  • oquery_gets()
  • ocursor_gets()

Or:

  • squery_gets()
  • scursor_gets()

I think that these are okay, but might be somewhat confusing. I think that the “s” variant probably won’t fly, since for sprintf and friends, the “s” stands for “string.” So I’m leaning towards the “o” variants.

But I’m throwing it out there for the masses to make suggestions: Got any ideas for better function names? Are there some relational terms for ordered sets, for example, that might make more sense? What do you think?

As a side note, I’m also considering:

  • col_is() to compare the result of a single column query to an array or other query. This would need an ordered variant, as well.
  • row_is(), although I have no idea how I’d be able to support passing a row expression to a function, since PostgreSQL doesn’t allow RECORDs to be passed to functions.

Looking for the comments? Try the old layout.

pgTAP 0.21 Drops

Mmmmm…pgTAP

I just dropped a new version of pgTAP following a few marathon hack sessions since my talk at PGCon (movie here, BTW). Actually, the new performs_ok() function came about as I wrote the presentation, but all the rest came on the flight home and in the few days since. Not sure when I’ll hack on it this much again (it’s getting a bit big at 5,758 lines of PL/pgSQL and SQL).

Overall I’m very happy with this release, as it adds a lot of new assertion functions. In particular, I added a slew of functions that test that the objects in a given schema (or visible in the search path, if you prefer) are exactly the objects that should be there. This is useful for a couple of things. For one, Norman Yamada, in his PGCon talk, mentioned that his team was using pgTAP to compare database objects between replicated databases. I like this because it’s a great example of using pgTAP for system testing, rather than just unit testing as I’ve been advocating. See, pgTAP can be used for any kind of testing!

Another use for these functions is in a large organization where many different people might be making changes to a schema. In this scenario, you might have application developers adding new objects to the database (or dropping objects) without necessarily informing the DBAs. Using, for example, tables_are() and functions_are() and continuous testing, the DBAs can see when objects have been modified by the developers. Even better, if the developers are running the pgTAP tests themselves (as they should be!), they will be reminded to add new tests for their changes when the existing tests notice that things have been added or dropped and thus fail.

Beyond that, I added a bunch of new functions for testing functions and a number of other goodies. Check out the release notes for all the details.

With these changes, I’ve finished nearly everything I’ve thought of for pgTAP. There are only a few sequence-testing functions left on the To Do list, as well as a call to add a throws_like() function, which I’ll throw in soon if no one else steps up. Beyond these changes, I have a few ideas of where to take it next, but so far I’m kind of stumped. Mainly what I think should be done is to add an interface that makes it easier to compare relations (or result sets, if you prefer). Epic does this by allowing query strings to be passed to a function, but I’d really like to keep queries in SQL rather than in SQL strings. I’ll be giving it some more thought and will post about it soon.

Looking for the comments? Try the old layout.

Unit Test Your Database!

Gave my talk at PGCon today. I felt that it went well, and was well-received. So here it is for everyone else, for posterity, thanks to SlideShare. Enjoy!

Looking for the comments? Try the old layout.

More on Database Testing

I’ve been meaning for a while to come back to the topic of database testing. After posting that entry, I thought more about the quote from a PostgreSQL core hacker, which I think bears repeating:

Well, you are testing for bugs, and bugs are pretty specific in where they appear. Writing the tests is 90% of the job; writing the infrastructure is minor. If the infrastructure has limitations, which all do, you might as well write that extra 10% too.

I had been so focused on the first sentence, on the why of database testing, that I’d not rally addressed the rest. I failed to notice that he was questioning the utility of a testing infrastructure, or what I would call a framework, like pgTAP. So let me rectify that right now by addressing his actual point.

The idea of using an established framework and protocol is to be able to focus exclusively on the task of writing tests, rather than worrying about how to analyze test results. I agree that writing tests can be time-consuming, but that doesn’t mean that one should write one’s own testing framework. The great thing about pgTAP is that it emits TAP, which can then be analyzed along with any other TAP-emitting test framework in any environment, including Perl, C, JavaScript, PHP, and even Oracle, among others.

The other argument that might support writing one’s own testing infrastructure is if it’s too hard to apply one style of testing to a given application. For example, most of the existing TAP producers provide a functional interface to writing tests. SQL, on the other hand, is not a functional language. So–leaving aside for the moment that one can provide an effective functional interface for writing database tests–even if one wanted to write a relational-style testing framework, it could still emit TAP! TAP is, after all, just a stream of text. So as long as a SQL SELECT statement returns a stream of TAP, then you can take advantage of the myriad of test analysis tools out there.

Now, I was discussing the use of TAP with a different PostgreSQL contributor, who was asking me about modifying the output of pg_regress to be TAP. The way that pg_regress works–and therefore how PostgreSQL core tests work–is simple: One writes SQL statements into a test script, and then one writes an expected output file. If the output of the tests might vary by platform, database setting, or compile-time feature, one just creates more expected files, each with the appropriate variations.

The PostgreSQL test runner, pg_regress then simply runs the script through psql and diffs the output against each expected file. If one of the files is identical to the output, the test passes. Otherwise it fails. When the tests run, the output looks like this:

parallel group (2 tests):  copyselect copy
   copy                 ... ok
   copyselect           ... ok

My core hacker correspondent was thinking of modifying this output to be TAP, something like this:

# Parallel group (2 tests):  copyselect copy
1..2
ok 1 - copy
ok 2 - copyselect

With this change, he could then run the regression tests through TAP::Harness in a cron job and send failure reports when a test failed. This is good as far as it goes, but it has a couple of significant limitations. For one, there are no diagnostics if something goes wrong. This is because, and this is the second shortcoming, it just turns the result of testing a single script into TAP, not individual assertions. There might be 1000s of SQL statements in one script, but if the test fails, one won’t know what failed until one looks at regression.diff.

One of the great features of TAP is the support for diagnostics. For example, if an assertion fails, you might see output something like this:

not ok 38 - The frobnitz should be named "foo"
# Failed test 38: "The frobnitz should be named "foo""
#         have: NULL
#         want: foo

Just changing the listing of the test scripts run does not get you this advantage. That’s not to say that it doesn’t make certain things easier, or that one couldn’t simply have shorter test scripts in order to limit the scope of what’s being tested and what’s a result. But a single test script does not make for a good assertion. In short, pg_regress tests don’t do assertions at all. They simply compare actual and expected output from very verbose scripts. This is a hell of a lot better than nothing, but is still quite limited.

I suggested to my correspondent that he consider modifying the tests he was working on to use pgTAP, instead. Of course, if you have a lot of existing tests, it might be more trouble than it’s worth to rewrite them all. But that doesn’t mean that you can’t write new tests going forward using something that’s more granular, and gives you a lot more control over the output.

His answer rather shocked me:

I’m lucky enough to only be dealing with really good developers, so I can produce software that works even without focusing specifically on low-level unit tests.

To me, it’s a misconception to think “really good developers” don’t need tests. As I said in reply, I consider myself a “really good developer,” and I’d have a whole lot more pain with the code I maintain if it weren’t for the tests I’ve written. Tests do not compensate for poor coders. Rather, they make it easier to maintain, modify, and refactor code, to fix bugs, and to add new features. Not to mention testing my code on new versions of software, such as testing my CPAN modules with Perl 5.10 and testing my client’s PostgreSQL databases with 8.3 or 8.4 as they look to upgrade from 8.2.

One place where my correspondent agreed that pg_regress could use improvement is in the whole approach to matching different outputs. Using pgTAP, one can write tests in such a way that there are different expected results depending on database settings right in the test scripts themselves! This is because the TAP output can vary in any number of ways, and the harness doesn’t care, as long as the tests pass. pg_regress is extremely strict about what it considers a passing test, and this lacks the necessary flexibility for testing some advanced features.

This is why there are no tests for multibyte character semantics (such as locale-aware sorting) in the PostgreSQL core tests.

Are you a database testing doubter? Have I convinced you yet that a rock-solid test suite can actually make your job easier and more enjoyable? Perhaps I never will, but I am still very interested in your reasons for doubting the utility of database testing. What other points should I be thinking about as I prepare for my PGCon presentation?

Looking for the comments? Try the old layout.

pgTAP 0.20 Infiltrates Community

I did all I could to stop it, but it just wasn’t possible. pgTAP 0.20 has somehow made its way from my Subversion server and infiltrated the PostgreSQL community. Can nothing be done to stop this menace? Its use leads to cleaner, more stable, and more-safely refactored code. This insanity must be stopped! Please review the following list of its added vileness since 0.19 to determine how you can stop the terrible, terrible influence on your PostgreSQL unit-testing practices that is pgTAP:

  • Changed the names of the functions tested in sql/do_tap.sql and sql/runtests.sql so that they are less likely to be ordered differently given varying collation orders provided in different locales and by different vendors. Reported by Ingmar Brouns.
  • Added the --formatter and --archive options to pg_prove.
  • Fixed the typos in pg_prove where the output of --help listed --test-match and --test-schema instead of --match and --schema.
  • Added has_cast(), hasnt_cast(), and cast_context_is().
  • Fixed a borked function signature in has_trigger().
  • Added has_operator(), has_leftop(), and has_rightop().
  • Fixed a bug where the order of columns found for multicolum indexes by has_index() could be wrong. Reported by Jeff Wartes. Thanks to Andrew Gierth for help fixing the query.

Don’t make the same mistake I did, where I wrote a lot of pgTAP tests for a client, and now testing database upgrades from 8.2 to 8.3 is just too reliable! YOU HAVE BEEN WARNED.

Good luck with your mission.

Looking for the comments? Try the old layout.

Why Test Databases?

I’m going to be doing some presentations on testing for database administrators. I’ve been really excited to be working on pgTAP, and have started using it extensively to write tests for a client with a bunch of PostgreSQL databases. But as I start to push the idea of testing within the PostgreSQL community, I’m running into some surprising resistance.

I asked a major figure in the community about this, someone who has expressed skepticism in my past presentations. He feels that it’s hard to create a generic testing framework. He told me:

Well, you are testing for bugs, and bugs are pretty specific in where they appear. Writing the tests is 90% of the job; writing the infrastructure is minor. If the infrastructure has limitations, which all do, you might as well write that extra 10% too.

I have to say that this rather surprised me. I guess I just thought that everyone was on board with the idea of testing. The PostgreSQL core, after all, has a test suite. But the idea that one writes test to test for bugs seems like a major misconception about testing: I don’t write tests to test bugs (that is what regression tests are for, but there is much more to testing than regression tests); I write tests to ensure consistent and correct behavior in my code as development continues over time.

It has become clear to me that I need to reframe my database testing presentations to emphasize not the how to go about testing; I think that pgTAP does a pretty good job of making it a straight-forward process (at least as straight-forward as when writing tests in Ruby or Perl, for example). What I have to address first is the why of testing. I need to convince database administrators that testing is an essential tool in their kits, and the way to do that is to show them why it’s essential.

With this in mind, I asked, via Twitter, why should database people test their databases? I got some great answers (and, frankly, the 140 character limit of Twitter made them admirably pithy, which is a huge help):

  • chromatic_x: @theory, because accidents that happen during tests are much easier to recover from than accidents that happen live.
  • caseywest: @Theory When you write code that’s testable you tend to write better code: cleaner interfaces, reusable components, composable pieces.
  • depesz_com: @Theory testing prevents repeating mistakes.
  • rjbs: @Theory The best ROI for me is “never ship the same bug twice.”
  • elein: @Theory trust but verify
  • cwinters: @Theory so they can change the system without feeling like they’re on a suicide mission
  • caseywest: @Theory So they can document how the system actually works.
  • hanekomu: @Theory Regression tests - to see whether, after having changed something here, something else over there falls over.
  • robrwo: @Theory Show them a case where bad data is inserted into/deleted from database because constraints weren’t set up.

Terrific ideas there. I thank you, Tweeps. But I ask here, too: Why should we write tests against our databases? Leave a comment with your (brief!) thoughts.

And thank you!

Looking for the comments? Try the old layout.

pgTAP 0.14 Released

I’ve just released pgTAP 0.14. This release focuses on getting more schema functions into your hands, as well as fixing a few issues. Changes:

  • Added SET search_path statements to uninstall_pgtap.sql.in so that it will work properly when TAP is installed in its own schema. Thanks to Ben for the catch!
  • Added commands to drop pg_version() and pg_version_num() touninstall_pgtap.sql.in.
  • Added has_index(), index_is_unique(), index_is_primary(), is_clustered(), and index_is_type().
  • Added os_name(). This is somewhat experimental. If you have uname, it’s probably correct, but assistance in improving OS detection in the Makefile would be greatly appreciated. Notably, it does not detect Windows.
  • Made ok() smarter when the test result is passed as NULL. It was dying, but now it simply fails and attaches a diagnostic message reporting that the test result was NULL. Reported by Jason Gordon.
  • Fixed an issue in check_test() where an extra character was removed from the beginning of the diagnostic output before testing it.
  • Fixed a bug comparing name[]s on PostgreSQL 8.2, previously hacked around.
  • Added has_trigger() and trigger_is().
  • Switched to pure SQL implementations of the pg_version() and pg_version_num() functions, to simplify including pgTAP in module distributions.
  • Added a note to README.pgtap about the need to avoid pg_typeof() and cmp_ok() in tests run as part of a distribution.

Enjoy!

Looking for the comments? Try the old layout.

pgTAP 0.12 Released

In anticipation of my PostgreSQL Conference West 2008 talk on Sunday, I’ve just released pgTAP 0.12. This is a minor release with just a few tweaks:

  • Updated plan() to disable warnings while it creates its tables. This means that plan() no longer send NOTICE messages when they run, although tests still might, depending on the setting of client_min_messages.
  • Added hasnt_table(), hasnt_view(), and hasnt_column().
  • Added hasnt_pk(), hasnt_fk(), col_isnt_pk(), and col_isnt_fk().
  • Added missing DROP statements to uninstall_pgtap.sql.in.

I also have an idea to add functions that return the server version number (and each of the version number parts) and an OS string, to make testing things on various versions of PostgreSQL and on various operating systems a lot simpler.

I think I’ll also spend some time in the next few weeks on an article explaining exactly what pgTAP is and why you’d want to use it. Provided, of course, I can find the tuits for that.

Looking for the comments? Try the old layout.

pgTAP 0.11 Released

So I’ve just released pgTAP 0.11. I know I said I wasn’t going to work on it for a while, but I changed my mind. Here’s what’s changed:

  • Simplified the tests so that they now load test_setup.sql instead of setting a bunch of stuff themselves. Now only test_setup.sql needs to be created from test_setup.sql.in, and the other .sql files depend on it, meaning that one no longer has to specify TAPSCHEMA for any make target other than the default.
  • Eliminated all uses of E'' in the tests, so that we don’t have to process them for testing on 8.0.
  • Fixed the spelling of ON_ROLLBACK in the test setup. Can’t believe I had it with one L in all of the test files before! Thanks to Curtis “Ovid” Poe for the spot.
  • Added a couple of variants of todo() and skip(), since I can never remember whether the numeric argument comes first or second. Thanks to PostgreSQL’s functional polymorphism, I don’t have to. Also, there are variants where the numeric value, if not passed, defaults to 1.
  • Updated the link to the pgTAP home page in pgtap.sql.in.
  • TODO tests can now nest.
  • Added todo_start(), todo_end(), and in_todo().
  • Added variants of throws_ok() that test error messages as well as error codes.
  • Converted some more tests to use check_test().
  • Added can() and can_ok().
  • Fixed a bug in check_test() where the leading whitespace for diagnostic messages could be off by 1 or more characters.
  • Fixed the installcheck target so that it properly installs PL/pgSQL into the target database before the tests run.

Now I really am going to do some other stuff for a bit, although I do want to see what I can poach from Epic Test. And I do have that talk on pgTAP next month. So I’ll be back with more soon enough.

Looking for the comments? Try the old layout.

Introducing pgTAP

So I started working on a new PostgreSQL data type this week. More on that soon; in the meantime, I wanted to create a test suite for it, and wasn’t sure where to go. The only PostgreSQL tests I’ve seen are those distributed with Elein Mustain’s tests for the email data type she created in a PostgreSQL General Bits posting from a couple of years ago. I used the same approach myself for my GTIN data type, but it was rather hard to use: I had to pay very close attention to what was output in order to tell the description output from the test output. It was quite a PITA, actually.

This time, I started down the same path, then then started thinking about Perl testing, where each unit test, or “assertion,” in the xUnit parlance, triggers output of a single line of information indicating whether or not a test succeeded. It occurred to me that I could just run a bunch of queries that returned booleans to do my testing. So my first stab looked something like this:

\set ON_ERROR_STOP 1
\set AUTOCOMMIT off
\pset format unaligned
\pset tuples_only
\pset pager
\pset null '[NULL]'

SELECT foo() = 'bar';
SELECT foo(1) = 'baz';
SELECT foo(2) = 'foo';

The output looked like this:

% psql try -f ~/Desktop/try.sql
t
t
t

Once I started down that path, and had written ten or so tests, It suddenly dawned on me that the Perl Test::More module and its core ok() subroutine worked just like that. It essentially just depends on a boolean value and outputs text based on that value. A couple minutes of hacking and I had this:

CREATE TEMP SEQUENCE __tc__;
CREATE OR REPLACE FUNCTION ok ( boolean, text ) RETURNS TEXT AS $$
    SELECT (CASE $1 WHEN TRUE THEN '' ELSE 'not ' END) || 'ok'
        || ' ' || NEXTVAL('__tc__')
        || CASE $2 WHEN '' THEN '' ELSE COALESCE( ' - ' || $2, '' ) END;
$$ LANGUAGE SQL;

I then rewrote my test queries like so:

\echo 1..3
SELECT ok( foo() = 'bar'   'foo() should return "bar"' );
SELECT ok( foo(1) = 'baz', 'foo(1) should return "baz"' );
SELECT ok( foo(2) = 'foo', 'foo(2) should return "foo"' );

Running these tests, I now got:

% psql try -f ~/Desktop/try.sql
1..3
ok 1 - foo() should return "bar"
ok 2 - foo(1) should return "baz"
ok 3 - foo(2) should return "foo"

And, BAM! I had the beginning of a test framework that emits pure TAP output.

Well, I was so excited about this that I put aside my data type for a few hours and banged out the rest of the framework. Why was this exciting to me? Because now I can use a standard test harness to run the tests, even mix them in with other TAP tests on any project I might work on. Just now, I quickly hacked together a quick script to run the tests:

use TAP::Harness;

my $harness = TAP::Harness->new({
    timer   => $opts->{timer},
    exec    => [qw( psql try -f )],
});

$harness->runtests( @ARGV );

Now I’m able to run the tests like so:

% try ~/Desktop/try.sql        
/Users/david/Desktop/try........ok   
All tests successful.
Files=1, Tests=3,  0 wallclock secs ( 0.00 usr  0.00 sys +  0.01 cusr  0.00 csys =  0.01 CPU)
Result: PASS

Pretty damn cool! And lest you wonder whether such a suite of TAP-emitting test functions is suitable for testing SQL, here are a few examples of tests I’ve written:

-- Plan the tests.
SELECT plan(4);

-- Emit a diagnostic message for users of different locales.
SELECT diag(
    E'These tests expect LC_COLLATE to be en_US.UTF-8,\n'
    || 'but yours is set to ' || setting || E'.\n'
    || 'As a result, some tests may fail. YMMV.'
)
    FROM pg_settings
    WHERE name = 'lc_collate'
    AND setting <> 'en_US.UTF-8';

SELECT is( 'a', 'a', '"a" should = "a"' );
SELECT is( 'B', 'B', '"B" should = "B"' );

CREATE TEMP TABLE try (
    name lctext PRIMARY KEY
);

INSERT INTO try (name)
VALUES ('a'), ('ab'), ('â'), ('aba'), ('b'), ('ba'), ('bab'), ('AZ');

SELECT ok( 'a' = name, 'We should be able to select the value' )
    FROM try
    WHERE name = 'a';

SELECT throws_ok(
    'INSERT INTO try (name) VALUES (''a'')',
    '23505',
    'We should get an error inserting a lowercase letter'
);

-- Finish the tests and clean up.
SELECT * FROM finish();

As you can see, it’s just SQL. And yes, I have ported most of the test functions from Test::More, as well as a couple from Test::Exception.

So, without further ado, I’d like to introduce pgTAP, a lightweight test framework for PostgreSQL implemented in PL/pgSQL and PL/SQL. I’ll be hacking on it more in the coming days, mostly to get a proper client for running tests hacked together. Then I think I’ll see if pgFoundry is interested in it.

Whaddya think? Is this something you could use? I can see many uses, myself, not only for testing a custom data type as I develop it, but also custom functions in PL/pgSQL or PL/Perl, and, heck, just regular schema stuff. I’ve had to write a lot of Perl tests to test my database schema (triggers, rules, functions, etc.), all using the DBI and being very verbose. Being able to do it all in a single psql script seems so much cleaner. And if I can end up mixing the output of those scripts in with the rest of my unit tests, so much the better!

Anyway, feedback welcome. Leave your comments, suggestions, complaints, patches, etc., below. Thanks!

Looking for the comments? Try the old layout.