Posts about pgTAP

Templating Tests with Sqitch

Back in September, I described how to create custom deploy, revert, and verify scripts for various types of Sqitch changes, such as adding a new table. Which is cool and all, but what I’ve found while developing databases at work is that I nearly always want to create a test script with the same name as a newly-added change.

So for the recent v0.990 release, the add command gained the ability to generate arbitrary script files from templates. To get it to work, we just have to create template files. Templates can go into ~/.sqitch/templates (for personal use) or in $(sqitch --etc-path)/templates (for use by everyone on a system). The latter is where templates are installed by default. Here’s what it looks like:

> ls $(sqitch --etc-path)/templates
deploy  revert  verify
> ls $(sqitch --etc-path)/templates/deploy
firebird.tmpl  mysql.tmpl  oracle.tmpl  pg.tmpl  sqlite.tmpl

Each directory defines the type of script and the name of the directory in which it will be created. The contents are default templates, one for each engine.

To create a default test template, all we have to do is create a template for our preferred engine in a directory named test. So I created ~/.sqitch/templates/test/pg.tmpl. Here it is:

SET client_min_messages TO warning;
CREATE EXTENSION IF NOT EXISTS pgtap;
RESET client_min_messages;

BEGIN;
SELECT no_plan();
-- SELECT plan(1);

SELECT pass('Test [% change %]!');

SELECT finish();
ROLLBACK;

This is my standard boilerplate for tests, more or less. It just loads pgTAP, sets the plan, runs the tests, finishes and rolls back. See this template in action:

> sqitch add whatever -n 'Adds whatever.'
Created deploy/whatever.sql
Created revert/whatever.sql
Created test/whatever.sql
Created verify/whatever.sql
Added "whatever" to sqitch.plan

Cool, it added the test script. Here’s what it looks like:

SET client_min_messages TO warning;
CREATE EXTENSION IF NOT EXISTS pgtap;
RESET client_min_messages;

BEGIN;
SELECT no_plan();
-- SELECT plan(1);

SELECT pass('Test whatever!');

SELECT finish();
ROLLBACK;

Note that it replaced the change variable in the call to pass(). All ready to start writing tests! Nice, right? If we don’t want the test script created – for example when adding a column to a table for which a test already exists – we use the --without option to omit it:

> sqitch add add_timestamp_column --without test -n 'Adds whatever.'
Created deploy/add_timestamp_column.sql
Created revert/add_timestamp_column.sql
Created verify/add_timestamp_column.sql
Added "add_timestamp_column" to sqitch.plan

Naturally you’ll want to update the existing test to validate the new column.
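
For example, if the new change added a created_at column to a widgets table (names invented for illustration), a few assertions like these might be appended to the existing test:

-- Hypothetical assertions for the new column; adjust names and type to taste.
SELECT has_column(   'widgets', 'created_at' );
SELECT col_type_is(  'widgets', 'created_at', 'timestamp with time zone' );
SELECT col_not_null( 'widgets', 'created_at' );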

In the previous templating post, we added custom scripts for CREATE TABLE changes; now we can add a test template, too. This one takes advantage of the advanced features of Template Toolkit. We name it ~/.sqitch/templates/test/createtable.tmpl to complement the deploy, revert, and verify scripts created previously:

-- Test [% change %]
SET client_min_messages TO warning;
CREATE EXTENSION IF NOT EXISTS pgtap;
RESET client_min_messages;

BEGIN;
SELECT no_plan();
-- SELECT plan(1);

SET search_path TO [% IF schema %][% schema %],[% END %]public;

SELECT has_table('[% table or change %]');
SELECT has_pk( '[% table or change %]' );

[% FOREACH col IN column -%]
SELECT has_column(        '[% table or change %]', '[% col %]' );
SELECT col_type_is(       '[% table or change %]', '[% col %]', '[% type.item( loop.index ) or 'text' %]' );
SELECT col_not_null(      '[% table or change %]', '[% col %]' );
SELECT col_hasnt_default( '[% table or change %]', '[% col %]' );

[% END %]
SELECT finish();
ROLLBACK;

As before, we tell the add command to use the createtable templates:

> sqitch add corp_widgets --template createtable \
  -s schema=corp -s table=widgets \
  -s column=id -s type=SERIAL \
  -s column=name -s type=TEXT \
  -s column=quantity -s type=INTEGER \
  -n 'Add corp.widgets table.'

This yields a very nice test script to get you going:

-- Test corp_widgets
SET client_min_messages TO warning;
CREATE EXTENSION IF NOT EXISTS pgtap;
RESET client_min_messages;

BEGIN;
SELECT no_plan();
-- SELECT plan(1);

SET search_path TO corp,public;

SELECT has_table('widgets');
SELECT has_pk( 'widgets' );

SELECT has_column(        'widgets', 'id' );
SELECT col_type_is(       'widgets', 'id', 'SERIAL' );
SELECT col_not_null(      'widgets', 'id' );
SELECT col_hasnt_default( 'widgets', 'id' );

SELECT has_column(        'widgets', 'name' );
SELECT col_type_is(       'widgets', 'name', 'TEXT' );
SELECT col_not_null(      'widgets', 'name' );
SELECT col_hasnt_default( 'widgets', 'name' );

SELECT has_column(        'widgets', 'quantity' );
SELECT col_type_is(       'widgets', 'quantity', 'INTEGER' );
SELECT col_not_null(      'widgets', 'quantity' );
SELECT col_hasnt_default( 'widgets', 'quantity' );


SELECT finish();
ROLLBACK;

I don’t know about you, but I’ll be using this functionality a lot.

Agile Database Development Tutorial

I gave a tutorial at PGCon a couple weeks back, entitled “Agile Database Development with Git, Sqitch, and pgTAP.” It went well, I think. The Keynote document and an exported PDF have been posted on PGCon.org, and also uploaded here and to Speaker Deck. Want to follow along? Clone the tutorial Git repository. Here’s the teaser:

Hi, I’m David. I like to write database apps. Just as much as I like to write web apps. (Maybe more!) How? Not by relying on bolted-on, half-baked database integration tools like migrations, I’ll tell you that! Instead, I make extensive use of best-of-breed tools for source control (Git), database unit testing (pgTAP), and database change management and deployment (Sqitch). If you’d like to get as much pleasure out of database development as you do application development, join me for this tutorial. We’ll develop a sample application using the processes and tools I’ve come to depend on, and you’ll find out whether they might work for you. Either way, I promise it will at least be an amusing use of your time.

Introducing MyTAP

I gave my OSCON tutorial (slides) last week. It went okay. I spent way too much time helping to get everyone set up with pgTAP, and then didn’t have time to have the attendees do the exercises, and I had to rush through 2.5 hours of material in 1.5 hours. Yikes! At least the video will be better when it’s released (more when that happens).

But as often happens, I was asked whether something like pgTAP exists for MySQL. This time I was asked by MySQL Community Manager Giuseppe Maxia, who also said that he’d tried to create a test framework himself (a fellow Perl hacker!), but that it wasn’t as nice as pgTAP. Well, since I was at OSCON and tend to like to hack on side projects while at conferences, and since I hoped that Giuseppe would happily take it over once I’d implemented the core, I started hacking on it myself. And today, I’m pleased to announce the release of MyTAP 0.01 (downloads).

Once you’ve downloaded it, install it against your MySQL server like so:

mysql -u root < mytap.sql

Here’s a very simple example script:

-- Start a transaction.
BEGIN;

-- Plan the tests.
SELECT tap.plan(1);

-- Run the tests.
SELECT tap.pass( 'My test passed, w00t!' );

-- Finish the tests and clean up.
CALL tap.finish();
ROLLBACK;

You can run this test from a .sql file using the mysql client like so:

mysql -u root --disable-pager --batch --raw --skip-column-names --unbuffered --database try --execute 'source test.sql'

But that’s a PITA and can only run one test at a time. Instead, put all of your tests into a directory, perhaps named tests, each with the suffix “.my”, and use my_prove (install TAP::Parser::SourceHandler::MyTAP from CPAN to get it) instead:

my_prove -u root --database try tests/

For MyTAP’s own tests, the output looks like this:

tests/eq.my ........ ok
tests/hastap.my .... ok
tests/matching.my .. ok
tests/moretap.my ... ok
tests/todotap.my ... ok
tests/utils.my ..... ok
All tests successful.
Files=6, Tests=137,  1 wallclock secs
(0.06 usr  0.03 sys +  0.01 cusr  0.02 csys =  0.12 CPU)
Result: PASS

Nice, eh? Of course there are quite a few more assertion functions. See the complete documentation for details.

Now, I did my best to keep the interface the same as pgTAP, but there are a few differences:

  • MySQL temporary tables are teh suck, so I had to use permanent tables to track test state. To make this more feasible, MyTAP is always installed in its own database (named “tap” by default), and you must always schema-qualify your use of the MyTAP functions.
  • Another side-effect of permanent tables is that MyTAP must keep track of test outcomes without colliding with the state from tests running in multiple concurrent connections. So MyTAP uses connection_id() to keep track of state for a single test run. It also deletes the state when tests finish(), but if there’s a crash before then, data can be left in those tables. If the connection ID is ever re-used, this can lead to conflicts. This seems mostly avoidable by using InnoDB tables and transactions in the tests.
  • The word “is” is strictly reserved by MySQL, so the function that corresponds to pgTAP’s is() is eq() in MyTAP. Similarly, isnt() is called not_eq() in MyTAP.
  • There is no way to throw an exception in MySQL functions and procedures, so the code cheats by instead performing an illegal operation: selecting from a non-existent column, where the name of that column is the error message. Hinky, but it should get the point across.
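
To make the first and third points concrete, here’s a minimal sketch of a MyTAP test (the arithmetic is contrived, of course); note that every call is schema-qualified with tap, and that eq() and not_eq() stand in for pgTAP’s is() and isnt():

BEGIN;
SELECT tap.plan(2);

-- eq() and not_eq() in place of pgTAP's is() and isnt().
SELECT tap.eq( 2 + 2, 4, 'Addition should work' );
SELECT tap.not_eq( 2 + 2, 5, 'And should not yield 5' );

CALL tap.finish();
ROLLBACK;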

Other than these issues, things went fairly smoothly. I finished up the 0.01 version last night and released it today with most of the core functionality in place. And now I want to find others to take over, as I am not a MySQL hacker myself and thus unlikely ever to use it. If you’re interested, my recommendations for things to do next are:

So fork on GitHub or contact me if you’d like to be added as a collaborator (I’m looking at you, Giuseppe!).

Hope you find it useful.

Important Announcement at OSCON Next Week

Image: Flipr “Antisocial Networking” logo; design by Strongrrl.

A sneak peek at what I’m working on for my tutorial session at OSCON. Be there at 8:30 Monday morning for the important details. You’re sure to find my new venture exciting—perhaps the most important social media announcement of 2010. You can’t afford to miss that, can you?

JPUG Talk Posted

No Perl content today, I’m afraid. I’m just back from my trip to Japan and wanted to post this very nice video of my talk [Update 2018: Sadly gone now]. Unlike the versions from other conferences, this one focuses primarily on the slides, with me appearing in audio only. This makes it really easy to follow. Enjoy.

Test Everything with TAP Source Handlers

I’ve just arrived in Japan with my family. We’re going to be spending several days in Tokyo, during which time I’ll be at the JPUG 10th Anniversary PostgreSQL Conference for a couple of days (giving the usual talk), but mainly I’ll be on vacation. We’ll be visiting Kyoto, too. We’re really excited about this trip; it’ll be a great experience for Anna. I’ll be back in the saddle in December, so for those of you anxiously awaiting the next installment of my Catalyst tutorial, I’m afraid you’ll have to wait a bit longer.

In the meantime, I wanted to write about a little something that’s been cooking for a while. Over the last several months, Steve Purkis has been working on a new feature for TAP::Parser: source handlers. The idea is to make it easier for developers to add support for TAP emitters other than Perl. The existing implementation did a decent job of handling Perl test scripts, of course, and executable files (useful for compiled tests in C using libtap, for example), but anything else was difficult.

As the author of pgTAP, I was of course greatly interested in this work, because I had to bend over backwards to get pg_prove to work nicely. It’s even uglier to get a Module::Build-based distribution to run pgTAP and Perl tests all at once during ./Build test: you had to subclass Module::Build to do it.

Steve wanted to solve this problem, and he did. Then he was kind enough to listen to my bitching and moaning and rewrite his fix so that it was simpler for third parties (read: me) to add new source handlers. What’s a source handler, you ask? Check out the latest dev release of Test::Harness and you’ll see it: TAP::Parser::SourceHandler. As soon as Steve committed it, I jumped in and implemented a new handler for pgTAP. The cool thing is that it took me only three hours to do, including tests. And here’s how you use it in a Build.PL, so that you can have pgTAP tests named *.pg run at the same time as your *.t Perl tests:

Module::Build->new(
    module_name        => 'MyApp',
    test_file_exts     => [qw(.t .pg)],
    use_tap_harness    => 1,
    tap_harness_args   => {
        sources => {
            Perl  => undef,
            pgTAP => {
                dbname   => 'try',
                username => 'postgres',
                suffix   => '.pg',
            },
        }
    },
    build_requires     => {
        'Module::Build'                      => '0.30',
        'TAP::Parser::SourceHandler::pgTAP' => '3.19',
    },
)->create_build_script;

To summarize, you just have to:

  • Tell Module::Build the extensions of your test scripts (that’s qw(.t .pg) here)
  • Specify the Perl source with its defaults (that’s what the undef does)
  • Specify the pgTAP options (database name, username, suffix, and lots of other potential settings)

And that’s it. You’re done! Run your tests with the usual incantation:

perl Build.PL
./Build test

You can use pgTAP and its options with prove, too, via the --source and --pgtap-option options:

prove --source pgTAP --pgtap-option dbname=try \
                     --pgtap-option username=postgres \
                     --pgtap-option suffix=.pg \
                     t/sometest.pg

It’s great that it’s now so much easier to support pgTAP tests, but what if you want to have Ruby tests? Or PHP? Well, it’s a simple process to write your own source handler. Here’s how:

  • Subclass TAP::Parser::SourceHandler. The final part of the package name is the name of the source. Thus if you wrote TAP::Parser::SourceHandler::Ruby, the name of your source would be “Ruby”.

  • Load the necessary modules and register your source handler. For a Ruby source handler, it might look like this:

    package TAP::Parser::SourceHandler::Ruby;
    use strict;
    use warnings;
    
    use TAP::Parser::IteratorFactory   ();
    use TAP::Parser::Iterator::Process ();
    TAP::Parser::IteratorFactory->register_handler(__PACKAGE__);
    
  • Implement the can_handle() method. The task of this method is to return a score between 0 and 1 for how likely it is that your source handler can handle a given source. A bunch of information is passed in a hash to the method, so you can check it all out. For example, if you wanted to run Ruby tests ending in .rb, you might write something like this:

    sub can_handle {
        my ( $class, $source ) = @_;
        my $meta = $source->meta;
    
        # If it's not a file (test script), we're not interested.
        return 0 unless $meta->{is_file};
    
        # Get the file suffix, if any.
        my $suf = $meta->{file}{lc_ext};
    
        # If the config specifies a suffix, it's required.
        if ( my $config = $source->config_for('Ruby') ) {
            if ( defined $config->{suffix} ) {
                # Return 1 for a perfect score.
                return $suf eq $config->{suffix} ? 1 : 0;
            }
        }
    
        # Otherwise, return a score for our supported suffix.
        return $suf eq '.rb' ? 0.8 : 0;
    }
    

    The last line is the most important: it returns 0.8 if the suffix is .rb, saying that it’s likely that this handler can handle the test. But the middle bit is interesting, too. The $source->config_for('Ruby') call checks whether the user specified a suffix, either via the command-line or in the options. So in a Build.PL, that might be:

        tap_harness_args => {
            sources => {
                Perl => undef,
                Ruby => { suffix => '.rub' },
            }
        },
    

    Meaning that the user wanted to run tests ending in .rub as Ruby tests. It can also be done on the command-line with prove:

    prove --source Ruby --ruby-option suffix=.rub
    

    Cool, eh? We have a reasonable default for Ruby tests, .rb, but the user can override however she likes.

  • And finally, implement the make_iterator() method. The job of this method is simply to create a TAP::Parser::Iterator object to actually run the test. It might look something like this:

    sub make_iterator {
        my ( $class, $source ) = @_;
        my $config = $source->config_for('Ruby');
    
        my $fn = ref $source->raw ? ${ $source->raw } : $source->raw;
        $class->_croak(
        'No such file or directory: ' . ( defined $fn ? $fn : '' )
        ) unless $fn && -e $fn;
    
        return TAP::Parser::Iterator::Process->new({
            command => [$config->{ruby} || 'ruby', $fn ],
            merge   => $source->merge
        });
    }
    

    Simple, right? Just make sure we have a valid file to execute, then instantiate and return a TAP::Parser::Iterator::Process object to actually run the test.

That’s it. Just two methods and you’re ready to go. I’ve even added support for a suffix option and a ruby option (so that you can point to the ruby executable in case it’s not in your path). Using it is easy. I wrote a quick TAP-emitting Ruby script like so:

puts 'ok 1 - This is a test'
puts 'ok 2 - This is another test'
puts 'not ok 3 - This is a failed test'

And to run this test (assuming that TAP::Parser::SourceHandler::Ruby has been installed somewhere where Perl can find it), it’s just:

% prove --source Ruby ~/try.rb --verbose
/Users/david/try.rb .. 
ok 1 - This is a test
ok 2 - This is another test
not ok 3 - This is a failed test
Failed 1/3 subtests 

Test Summary Report
-------------------
/Users/david/try.rb (Wstat: 0 Tests: 3 Failed: 1)
  Failed test:  3
  Parse errors: No plan found in TAP output
Files=1, Tests=3,  0 wallclock secs ( 0.02 usr +  0.01 sys =  0.03 CPU)
Result: FAIL

It’s so easy to create new source handlers now, especially if all you have to do is support a new dynamic language. I’ve put the simple Ruby example over here; feel free to take it and run with it!

pgTAP Best Practices Slides Available

Last month I gave two presentations at the PostgreSQL Conference West. The first was a variation on Unit Test Your Database!, which I’ve now given about six times (and will give at least two more times, including tomorrow night for Portland PLUG and in two weeks at the JPUG 10th Anniversary Conference). The second was a new talk, a 90-minute tutorial, called “pgTAP Best Practices.” And here it is (download and Slideshare). Enjoy!

pgTAP 0.22: Test Your Results

I’m pleased to announce that, after much thinking, committing, and not an insignificant amount of hair-pulling, pgTAP 0.22 has finally landed. Download it here. Many, many thanks to all who commented on my previous posts, made suggestions, and helped me on IRC to figure out how to get all this stuff to work. The crazy thing is that it does, quite well, all the way back to PostgreSQL 8.0.

So here’s what I’ve come up with: ten simple functions. Sound like a lot? Well, it’s simpler than it might at first appear. Here’s a quick overview:

results_eq()

Compares two queries row-for-row. Pass in strings with SQL queries, strings with prepared statement names, or cursors. If the query you’re testing returns a single column, the expected results can be passed as an array. If a test fails, you get useful diagnostics:

# Failed test 146
#     Results differ beginning at row 3:
#         have: (1,Anna)
#         want: (22,Betty)

If a row is missing, the diagnostics will show it as a NULL:

# Failed test 147
#     Results differ beginning at row 5:
#         have: (1,Anna)
#         want: NULL

results_ne()

Just like results_eq(), except that it tests that the results of the two arguments are *not* equivalent. May not be very useful, but it’s cute.

set_eq()

Tests that two queries return the same results, without regard to the order of the results or duplicates. Pass in strings with SQL queries or strings with prepared statement names. As with results_eq() the expected results can be passed as an array if the test query returns a single column. Failure diagnostics look like this:

# Failed test 146
#     Extra records:
#         (87,Jackson)
#         (1,Jacob)
#     Missing records:
#         (44,Anna)
#         (86,Angelina)

If the failure is due to incompatible column types, the diagnostics will help you out there, too:

# Failed test 147
#     Columns differ between queries:
#         have: (integer,text)
#         want: (text,integer)

set_ne()

The inverse of set_eq(), the test passes if the results of the two queries are different, without regard to order or duplicate rows. No diagnostics on failure, though; if it fails, it’s because the results are the same.

set_has()

Tests that a query contains a subset of results without regard to order or duplicates. Useful if you need to ensure that a query returns at least some set of rows. Failure diagnostics are useful again:

# Failed test 122
#     Missing records:
#         (44,Anna)
#         (86,Angelina)

set_hasnt()

Tests that a query does not contain a subset of results, without regard to order or duplicates.

bag_eq()

Just like set_eq(), except that duplicates matter. So if the first query has duplicate rows, the second must have the same dupes. Diagnostics are equally useful.

bag_ne()

Just like set_ne(), except that duplicates matter.

bag_has()

Just like set_has(), except that duplicates matter.

bag_hasnt()

Just like set_hasnt(), except that duplicates matter.

Be sure to look at my previous post for usage examples. Since I wrote it, I’ve also added the ability to pass an array as the second argument to these functions. This is specifically for the case when the query you’re testing returns a single column of results; the array just makes it easier to specify expected values in a common case:

SELECT results_eq(
    'SELECT * FROM active_user_ids()',
    ARRAY[ 2, 3, 4, 5]
);

Check the documentation for all the details on how to use these functions.

I’m really happy with these functions. It was definitely worth it to really think things through, look at prior art, and spend the time to try different approaches. In the process, I’ve found an approach that works in nearly all circumstances.

The one exception is in results_eq() and results_ne() on PostgreSQL 8.3 and down. The issue there is that there were no operators to compare two record objects before PostgreSQL 8.4. So for earlier versions, the code has to cast the records representing each row to text. This means that two rows can be different but appear to be the same to 8.3 and down. In practice this should be pretty rare, but I’m glad that record comparisons are more correct in 8.4.
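
A contrived sketch (using a modern Postgres’s record-to-text cast) shows why: cast to text, a row holding the integer 1 and a row holding the string '1' become indistinguishable:

-- Both rows collapse to the text value '(1,Anna)',
-- so on 8.3 and earlier they would compare as equal.
SELECT ROW(1, 'Anna')::text = ROW('1', 'Anna')::text;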

The only other issue is performance. Although you can write your tests in SQL, rather than strings containing SQL, the set and bag functions use the PL/pgSQL EXECUTE statement to execute each SQL statement and insert it into a temporary table. Then they select the data from the temporary tables once or twice to do the comparisons. That’s a lot more processing than simply running the query itself, and it slows down the performance significantly.

Similarly, the results functions use cursors and fetch each row one-at-a-time. The nice thing is that, in the event of a failure for results_eq() or a pass for results_ne(), the functions can stop fetching results before reaching the end of the queries. But either way, a fair bit of processing goes on.

I’m not sure which is slower, the set and bag functions or the results functions, but, short of adding new syntax to SQL (not an option), I could see no other way to adequately do the comparisons and emit useful diagnostics.

But those are minor caveats, I think. I’m pretty pleased with the function names and the interfaces I’ve created for them. Please download the latest and let me know what you think.

So what’s next? Well, there are a few more schema-testing functions I’d like to add, but after that, I’d like to declare pgTAP stable and start using it in new projects. I’m thinking about writing a test suite for database normalization, starting with testing that all tables have primary keys.
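
A first cut at that last test could be as simple as generating one has_pk() assertion per table from the system catalog, something like this sketch (run under no_plan(), since the number of tests varies with the schema):

-- One has_pk() test for every ordinary table in the public schema.
SELECT has_pk( c.relname )
  FROM pg_catalog.pg_class c
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'r'
   AND n.nspname = 'public';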

But that’s after my vacation. Back in two weeks.

Committed: pgTAP Result Set Assertion Functions

Regular readers will know that I’ve been thinking a lot about testing SQL result sets, how to name result-testing functions, and various implementation issues. I am very happy to say that I’ve now committed the first three such test functions to the Git repository. They’ve been tested on 8.4 and 8.3. Here’s what I came up with.

I had a pretty good idea how to compare sets and how to compare ordered bags, but ordered sets and unordered bags of results escaped me. During two days of intense hacking and experimentation, I quickly wrote set_eq(), which performs a set comparison of the results of two queries, and obag_eq(), which performs a row-by-row comparison of the results of two queries. I then set to work on bag_eq(), which would do a set comparison but require the same number of duplicate rows between the two queries. set_eq() was easy because I just needed to create temporary tables of the two queries and then execute two EXCEPT queries against them to see where they differ, if at all. bag_eq() was getting kind of hairy, though, so I asked about it on the Freenode #postgresql channel, where depesz looked at my example and pointed out that EXCEPT ALL would do just what I needed.

Hot damn, all it took was the addition of a single extra word to the same queries used by set_eq() and I was set. This made me very happy, and such well-thought-out features are the reason I love PostgreSQL. My main man depesz made my day.
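
In other words, simplifying the actual implementation a bit, once the two query results have been dumped into temporary tables (calling them have and want here for illustration), the heart of the comparison is just:

-- set_eq(): each direction of the set difference must come up empty.
SELECT * FROM have EXCEPT     SELECT * FROM want;
SELECT * FROM want EXCEPT     SELECT * FROM have;

-- bag_eq(): the same, except that EXCEPT ALL makes duplicates count.
SELECT * FROM have EXCEPT ALL SELECT * FROM want;
SELECT * FROM want EXCEPT ALL SELECT * FROM have;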

But oset_eq(), which was to compare ordered sets of results, was proving much harder. The relational operators that operate on sets don’t care about order, so I would have to write the code to care myself. And because dupes needed to be ignored, it got even harder. In fact, it proved just not worth the effort. The main reason I abandoned this test function, though, was not difficulties of implementation (which were significant), but ambiguity of interpretation. After all, if duplicates are allowed but ignored, how does one deal with their effect on order? For example, say that I have two queries that order people based on name. One query might order them like so:

select * from people order by name;
  name  | age 
--------+-----
 Damian |  19
 Larry  |  53
 Tom    |  35
 Tom    |  44
 Tom    |  35

Another run of the same query could give me a different order:

select * from people order by name;
  name  | age 
--------+-----
 Damian |  19
 Larry  |  53
 Tom    |  35
 Tom    |  35
 Tom    |  44

Because I ordered only on “name,” the database was free to sort records with the same name in an undefined way. Meaning that the rows could be in different orders. This is known, if I understand correctly, as a “partially ordered set,” or “poset.” Which is all well and good, but from my point of view makes it damn near impossible to be able to do a row-by-row comparison and ignore dupes, because they could be in different orders!

So once I gave up on that, I was down to three functions instead of four, and only one depends on ordering. So I also dropped the idea of having the “o” in the function names. Instead, I changed obag_eq() to results_eq(), and now I think I have three much more descriptive names. To summarize, the functions are:

results_eq
Compares two result sets row by row, meaning that they must be in the same order and have the same number of duplicate rows in the same places.
set_eq
Compares result sets to ensure they have the same rows, without regard to order or duplicate rows.
bag_eq
Compares result sets without regard to order, but each must have the same duplicate rows.

I’m very happy with this, because I was able to give up on the stupid function names with the word “order” included or implicit in them. Plus, I have different names for functions that are similar, which is nicely in adherence to the principle of distinction. They all provide nice diagnostics on failure, as well, like this from results_eq():

# Failed test 146
#     Results differ beginning at row 3:
#         have: (1,Anna)
#         want: (22,Betty)

Or this from set_eq() or bag_eq():

# Failed test 146
#     Extra records:
#         (87,Jackson)
#         (1,Jacob)
#     Missing records:
#         (44,Anna)
#         (86,Angelina)

set_eq() and bag_eq() also offer up useful diagnostics when the data types of the rows vary:

# Failed test 147
#     Columns differ between queries:
#         have: (integer,text)
#         want: (text,integer)

results_eq() doesn’t have access to such data, though if I can find some tuits (got any to give me?), I’ll write a quick C function that can return an array of the data types in a record object.

Now, as for the issue of arguments, what I settled on is, like Epic, passing strings of SQL to these functions. However, unlike Epic, if you pass in a simple string with no spaces, or a double-quoted string, pgTAP assumes that it’s the name of a prepared statement. The documentation now recommends prepared statements, which you can use like this:

PREPARE my_test AS SELECT * FROM active_users() WHERE name LIKE 'A%';
PREPARE expect AS SELECT * FROM users WHERE active = $1 AND name LIKE $2;
SELECT results_eq('my_test', 'expect');

This allows you to keep your SQL written as SQL, keeping your test, um, SQLish. But in those cases where you have some really simple SQL, you can just use that, too:

SELECT set_eq(
    'SELECT * FROM active_users()',
    'SELECT * FROM users ORDER BY id'
);

This feels like a good compromise to me, allowing the best of both worlds: keeping things in pure SQL to avoid quoting ugliness in SQL strings, while letting users pass in SQL strings if they really want to.

It turns out that I wasn’t able to support cursors for set_eq() or bag_eq(), because they use the statements passed to them to create temporary tables and then compare the records in those temporary tables. But results_eq() uses cursors internally. And it turns out that there’s a data type for cursors, refcursor. So it was easy to add cursor support to results_eq() for those who want to use it:

DECLARE cwant CURSOR FOR SELECT * FROM active_users();
DECLARE chave CURSOR FOR SELECT * FROM users WHERE active ORDER BY name;
SELECT results_eq('cwant'::refcursor, 'chave'::refcursor );

Neat, huh? As I said, I’m very pleased with this approach overall. There are a few caveats, such as less strict comparisons in results_eq() on 8.3 and lower, and less useful diagnostics for data type differences in results_eq(), but overall, I think that the implementation is pretty good, and that these functions will be really useful.

So what do you think? Please clone the Git repository and take the functions for a test drive on 8.3 or 8.4. Let me know what you think!

In the meantime, before releasing a new version, I still plan to add:

  • set_includes() - Set includes records in another set.
  • set_excludes() - Set excludes records in another set.
  • bag_includes() - Bag includes records in another bag.
  • bag_excludes() - Bag excludes records in another bag.
  • col_eq() - Single column result set equivalent to an array of values.
  • row_eq() - Single row from a query equivalent to a record.
  • rowtype_is() - The data type of the rows in a query is equivalent to an array of types.

Hopefully I can find some time to work on those next week. The only challenging one is row_eq(), so I may skip that one for now.

pgTAP Set-Testing Update

I’ve been thinking more about testing SQL result sets and how to name functions that do such testing, and I’ve started to come to some conclusions. Some of the constraints I’m looking at:

  • Cursors are required for tests where the order of the results returned is important. It might be best for such functions to create the cursors themselves.

  • For comparisons where order isn’t important, the results of each SQL statement must be inserted into a temporary table and then the table used for the comparisons. Otherwise, each statement would be executed twice, as is required to calculate symmetric difference. By executing each once and storing the results in a temporary table, we get around this issue (and indeed, this is one of the cases that Epic’s global() function addresses).

  • When order is not important, the most efficient way to compare result sets is with symmetric difference, as sketched below. However, said comparison is a set comparison, meaning that duplicate rows are ignored. So if set A has three rows and set B has four, but two of those four are identical, then sets A and B can still be equivalent.
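
Here’s roughly what that symmetric difference looks like in SQL, again assuming the two statements’ results have been staged in temporary tables called have and want (names invented for illustration); an empty result means the two sets are equivalent:

-- Rows appearing in one staged table but not the other.
(SELECT * FROM have EXCEPT SELECT * FROM want)
UNION
(SELECT * FROM want EXCEPT SELECT * FROM have);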

I’m starting to think that I would have two basic result set testing functions, set_eq() and bag_eq(). The former would do a set comparison, while the latter would require that duplicate rows be present in both result sets. Unfortunately, that would mean that it would be difficult for set_eq() to have a variation that tests ordered sets, as symmetric difference ignores relational ordering. And a simple bag_eq() function would require that the relations be ordered, as it would iterate over each row in each relation in turn and compare row to row. But as I pointed out to commenter “@dave0,” bags are not inherently ordered, so it would be imposing a requirement that’s not necessarily appropriate.

This is starting to drive me a bit nuts.

I think that there are ways to enforce an ordered comparison on a set and an unordered comparison on a bag, but both would be pretty inefficient. Maybe I should do it anyway, include the appropriate caveats in the documentation, and then improve when feasible in the future. In that case, what I’d be looking at is something like this:

set_eq( sql, sql )
Test for set equivalence of two SQL statements.
oset_eq( sql, sql )
Test for ordered set equivalence of two SQL statements.
bag_eq( sql, sql )
Test for row equivalence of two SQL statements.
obag_eq( sql, sql )
Test for ordered row equivalence of two SQL statements.

The preferred tests would be set_eq() and obag_eq(). If a single word is passed to any of these functions, it’s assumed to be a prepared statement. Cursors would be created internally for the functions that require ordered comparison. The non-ordered versions would create temporary tables to hold the values and then use those tables for the comparisons. bag_eq() would also construct cursors on the temporary tables to ensure that rows could be compared in the same order in which they were generated by the SQL statement.

Interface-wise, perhaps a boolean would be preferred to indicate whether or not to compare the rows in an ordered fashion? That would be:

set_eq( sql, sql, bool )
Test for set equivalence of two SQL statements. The sets must be in the same order if the boolean argument is true.
bag_eq( sql, sql, bool )
Test for row equivalence of two SQL statements. The bags must have their rows in the same order if the boolean argument is true.

I like that there are fewer functions this way, but is it harder to remember what the boolean is for? (It would not be required, and would default to false for both functions.) Thoughts?

By the way, I would likely throw in a couple of other resultset-comparing functions:

set_includes( sql, sql )
Test that the set returned by the first statement includes the rows returned by the second statement.
set_excludes( sql, sql )
Test that the set returned by the first statement excludes the rows returned by the second statement.
bag_includes( sql, sql )
Test that the bag returned by the first statement includes the rows returned by the second statement, including duplicates.
bag_excludes( sql, sql )
Test that the bag returned by the first statement excludes the rows returned by the second statement, including duplicates.

Seem useful? Please leave a comment with your thoughts.

Need Help Naming Result Set Testing Functions

I’ve been thinking more since I posted about testing SQL result sets, and I think I’ve settled on two sets of functions for pgTAP: one that tests two SQL queries (though you will be encouraged to use a prepared statement), and one to test two cursors. I’m thinking of naming them:

  • query_gets()
  • cursor_gets()

I had been planning on *_returns() or *_yields(), but they didn’t feel right. “Returns” implies that I would be passing a query and a data structure (to me at least), and while I want to support that, too, it’s not what I was looking for right now. “Yield,” on the other hand, is more related to set-returning functions in my mind (even if PL/pgSQL doesn’t use that term). Anyway, I like the use of “gets” because it’s short and pretty unambiguous.

These functions will compare query results as unordered sets, but I want variants that test ordered sets, as well. I’ve been struggling to come up with a decent name for these variants, but not liking any very well. The obvious ones are:

  • ordered_query_gets()
  • ordered_cursor_gets()

And:

  • sorted_query_gets()
  • sorted_cursor_gets()

But these are kind of long for functions that will be, I believe, used frequently. I could just add a character to get the same idea, in the spirit of sprintf:

  • oquery_gets()
  • ocursor_gets()

Or:

  • squery_gets()
  • scursor_gets()

I think that these are okay, but might be somewhat confusing. I think that the “s” variant probably won’t fly, since for sprintf and friends, the “s” stands for “string.” So I’m leaning towards the “o” variants.

But I’m throwing it out there for the masses to make suggestions: Got any ideas for better function names? Are there some relational terms for ordered sets, for example, that might make more sense? What do you think?

As a side note, I’m also considering:

  • col_is() to compare the result of a single column query to an array or other query. This would need an ordered variant, as well.
  • row_is(), although I have no idea how I’d be able to support passing a row expression to a function, since PostgreSQL doesn’t allow RECORDs to be passed to functions.

Thoughts on Testing SQL Result Sets

pgTAP: The Critique

I’ve been continuing to hack on pgTAP in order to add a lot more schema-testing functionality and a few other niceties. But back when I started the project, I was using it to write tests for CITEXT, which was great for my sanity as I developed it, but proved a bit controversial. In a pgsql-hackers post, Tom Lane wrote:

There was some discussion earlier about whether the proposed regression tests for citext are suitable for use in contrib or not. After playing with them for awhile, I have to come down very firmly on the side of “not”. I have these gripes:

  1. The style is gratuitously different from every other regression test in the system. This is not a good thing. If it were an amazingly better way to do things, then maybe, but as far as I can tell the style pgTAP forces on you is really pretty darn poorly suited for SQL tests. You have to contort what could naturally be expressed in SQL as a table result into a scalar. Plus it’s redundant with the expected-output file.

  2. It’s ridiculously slow; at least a factor of ten slower than doing equivalent tests directly in SQL. This is a very bad thing. Speed of regression tests matters a lot to those of us who run them a dozen times per day — and I do not wish to discourage any developers who don’t work that way from learning better habits ;-)

Because of #1 and #2 I find the use of pgTAP to be a nonstarter.

These are legitimate criticisms, of course. To take the second item first, I would eventually like to figure out a way to make pgTAP a lot faster (in my own benchmarks, I found it to be about 4 times slower than pure SQL, not ten times, but still). A number of functions can likely be rewritten in C, and maybe data can be stored in memory rather than in a temporary table. Overall, though, the speed of the tests doesn’t really concern me much. I’m quite used to large test suites, such as that for Bricolage, that take 5 or 10 minutes or more. This is because, compared to the time it would take me to maintain the code without tests, it’s nothing. I find and fix bugs much more quickly thanks to regression tests. And really, one should just run a subset of the tests for whatever one is working on, and then run the full suite before checking in. One could even have a larger, more comprehensive (read: slower) test suite that’s run via a cron job, so that it identifies bugs in checked in code but developers don’t have to spend a lot of time waiting for tests to finish running.

As a result, I wouldn’t advocate for converting the existing PostgreSQL regression test suite to pgTAP. I could see writing a new suite of tests on pgTAP that run on the build farm. This would be great, as they would complement the existing test suite, and be able to test stuff that can’t be tested with pg_regress.

So really, the performance issue can be addressed in a few ways, some technical, some social, some structural. Like I said, I’m not overly concerned about it, and I wouldn’t make Tom suffer unduly from it, either (I converted all of the CITEXT tests to plain SQL).

Coercing Composite Values

The first issue is tougher, however. Tom was responding to a test like this:

SELECT is(
    ARRAY( SELECT name FROM srt ORDER BY name )::text,
    ARRAY['AAA', 'aardvark', 'aba', 'ABC', 'abc']::text,
    'The words should be case-insensitively sorted'
);

Now, I agree that it’s redundant with the expected-output file, but the assumption with TAP is that there is no expected output file: you just analyze its output using a harness. The need for an expected output file is driven by the legacy of pg_regress.

A bigger issue, and the one I’ll focus on for the remainder of this post, is the requirement currently inherent in pgTAP to “contort what could naturally be expressed in SQL as a table result into a scalar.” The issue is apparent in the above example: even though I’m selecting a number of rows from a table, I use the ARRAY() constructor function to force them into a scalar value—an array—in order to easily do the comparison. It also results in a useful diagnostic message in case the test fails:

# Failed test 40: "The words should be case-insensitively sorted"
#         have: {AAA,aardvark,ABC,abc,aba}
#         want: {AAA,aardvark,aba,ABC,abc}

So for simple cases like this, it doesn’t bother me much personally. But I’ve also had to write tests for functions that return composite types—that is, rows—and again I had to fall back on coercing them into scalar values to do the comparison. For example, say that the fooey() function returns a dude value, which is a composite type with an integer and a text string. Here’s how to test it with pgTAP:

SELECT is(
    fooey()::text,
    ROW( 42, 'Bob' )::text,
    'Should get what we expect from fooey()'
);

So I’m again coercing a value into something else (of course, if I could pass records to functions, that issue goes away). And it does yield nice diagnostics on failure:

# Failed test 96: "Should get what we expect from fooey()"
#         have: (42,Fred)
#         want: (42,Bob)

It gets much worse with set-returning functions—Tom’s “table result”: it requires both type and row coercion (or “contortion” if you’d prefer). Here’s an example of a fooies() function that returns a set of dudes:

SELECT is(
    ARRAY( SELECT ROW(f.*)::text FROM fooies() f ),
    ARRAY[
        ROW( 42, 'Fred' )::text,
        ROW( 99, 'Bob' )::text
    ],
    'Should get what we expect from fooies()'
);

As you can see, it’s do-able, but clumsy and error prone. We really are taking a table result and turning into a scalar value. And thanks to the casts to text, the test can actually incorrectly pass if, for example, the integer was actually stored as text (although, to be fair, the same is true of a pg_regress test, where everything is converted to text before comparing results).

What we really need is a way to write two queries and compare their result sets, preferably without any nasty casts or coercion into scalar values, and with decent diagnostics when a test fails.

As an aside, another approach is to use EXCEPT queries to make sure that two data sets are the same:

SELECT ok(
    NOT EXISTS (
        (SELECT 42, 'Fred' UNION SELECT 99, 'Bob')
        EXCEPT
        SELECT * from fooies()
    ),
    'Should get what we expect from fooies()'
);

SELECT ok(
    NOT EXISTS (
        SELECT * from fooies()
        EXCEPT
        (SELECT 42, 'Fred' UNION SELECT 99, 'Bob')
    ),
    'Should have no unexpected rows from fooies()'
);

Here I’ve created two separate tests. The first makes sure that fooies() returns all the expected rows, and the second makes sure that it doesn’t return any unexpected rows. But since this is just a boolean test (yes, we’ve coerced the results into booleans!), there are no diagnostics if the test fails: you’d have to go ahead and run the query yourself to see what’s unexpected. Again, this is do-able, and probably a more correct comparison than using the casts of rows to text, but makes it harder to diagnose failures. And besides, EXCEPT compares sets, which are inherently unordered. That means that if you need to test that results come back in a specific order, you can’t use this approach.

That said, if someone knows of a way to do this in one query—somehow make some sort of NOT EXCEPT operator work—I’d be very glad to hear it!
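
The nearest thing I can come up with is to union the two EXCEPT queries into a single test, but that merely merges the two assertions; it still executes fooies() twice, still ignores order, and offers no better diagnostics:

SELECT ok(
    NOT EXISTS (
        (   SELECT 42, 'Fred' UNION SELECT 99, 'Bob'
            EXCEPT
            SELECT * FROM fooies()
        ) UNION (
            SELECT * FROM fooies()
            EXCEPT
            (SELECT 42, 'Fred' UNION SELECT 99, 'Bob')
        )
    ),
    'Should get exactly the expected rows from fooies()'
);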

Prior Art

pgTAP isn’t the only game in town. There is also Dmitry Koterov’s PGUnit framework and Bob Brewer’s Epic. PGUnit seems to have one main assertion function, assert_same(), which works much like pgTAP’s is(). Epic’s assert_equal() does, too, but Epic also offers a few functions for testing result sets that neither pgTAP nor PGUnit support. One such function is assert_rows(), to which you pass strings that contain SQL to be evaluated. For example:

CREATE OR REPLACE FUNCTION test.test_fooies() RETURNS VOID AS $_$
BEGIN
    PERFORM test.assert_rows(
        $$ VALUES(42, 'Fred'), (99, 'Bob') $$,
        $$ SELECT * FROM fooies()          $$
    );
    RAISE EXCEPTION '[OK]';
END;
$_$ LANGUAGE plpgsql;

This works reasonably well. Internally, Epic runs each query twice, using EXCEPT to compare result sets, just as in my boolean example above. This yields a proper comparison, and because assert_rows() iterates over returned rows, it emits a useful message upon test failure:

psql:try_epic.sql:21: ERROR:  Record: (99,Bob) from: VALUES(42, 'Fred'), (99, 'Bob') not found in: SELECT * FROM fooies()
CONTEXT:  SQL statement "SELECT  test.assert_rows( $$ VALUES(42, 'Fred'), (99, 'Bob') $$, $$ SELECT * FROM fooies() $$ )"
PL/pgSQL function "test_fooies" line 2 at PERFORM

A bit hard to read with all of the SQL exception information, but at least the information is there. At PGCon, Bob told me that passing strings of SQL code made things a lot easier to implement in Epic, and I can certainly see how that could be (pgTAP uses SQL code strings too, with its throws_ok(), lives_ok(), and performs_ok() assertions). But it just doesn’t feel SQLish to me. I mean, if you needed to write a really complicated query, it might be harder to maintain: even using dollar quoting, it’s harder to track stuff. Furthermore, it’s slow, as PL/pgSQL’s EXECUTE must be called twice and thus plan twice. And don’t even try to test a query with side-effects—such as a function that inserts a row and returns an ID—as the second run will likely lead to test failure, or just might blow something up.

SQL Blocks?

One approach is to use blocks. I’m thinking here of something like Ruby blocks or Perl code references: a way to dynamically create some code that is compiled and planned when it loads, but its execution can be deferred. In Perl it works like this:

my $code = sub { say "woof!" };
$code->(); # prints "woof!"

In Ruby (and to a lesser extent in Perl), you can pass a block to a method:

foo.bar { puts "woof!" }

The bar method can then run that code at its leisure. We can sort of do this in PostgreSQL using PREPARE. To take advantage of it for Epic’s assert_rows() function, one can do something like this:

CREATE OR REPLACE FUNCTION test.test_fooies() RETURNS VOID AS $_$
BEGIN
    PREPARE want AS VALUES(42, 'Fred'), (99, 'Bob');
    PREPARE have AS SELECT * FROM public.fooies();
    PERFORM test.assert_rows(
        test.global($$ EXECUTE want $$),
        test.global($$ EXECUTE have $$)
    );
    RAISE EXCEPTION '[OK]';
END;
$_$ LANGUAGE plpgsql;

The nice thing about using a prepared statement is that you can actually write all of your SQL in SQL, rather than in an SQL string, and then pass the simple EXECUTE statement to assert_rows(). Also note the calls to test.global() in this example. This is a tricky function in Epic that takes an SQL statement, turns its results into a temporary table, and then returns the table name. This is required for the EXECUTE statements to work properly, but a nice side-effect is that the actual queries are executed only once each, to create the temporary tables. Thereafter, those temporary tables are used to fetch results for the test.

Another benefit of prepared statements is that you can write a query once and use it over and over again in your tests. Say that you had a few set returning functions that return different results from the users table. You could then test them all like so:

CREATE OR REPLACE FUNCTION test.test_user_funcs() RETURNS VOID AS $_$
BEGIN
    PREPARE want(bool) AS SELECT * FROM users WHERE active = $1;
    PREPARE active     AS SELECT * FROM get_active_users();
    PREPARE inactive   AS SELECT * FROM get_inactive_users();
    PERFORM test.assert_rows(
        test.global($$ EXECUTE want(true) $$),
        test.global($$ EXECUTE active     $$)
    );
    PERFORM test.assert_rows(
        test.global($$ EXECUTE want(false) $$),
        test.global($$ EXECUTE inactive    $$)
    );
    RAISE EXCEPTION '[OK]';
END;
$_$ LANGUAGE plpgsql;

Note how I’ve tested both the get_active_users() and the get_inactive_users() function by passing different values when executing the want prepared statement. Not bad. I think that this is pretty SQLish, aside from the necessity for test.global().

Still, the use of prepared statements with Epic’s assert_rows() is not without issues. There is still a lot of execution here (to create the temporary tables and to select from them a number of times). Hell, this last example reveals an inefficiency in the creation of the temporary tables, as the two executions of want create two separate temporary tables for data that’s already in the users table. If you have a lot of rows to compare, a lot more memory will be used. And you still can’t check the ordering of your results, either.

So for small result sets and no need to check the ordering of results, this is a pretty good approach. But there’s another.

Result Set Handles

Rather than passing blocks to be executed by the tests, in many dynamic testing frameworks you can pass data structures to be compared. For example, Test::More’s is_deeply() assertion allows you to test that two data structures contain the same values in the same structures:

is_deeply \@got_data, \@want_data, 'We should have the right stuff';

This does a deep comparison between the contents of the @got_data array and @want_data. Similarly, I could imagine a test to check the contents of a DBIx::Class result set object:

results_are( $got_resultset, $want_resultset );

In this case, the results_are() function would iterate over the two result sets, comparing each result to make sure that they were identical. So if prepared statements in SQL are akin to blocks in dynamic languages, what is akin to a result set?

The answer, if you’re still with me, is cursors.

Now, cursors don’t work with Epic’s SQL-statement style tests, but I could certainly see how a pgTAP function like this would be useful:

DECLARE want CURSOR FOR SELECT * FROM users WHERE active;
DECLARE have CURSOR FOR SELECT * FROM get_active_users();
SELECT results_are( 'want', 'have' );

The nice thing about this approach is that, even more than with prepared statements, everything is written in SQL. The results_are() function would simply iterate over each row returned from the two cursors to make sure that they were the same. In the event that there was a difference, the diagnostic output would be something like:

#   Failed test 42:
#     Results begin differing at row 3:
#          have: (3,Larry,t)
#          want: (3,Larry,f)

So there’s a useful diagnostic, ordering is preserved, no temporary tables are created, and the data is fetched directly from its sources (tables or functions or whatever) just as it would be in a straight SQL statement. You still have the overhead of PL/pgSQL’s EXECUTE, and iterating over the results, but, outside of some sort of NOT INTERSECT operator, I don’t see any other way around it.

The Plan

So I think I’ll actually look at adding support for doing this in two ways: one with prepared statements (or query strings, if that’s what floats your boat) like Epic does, though I’m going to look at avoiding the necessity for something like Epic’s global() function. But I’ll also add functions to test cursors. And maybe a few combinations of these things.

So, does an approach like this, especially the cursor solution, address Tom’s criticism? Does it feel more relational? Just to rewrite the kind of test Tom originally objected to, it would now look something like this:

DECLARE have CURSOR FOR SELECT name FROM srt ORDER BY name;
DECLARE want CURSOR FOR VALUES ('AAA'), ('aardvark'), ('aba'), ('ABC'), ('abc');
SELECT results_are(
    'have', 'want',
    'The words should be case-insensitively sorted'
);

Thoughts? I’m not going to get to it this week, so feedback would be greatly appreciated.

pgTAP 0.21 Drops

Mmmmm…pgTAP

I just dropped a new version of pgTAP following a few marathon hack sessions since my talk at PGCon (movie here, BTW). Actually, the new performs_ok() function came about as I wrote the presentation, but all the rest came on the flight home and in the few days since. Not sure when I’ll hack on it this much again (it’s getting a bit big at 5,758 lines of PL/pgSQL and SQL).

Overall I’m very happy with this release, as it adds a lot of new assertion functions. In particular, I added a slew of functions that test that the objects in a given schema (or visible in the search path, if you prefer) are exactly the objects that should be there. This is useful for a couple of things. For one, Norman Yamada, in his PGCon talk, mentioned that his team was using pgTAP to compare database objects between replicated databases. I like this because it’s a great example of using pgTAP for system testing, rather than just unit testing as I’ve been advocating. See, pgTAP can be used for any kind of testing!

Another use for these functions is in a large organization where many different people might be making changes to a schema. In this scenario, you might have application developers adding new objects to the database (or dropping objects) without necessarily informing the DBAs. Using, for example, tables_are() and functions_are() and continuous testing, the DBAs can see when objects have been modified by the developers. Even better, if the developers are running the pgTAP tests themselves (as they should be!), they will be reminded to add new tests for their changes when the existing tests notice that things have been added or dropped and thus fail.
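
For example (table and function names invented for illustration), a schema test might assert exactly what public should contain:

-- These fail the moment anyone adds or drops a table or function.
SELECT tables_are(    'public', ARRAY[ 'users', 'widgets' ] );
SELECT functions_are( 'public', ARRAY[ 'get_active_users', 'fooey' ] );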

Beyond that, I added a bunch of new functions for testing functions and a number of other goodies. Check out the release notes for all the details.

With these changes, I’ve finished nearly everything I’ve thought of for pgTAP. There are only a few sequence-testing functions left on the To Do list, as well as a call to add a throws_like() function, which I’ll throw in soon if no one else steps up. Beyond these changes, I have a few ideas of where to take it next, but so far I’m kind of stumped. Mainly what I think should be done is to add an interface that makes it easier to compare relations (or result sets, if you prefer). Epic does this by allowing query strings to be passed to a function, but I’d really like to keep queries in SQL rather than in SQL strings. I’ll be giving it some more thought and will post about it soon.

Looking for the comments? Try the old layout.

Unit Test Your Database!

Gave my talk at PGCon today. I felt that it went well, and was well-received. So here it is for everyone else, for posterity, via download or Slideshare. Enjoy!

Looking for the comments? Try the old layout.

More on Database Testing

I’ve been meaning for a while to come back to the topic of database testing. After posting that entry, I thought more about the quote from a PostgreSQL core hacker, which I think bears repeating:

Well, you are testing for bugs, and bugs are pretty specific in where they appear. Writing the tests is 90% of the job; writing the infrastructure is minor. If the infrastructure has limitations, which all do, you might as well write that extra 10% too.

I had been so focused on the first sentence, on the why of database testing, that I’d not really addressed the rest. I failed to notice that he was questioning the utility of a testing infrastructure, or what I would call a framework, like pgTAP. So let me rectify that right now by addressing his actual point.

The idea of using an established framework and protocol is to be able to focus exclusively on the task of writing tests, rather than worrying about how to analyze test results. I agree that writing tests can be time-consuming, but that doesn’t mean that one should write one’s own testing framework. The great thing about pgTAP is that it emits TAP, which can then be analyzed with the same tools as the output of any other TAP-emitting test framework, in any environment, including Perl, C, JavaScript, PHP, and even Oracle, among others.

The other argument that might support writing one’s own testing infrastructure is that it’s too hard to apply an existing style of testing to a given application. For example, most of the existing TAP producers provide a functional interface to writing tests. SQL, on the other hand, is not a functional language. So–leaving aside for the moment that one can provide an effective functional interface for writing database tests–even if one wanted to write a relational-style testing framework, it could still emit TAP! TAP is, after all, just a stream of text. So as long as a SQL SELECT statement returns a stream of TAP, you can take advantage of the myriad test analysis tools out there.
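
Don't believe me? Here's a complete TAP-emitting "test" in nothing but plain SQL, no framework required:

SELECT '1..1';
SELECT CASE WHEN 1 + 1 = 2
       THEN 'ok 1 - arithmetic still works'
       ELSE 'not ok 1 - arithmetic still works'
       END;

Run that through psql --tuples-only --no-align and the output is a valid TAP stream, ready for any TAP harness.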

Now, I was discussing the use of TAP with a different PostgreSQL contributor, who was asking me about modifying the output of pg_regress to be TAP. The way that pg_regress works–and therefore how PostgreSQL core tests work–is simple: One writes SQL statements into a test script, and then one writes an expected output file. If the output of the tests might vary by platform, database setting, or compile-time feature, one just creates more expected files, each with the appropriate variations.
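
To illustrate (this widgets test is invented), a script and its expected file might look roughly like so, since pg_regress echoes each statement into the output:

-- sql/widgets.sql: the test script fed to psql
CREATE TABLE widgets (name text);
INSERT INTO widgets VALUES ('sprocket');
SELECT name FROM widgets;

-- expected/widgets.out: the exact output to diff against
CREATE TABLE widgets (name text);
CREATE TABLE
INSERT INTO widgets VALUES ('sprocket');
INSERT 0 1
SELECT name FROM widgets;
   name   
----------
 sprocket
(1 row)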

The PostgreSQL test runner, pg_regress, then simply runs the script through psql and diffs the output against each expected file. If any one of the expected files is identical to the output, the test passes. Otherwise, it fails. When the tests run, the output looks like this:

parallel group (2 tests):  copyselect copy
   copy                 ... ok
   copyselect           ... ok

My core hacker correspondent was thinking of modifying this output to be TAP, something like this:

# Parallel group (2 tests):  copyselect copy
1..2
ok 1 - copy
ok 2 - copyselect

With this change, he could then run the regression tests through TAP::Harness in a cron job and send failure reports when a test failed. This is good as far as it goes, but it has a couple of significant limitations. For one, there are no diagnostics when something goes wrong. The reason, which is also the second shortcoming, is that it turns the result of running a single script into TAP, not individual assertions. There might be thousands of SQL statements in one script, but if the test fails, one won’t know what failed until one looks at regression.diffs.

One of the great features of TAP is the support for diagnostics. For example, if an assertion fails, you might see output something like this:

not ok 38 - The frobnitz should be named "foo"
# Failed test 38: "The frobnitz should be named "foo""
#         have: NULL
#         want: foo

Just changing the listing of the test scripts run does not get you this advantage. That’s not to say that it doesn’t make certain things easier, or that one couldn’t simply write shorter test scripts to limit the scope of what’s being tested and, therefore, what a failure means. But a single test script does not make for a good assertion. In short, pg_regress tests don’t do assertions at all. They simply compare actual and expected output from very verbose scripts. This is a hell of a lot better than nothing, but is still quite limited.
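
By contrast, the pgTAP assertion behind that kind of diagnostic output is a single, granular function call, something like this (the frobnitz table is invented for illustration):

SELECT is(
    ( SELECT name FROM frobnitz WHERE id = 1 ),
    'foo',
    'The frobnitz should be named "foo"'
);

If the name turned out to be NULL, you’d get exactly the have/want diagnostics shown above.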

I suggested to my correspondent that he consider modifying the tests he was working on to use pgTAP, instead. Of course, if you have a lot of existing tests, it might be more trouble than it’s worth to rewrite them all. But that doesn’t mean that you can’t write new tests going forward using something that’s more granular, and gives you a lot more control over the output.

His answer rather shocked me:

I’m lucky enough to only be dealing with really good developers, so I can produce software that works even without focusing specifically on low-level unit tests.

To me, it’s a misconception to think “really good developers” don’t need tests. As I said in reply, I consider myself a “really good developer,” and I’d have a whole lot more pain with the code I maintain if it weren’t for the tests I’ve written. Tests do not compensate for poor coders. Rather, they make it easier to maintain, modify, and refactor code, to fix bugs, and to add new features. Not to mention testing my code on new versions of software, such as testing my CPAN modules with Perl 5.10 and testing my client’s PostgreSQL databases with 8.3 or 8.4 as they look to upgrade from 8.2.

One place where my correspondent agreed that pg_regress could use improvement is in the whole approach to matching different outputs. Using pgTAP, one can write tests in such a way that there are different expected results depending on database settings right in the test scripts themselves! This is because the TAP output can vary in any number of ways, and the harness doesn’t care, as long as the tests pass. pg_regress is extremely strict about what it considers a passing test, and this lacks the necessary flexibility for testing some advanced features.

This is why there are no tests for multibyte character semantics (such as locale-aware sorting) in the PostgreSQL core tests.
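
To give an idea of what that flexibility could look like, a pgTAP test can compute its expected results from the current settings. A quick sketch (the srt table and the expected orderings are invented for illustration):

SELECT is(
    ARRAY( SELECT name FROM srt ORDER BY name )::text,
    CASE WHEN current_setting('lc_collate') IN ('C', 'POSIX')
         THEN '{AAA,ABC,aardvark,aba,abc}'
         ELSE '{AAA,aardvark,aba,ABC,abc}'
    END,
    'Names should sort according to the current collation'
);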

Are you a database testing doubter? Have I convinced you yet that a rock-solid test suite can actually make your job easier and more enjoyable? Perhaps I never will, but I am still very interested in your reasons for doubting the utility of database testing. What other points should I be thinking about as I prepare for my PGCon presentation?

Looking for the comments? Try the old layout.

pgTAP 0.20 Infiltrates Community

I did all I could to stop it, but it just wasn’t possible. pgTAP 0.20 has somehow made its way from my Subversion server and infiltrated the PostgreSQL community. Can nothing be done to stop this menace? Its use leads to cleaner, more stable, and more-safely refactored code. This insanity must be stopped! Please review the following list of its added vileness since 0.19 to determine how you can stop the terrible, terrible influence on your PostgreSQL unit-testing practices that is pgTAP:

  • Changed the names of the functions tested in sql/do_tap.sql and sql/runtests.sql so that they are less likely to be ordered differently given varying collation orders provided in different locales and by different vendors. Reported by Ingmar Brouns.
  • Added the --formatter and --archive options to pg_prove.
  • Fixed the typos in pg_prove where the output of --help listed --test-match and --test-schema instead of --match and --schema.
  • Added has_cast(), hasnt_cast(), and cast_context_is(). (A sample of these, and of the new operator assertions, lurks after this list.)
  • Fixed a borked function signature in has_trigger().
  • Added has_operator(), has_leftop(), and has_rightop().
  • Fixed a bug where the order of columns found for multicolumn indexes by has_index() could be wrong. Reported by Jeff Wartes. Thanks to Andrew Gierth for help fixing the query.
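
Witness the corrupting simplicity of the new cast and operator assertions (these particular casts and operators ship with PostgreSQL itself):

SELECT has_cast( 'integer', 'bigint', 'Should be able to cast integer to bigint' );
SELECT cast_context_is( 'integer', 'bigint', 'implicit', 'And that cast should be implicit' );
SELECT has_operator( 'integer', '+', 'integer', 'integer', 'Integer addition should be among us' );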

Don’t make the same mistake I did, where I wrote a lot of pgTAP tests for a client, and now testing database upgrades from 8.2 to 8.3 is just too reliable! YOU HAVE BEEN WARNED.

Good luck with your mission.

Looking for the comments? Try the old layout.