Introducing MyTAP

I gave my OSCON tutorial (slides) last week. It went okay. I spent way too much time helping to get everyone set up with pgTAP, and then didn't have time to have the attendees do the exercises, and I had to rush through 2.5 hours of material in 1.5 hours. Yikes! At least the video will be better when it's released (more when that happens).

But as often happens, I was asked whether something like pgTAP exists for MySQL. But this time I was asked by MySQL Community Manager Giuseppe Maxia, who also said that he'd tried to create a test framework himself (a fellow Perl hacker!), but that it wasn't as nice as pgTAP. Well, since I was at OSCON and tend to like to hack on side projects while at conferences, and since I hoped that Giuseppe will happily take it over once I've implemented the core, I started hacking on it myself. And today, I'm pleased to announce the release of MyTAP 0.01 (downloads).

Once you've downloaded it, install it against your MySQL server like so:

mysql -u root < mytap.sql

Here's a very simple example script:

-- Start a transaction.
BEGIN;

-- Plan the tests.
SELECT tap.plan(1);

-- Run the tests.
SELECT tap.pass( 'My test passed, w00t!' );

-- Finish the tests and clean up.
CALL tap.finish();
ROLLBACK;

You can run this test from a .sql file using the mysql client like so:

mysql -u root --disable-pager --batch --raw --skip-column-names --unbuffered --database try --execute 'source test.sql'

But that's a PITA and can only run one test at a time. Instead, put all of your tests into a directory, perhaps named tests, each with the suffix “.my”, and use my_prove (install TAP::Parser::SourceHandler::MyTAP from CPAN to get it) instead:

my_prove -u root --database try tests/

For MyTAP's own tests, the output looks like this:

tests/eq.my ........ ok
tests/hastap.my .... ok
tests/matching.my .. ok
tests/moretap.my ... ok
tests/todotap.my ... ok
tests/utils.my ..... ok
All tests successful.
Files=6, Tests=137,  1 wallclock secs
(0.06 usr  0.03 sys +  0.01 cusr  0.02 csys =  0.12 CPU)
Result: PASS

Nice, eh? Of course there are quite a few more assertion functions. See the complete documentation for details.

Now, I did my best to keep the interface the same as pgTAP, but there are a few differences:

  • MySQL temporary tables are teh suck, so I had to use permanent tables to track test state. To make this more feasible, MyTAP is always installed in its own database, (named “tap” by default), and you must always schema-qualify your use of the MyTAP functions.
  • Another side-effect of permanent tables is that MyTAP must keep track of test outcomes without colliding with the state from tests running in multiple concurrent connections. So MyTAP uses connection_id() to keep track of state for a single test run. It also deletes the state when tests finish(), but if there's a crash before then, data can be left in those tables. If the connection ID is ever re-used, this can lead to conflicts. This seems mostly avoidable by using InnoDB tables and transactions in the tests.
  • The word “is” is strictly reserved by MySQL, so the function that corresponds to pgTAP's is() is eq() in MyTAP. Similarly, isnt() is called not_eq() in MyTAP.
  • There is no way to throw an exception in MySQL functions an procedures, so the code cheats by instead performing an illegal operation: selecting from a non-existent column, where the name of that column is the error message. Hinky, but should get the point across.

Other than these issues, things went fairly smoothly. I finished up the 0.01 version last night and released it today with most of the core functionality in place. And now I want to find others to take over, as I am not a MySQL hacker myself and thus unlikely ever to use it. If you're interested, my recommendations for things to do next are:

So fork on GitHub or contact me if you'd like to be added as a collaborator (I'm looking at you, Giuseppe!).

Hope you find it useful.

Intelligent MySQL Configuration

James Duncan Davidson's Configuring MySQL on MacOS X post earlier today reminded me that I wanted to blog about the configuration I came up with while installing MySQL 5 on my box. Nothing has irritated me more than when MySQL's syntax has violated the ANSI SQL standards in the most blatant ways, or when transactions have appeared to work, but mysteriously not worked. Yes, I use Duncan's settings to make sure that the MySQL box on my PowerBook only listens on local sockets, but I additionally add this configuration to /etc/my.cnf:

[mysqld]
sql-mode=ansi,strict_trans_tables,no_auto_value_on_zero,no_zero_date,no_zero_in_date,only_full_group_by
character-set-server=utf8
default-storage-engine=InnoDB
default-time-zone=utc

That last configuration can actually only be added after running this command:

/usr/local/mysql/bin/mysql_tzinfo_to_sql /usr/share/zoneinfo | \
mysql -u root mysql

But then the upshot is that I have everything configured to be as compliant as possible (although the time zone stuff is just my personal preference):

mysql> SELECT @@global.sql_mode;
mysql> SELECT @@global.sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| @@global.sql_mode                                                                                                                                     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE,ONLY_FULL_GROUP_BY,ANSI,NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show variables like '%character_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+
7 rows in set (0.01 sec)

mysql> show variables like '%table_ty%';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| table_type    | InnoDB |
+---------------+--------+
1 row in set (0.00 sec)

mysql> show variables like 'time_zone%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| time_zone     | utc   |
+---------------+-------+
1 row in set (0.00 sec)

Now that's the way things should be! Or at least as close as I'm going to get to it in MySQL 5.

Update 2006-11-04: Ask Bjørn Hansen turned me on to the strict_trans_tables mode, which prevents MySQL from trying to guess what you mean when you leave out a value for a required column. So I've now updated my configuration with sql-mode=ansi,strict_trans_tables.

Update 2009-11-05: I found myself configuring MySQL again today, and there were some other settings I found it useful to add:

  • no_auto_value_on_zero forces AUTO_INCREMENT columns to increment only when inserting a NULL, rather than when inserting a NULL or a zero(!).
  • no_zero_date and no_zero_in_date disallow dates where the the year or month are set to 0.
  • only_full_group_by requires that non-aggregated columns in a select list be included in a GROUP BY clause, as is mandated by the SQL standard. This only applies if an aggregate function is used in a query

I've added all of these to the example above.

MySQL's REPLACE Considered Harmful

So we've set up a client with an online poll application using MySQL. Polls are created in Bricolage, and when they're published, rather than writing data to files, the template writes data to the MySQL database. PHP code on the front-end server then uses the database records to manage the polls.

On the recommendation of one of my colleagues, I was using the MySQL REPLACE statement to insert and update poll answers in the database. At first, this seemed like a cool idea. All I had to do was create a unique index on the story_id and ord (for answer order) columns and I was set. Any time someone reordered the answers or changed their wording in Bricolage, the REPLACE statement would change the appropriate records and just do the right thing.

Or so I thought.

Come the day after the launch of the new site, I get a complaint from the customer that the percentage spread between the answers doesn't add up to 100%. After some investigation, I realized that the poll_results table is using the ID of each question to identify the votes submitted by readers. This makes sense, of course, and is excellent relational practice, but I have overlooked the fact that REPLACE essentially replaces rows every time it is used. This means that even when a poll answer hasn't changed, it gets a new ID. Yes, that's right, its primary key value was changing. Yow!

Now we might have caught this earlier, but the database was developed on MySQL 3.23.58 and, as is conventional among MySQL developers, there were no foreign key constraints. So the poll results were still happily pointing to non-existent records. So a poll might appear to have 800 votes, but the percentages might be counted for only 50 votes. Hence the problem with the percentages not adding up to 100% (nowhere near it, in fact).

Fortunately, the production application is on a MySQL 4.1 server, so I made a number of changes to correct this issue:

  • Added foreign key constraints
  • Exploited a little-known (mis)feature of Bricolage to store primary keys for all poll answers (and questions, for that matter)
  • Switched from REPLACE to INSERT, UPDATE, and DELETE statements using the primary keys

I also started using transactions when making all these updates when a poll is published so that changes are always atomic. Now it works beautifully.

But the lesson learned is that REPLACE is a harmful construct. Yes, it was my responsibility to recognize that it would create new rows and therefore new primary keys. But any construct that changes primary keys should be stricken from any database developer's toolbox. The fact that MySQL convention omits the use of foreign key constraints makes this a particularly serious issue that can appear to have mysterious consequences.

So my advice to you, gentle reader, is don't use it.