Add Regular Expression Operator to SQLite

As I discussed a couple of months ago, DBD::SQLite exposes the SQLite qlite3_create_function() API for adding Pure-Perl functions and aggregates to SQLite on a per-connection basis. This is cool, but in perusing the SQLite expression documentation, I came across this gem:

The REGEXP operator is a special syntax for the regexp() user function. No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message. If a user-defined function named regexp is defined at run-time, that function will be called in order to implement the REGEXP operator.

Well hell! I thought. I can do that!

In a brief search, I could find no further documentation of this feature, but all it took was a little experimentation to figure it out. The regexp() function should expect two arguments. The first is the regular expression, and the second is the value to match. So it can be added to DBD::SQLite like this:

$dbh = DBI->connect('dbi:SQLite:dbfile=test.db');
$dbh->func('regexp', 2, sub {
    my ($regex, $string) = @_;
    return $string =~ /$regex/;
}, 'create_function');

Yep, that's it! Now, I have my own module for handling database connections, and I wanted to make sure that all of my custom functions are always present, every time I connect to the database. In a mod_perl environment, you can end up with a lot of connections, and a single process has the potential disconnect and reconnect more than once (due to exceptions thrown by the database and whatnot). The easiest way to ensure that the functions are always there as soon as you connect and every time you connect, I learned thanks to a tip from Tim Bunce, is to subclass the DBI and implement a connected() method. Here's what it looks like:

package MyApp::SQLite;
use base 'DBI';

package MyApp::SQLite::st;
use base 'DBI::st';

package MyApp::SQLite::db;
use base 'DBI::db';

sub connected {
    my $dbh = shift;
    # Add regexp function.
    $dbh->func('regexp', 2, sub {
        my ($regex, $string) = @_;
        return $string =~ /$regex/;
    }, 'create_function');
}

So how does this work? Here's a quick app I wrote to demonstrate the use of the REGEXP expression in SQLite using Perl regular expressions:

#!/usr/bin/perl -w

use strict;

my $dbfile = shift || die "Usage: $0 db_file\n";
my $dbh = MyApp::SQLite->connect(
    "dbi:SQLite:dbname=$dbfile", '', '',
    {
        RaiseError  => 1,
        PrintError  => 0,
    }
);

END {
    $dbh->do('DROP TABLE try');
    $dbh->disconnect;
}

$dbh->do('CREATE TABLE try (a TEXT)');

my $ins = $dbh->prepare('INSERT INTO try (a) VALUES (?)');
for my $val (qw(foo bar bat woo oop craw)) {
    $ins->execute($val);
}

my $sel = $dbh->prepare('SELECT a FROM try WHERE a REGEXP ?');

for my $regex (qw( ^b a w?oop?)) {
    print "'$regex' matches:\n  ";
    print join "\n  " =>
        @{ $dbh->selectcol_arrayref($sel, undef, $regex) };
    print "\n\n";
}

This script outputs:

'^b' matches:
  bar
  bat

'a' matches:
  bar
  bat
  craw

'w?oop?' matches:
  foo
  woo
  oop

Pretty slick, no? I wonder if it'd make sense for DBD::SQLite to add the regexp() function itself, in C, using the Perl API, so that it's just always available to DBD::SQLite apps?

Custom Aggregates in Perl

About a year ago, Josh Berkus was reviewing some Bricolage SQL code, looking to optimize it for PostgreSQL. One of the things he noticed was that we were fetching a lot more rows for an object than we needed to. The reason for this is that an object might be associated with one or more groups, and to get back a list of all of the group IDs, we were getting multiple rows. For example, if I wanted to fetch a single story with the ID 10, I might get back rows like this:

SELECT s.id, s.title, grp.id
FROM   story s, member m, grp g
WHERE  s.id = m.story_id
       AND m.grp_id = g.id
       AND s.id = 10;
s.id |        s.title      | grp.id
-----+---------------------+--------
  10 | The Princess Bride  | 23
  10 | The Princess Bride  | 24
  10 | The Princess Bride  | 25
  10 | The Princess Bride  | 26
  10 | The Princess Bride  | 27

Now, that's a lot of extra data to have to fetch for just a single row to be different; it's very wasteful, really. So Josh said, Why don't you use a custom aggregate for that? I knew nothing about aggregates, but I did some research, and figured out how to write PostgreSQL custom aggregates in SQL. I wrote a very simple one, called id_list(), that joins up all of the values in a column with an empty space. The aggregate code looks like this:

CREATE   FUNCTION append_id(TEXT, INTEGER)
RETURNS  TEXT AS '
    SELECT CASE WHEN $2 = 0 THEN
                $1
           ELSE
                $1 || '' '' || CAST($2 AS TEXT)
           END;'
LANGUAGE 'sql'
WITH     (ISCACHABLE, ISSTRICT);

CREATE AGGREGATE id_list (
    SFUNC    = append_id,
    BASETYPE = INTEGER,
    STYPE    = TEXT,
    INITCOND = ''
);

Now I was able to vastly simplify the results returned by the query:

SELECT s.id, s.title, id_list(grp.id)
FROM   story s, member m, grp g
WHERE  s.id = m.story_id
       AND m.grp_id = g.id
       AND s.id = 10;
GROUP BY s.id, s.title
s.id |        s.title      | id_list
-----+---------------------+---------------
  10 | The Princess Bride  | 23 24 25 26 27

So then I just had to split the id_list column on the white space and I was ready to go. Cool!

So recently, was thinking about how I might do something similar in SQLite. It turns out that SQLite has a way to add custom aggregates, too, via its sqlite_add_function function. But I don't know C, and had been wondering for a while how, even if I figured out how to write an aggregate function in C, whether I would have to require users to compile SQLite with my C aggregate in order to get it to work.

However, as a Perl developer, I thought it might be worthwhile to just quickly check the DBD::SQLite docs might have to say on the matter. And it turns out that the ability to add aggregates to SQLite is supported in DBD::SQLite via the create_aggregate custom function. And what's more, the aggregate can be written in Perl! Whoa! I couldn't believe that it could be that easy, but a quick test script demonstrated that it is:

#!/usr/bin/perl -w

use strict;

use DBI;

my $dbfile = shift;
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", '', '');

END {
    $dbh->disconnect;
    unlink $dbfile;
};

$dbh->do('CREATE TABLE foo (a int)');
$dbh->do('BEGIN');
$dbh->do('INSERT INTO foo (a) VALUES (?)', undef, $_) for 1..10;
$dbh->do('COMMIT');

# Create a new aggregate.
$dbh->func('joiner', 1, 'My::Join', 'create_aggregate');

my $sel = $dbh->prepare(q{SELECT joiner(a) FROM foo});
$sel->execute;
$sel->bind_columns(\my $val);
print "$val\n" while $sel->fetch;

The first argument to create_aggregate() (itself invoked via the DBI func() method) the name of the aggregate, the second is the number of arguments to the aggregate (use -1 for an unlimited number), and the third is the name of a Perl class that implements the aggregate. That class needs just three methods: new(), an object constructor; step(), called for each aggregate row, and finalize, which must return the value calculated by the aggregate. My simple implementation looks like this:

package My::Join;

sub new { bless [] }
sub step {
    my $self = shift;
    push @$self, @_;
}
sub finalize {
    my $self = shift;
    return join q{ }, @$self;
}

Yep, that's really it! When I run the script, the output looks like this:

% try foo
1 2 3 4 5 6 7 8 9 10

Keen! I mean, that is just so slick! And it really demonstrates the power of SQLite as an embeddable database, as well. Thanks Matt, for making the SQLite API available to us mere mortal Perl developers!

Enforce Foreign Key Integrity in SQLite with Triggers

After some some Googling and experimentation, I've figured out how to enforce foreign key constraints in SQLite. I got most of the code from Cody Pisto's sqlite_fk utility. I couldn't get it to work, but the essential code for the triggers was in its fk.c file, so I just borrowed from that (public domain) code to figure it out.

Since I couldn't find documentation for this elsewhere on the Net (though I'm sure it exists somewhere), I decided to just put an example here. Interested? Read on!

Say you have these two table declarations:

create table foo (
  id INTEGER NOT NULL PRIMARY KEY
);

CREATE TABLE bar (
  id INTEGER NOT NULL PRIMARY KEY,
  foo_id INTEGER NOT NULL
         CONSTRAINT fk_foo_id REFERENCES a(id) ON DELETE CASCADE
);

Table bar has a foreign key reference to the primary key column in the foo table. Although SQLite supports this syntax (as well as named foreign key constraints), it ignores them. So if you want the references enforced, you need to create triggers to do the job. Triggers were added to SQLite version 2.5, so most users can take advantage of this feature. Each constraint must have three triggers: one for INSERTs, one for UPDATESs, and one for DELETESs. The INSERT trigger looks like this:

CREATE TRIGGER fki_bar_foo_id
BEFORE INSERT ON bar
FOR EACH ROW BEGIN 
  SELECT CASE
     WHEN ((SELECT id FROM foo WHERE id = NEW.foo_id) IS NULL)
     THEN RAISE(ABORT, 'insert on table "bar" violates foreign key '
                || 'constraint "fk_foo_id"')
  END;
END;

(You can put the RAISE error string all on one line; I've concatenated two lines to keep line lengths reasonable here.) If your foreign key column is not NOT NULL, the trigger's SELECT CASE clause needs to an extra case:

CREATE TRIGGER fki_bar_foo_id
BEFORE INSERT ON bar
FOR EACH ROW BEGIN 
   SELECT CASE
     WHEN ((new.foo_id IS NOT NULL)
           AND ((SELECT id FROM foo WHERE id = new.foo_id) IS NULL))
     THEN RAISE(ABORT, 'insert on table "bar" violates foreign key '
                || 'constraint "fk_foo_id"')
  END;
END;

The UPDATE statements are almost identical; if your foreign key column is NOT NULL, then do this:

CREATE TRIGGER fku_bar_foo_id
BEFORE UPDATE ON bar
FOR EACH ROW BEGIN 
   SELECT CASE
     WHEN ((SELECT id FROM foo WHERE id = new.foo_id) IS NULL))
     THEN RAISE(ABORT, 'update on table "bar" violates foreign key '
                || 'constraint "fk_foo_id"')
  END;
END;

And if NULLs are allowed, do this:

CREATE TRIGGER fku_bar_foo_id
BEFORE UPDATE ON bar
FOR EACH ROW BEGIN 
   SELECT CASE
     WHEN ((new.foo_id IS NOT NULL)
           AND ((SELECT id FROM foo WHERE id = new.foo_id) IS NULL))
     THEN RAISE(ABORT, 'update on table "bar" violates foreign key '
                || 'constraint "fk_foo_id"')
  END;
END;

The DELETE trigger is, of course, the reverse of the INSERT and UPDATE triggers, in that it applies to the primary key table, rather than the foreign key table. To whit, in our example, it watches for DELETEs on the foo table:

CREATE TRIGGER fkd_bar_foo_id
BEFORE DELETE ON foo
FOR EACH ROW BEGIN 
  SELECT CASE
    WHEN ((SELECT foo_id FROM bar WHERE foo_id = OLD.id) IS NOT NULL)
    THEN RAISE(ABORT, 'delete on table "foo" violates foreign key '
               || ' constraint "fk_foo_id"')
  END;
END;

This trigger will prevent DELETEs on the foo table when there are existing foreign key references in the bar table. This is generally the default behavior in databases with referential integrity enforcement, sometimes specified explicitly as ON DELETE RESTRICT. But sometimes you want the deletes in the primary key table to cascade to the foreign key tables. Such is what our example declaration above specifies, and this is the trigger to to the job:

CREATE TRIGGER fkd_bar_foo_id
BEFORE DELETE ON foo
FOR EACH ROW BEGIN 
    DELETE from bar WHERE foo_id = OLD.id;
END;

Pretty simple, eh? The trigger support in SQLite is great for building your own referential integrity checks. Hopefully, these examples will get you started down the path of creating your own.