Just a Theory

Black lives matter

Posts about SQL

Issues with INSERT Execution Ordering

My latest experiments in my never-ending quest to move as much object/relational mapping into the database as possible has yielded more hacks to work around database features. This time, the problem is that I have two classes, Simple and Extend, where I want the latter to extend the former (hence its name). This is a little different from inheritance, in that, internally, an Extend just references a single Simple object, but externally, it just has a single interface, where attributes of a Simple object are just attributes of an Extend object. The benefit here is that I can have multiple Extend objects that reference the same Simple object—something you can’t do with simple inheritance

Now, how I wanted to implement this in the database is where the extend view has all of the columns from the _simple table and the _extend table. That’s pretty simple. It looks like this:

CREATE SEQUENCE seq_kinetic;

CREATE TABLE _simple (
    id INTEGER NOT NULL DEFAULT NEXTVAL('seq_kinetic'),
    uuid UUID NOT NULL DEFAULT UUID_V4(),
    state INTEGER NOT NULL DEFAULT 1,
    name TEXT NOT NULL,
    description TEXT
);

CREATE TABLE _extend (
    id INTEGER NOT NULL DEFAULT NEXTVAL('seq_kinetic'),
    uuid UUID NOT NULL DEFAULT UUID_V4(),
    state INTEGER NOT NULL DEFAULT 1,
    simple_id INTEGER NOT NULL
);

CREATE VIEW extend AS
  SELECT _extend.id AS id, _extend.uuid AS uuid, _extend.state AS state,
         _extend.simple_id AS simple__id, simple.uuid AS simple__uuid,
         simple.state AS simple__state, simple.name AS name,
         simple.description AS description
  FROM   _extend, simple
  WHERE  _extend.simple_id = simple.id;

Pretty simple, right? Well, I like to put RULEs on VIEWs like this, so that I can just use the VIEW for INSERTs, UPDATEs, AND DELETESs. For now I’m just going to talk about the INSERT RULEs for this this view, as they are where the trouble came in.

Why the plural, RULEs? Well, what I wanted was to have two behaviors on insert, depending on the value of the simple__id column. If it’s NOT NULL, it should assume that it references an existing record in the _simple table, UPDATE it, and then INSERT into the _extend table. If it’s NULL, however, then it should INSERT into both the _simple table and the _extend table. What I came up with looked like this:

CREATE RULE insert_extend AS
ON INSERT TO extend WHERE NEW.simple__id IS NULL DO INSTEAD (
  INSERT INTO _simple (id, uuid, state, name, description)
  VALUES (NEXTVAL('seq_kinetic'), UUID_V4(), NEW.simple__state, NEW.name,
          NEW.description);

  INSERT INTO _extend (id, uuid, state, simple_id)
  VALUES (NEXTVAL('seq_kinetic'), COALESCE(NEW.uuid, UUID_V4()), NEW.state,
          CURRVAL('seq_kinetic'));
);

CREATE RULE extend_extend AS
ON INSERT TO extend WHERE NEW.simple__id IS NOT NULL DO INSTEAD (
  UPDATE _simple
  SET    state = COALESCE(NEW.simple__state, state),
         name  = COALESCE(NEW.name, name),
         description = COALESCE(NEW.description, description)
  WHERE  id = NEW.simple__id;

  INSERT INTO _extend (id, uuid, state, simple_id)
  VALUES (NEXTVAL('seq_kinetic'), COALESCE(NEW.uuid, UUID_V4()),
          NEW.state, NEW.simple__id);
);

CREATE RULE insert_extend_dummy AS
ON INSERT TO extend DO INSTEAD NOTHING;

That third RULE is required, as a VIEW must have an unconditional DO INSTEAD RULE. The second RULE also works, for those situations where I want to to “extend” a Simple object into an Extend object. It’s that first rule that’s causing problems, when no Simple object yet exists and I need to create it. When I try to INSERT with simple__id set to NULL, I get an error. Can you guess what it is? Let me not keep you on the edge or your seat:

kinetic=# INSERT INTO EXTEND (simple__id, name) VALUES (NULL, 'Four');
ERROR:  insert or update on table "_extend" violates foreign key constraint "fk_simple_id"
DETAIL:  Key (simple_id)=(22) is not present in table "_simple".

Why’s that? Well, it turns out that NEXTVAL('seq_kinetic') is executed twice, once for the id in the _simple table, and once for the id in the _extend table. But by the time CURRVAL('seq_kinetic') is called to reference the value inserted into the _simple table, it the value has already been fetched from the sequence for insertion into the _extend table. So of course, it fails, because the current value in the sequence is not in the _simple table at all.

At first, I thought that this might be an order of execution problem with the INSERT statement, so I tried this:

CREATE RULE insert_extend AS
ON INSERT TO extend WHERE NEW.simple__id IS NULL DO INSTEAD (
  INSERT INTO _simple (id, uuid, state, name, description)
  VALUES (NEXTVAL('seq_kinetic'), UUID_V4(), NEW.simple__state, NEW.name,
          NEW.description);

  INSERT INTO _extend (simple_id, id, uuid, state)
  VALUES (CURRVAL('seq_kinetic'), NEXTVAL('seq_kinetic'),
          COALESCE(NEW.uuid, UUID_V4()), NEW.state);
);

Unfortunately, that yielded the same error. So if the order of the columns in the INSERT statement didn’t define the execution order, what did? Well, a little research helped me to figure it out: It’s the order of the columns in the table, as this example demonstrates:

test=# CREATE SEQUENCE s;
CREATE SEQUENCE
test=# CREATE TABLE a (a0 int, a1 int, a2 int, a3 int);
CREATE TABLE
test=# INSERT INTO a (a3, a2, a0, a1) VALUES (NEXTVAL('s'), NEXTVAL('s'),
test=# NEXTVAL('s'), NEXTVAL('s'));
INSERT 0 1
test=# SELECT * FROM a;
 a0 | a1 | a2 | a3 
----+----+----+----
  1 |  2 |  3 |  4
(1 row)

Even though the values from the sequence were inserted into the columns by the INSERT statement in a more or less random order, they ended up being inserted into the table in the order in which they were declared in the CREATE TABLE statement.

Damn SQL!

So what’s the solution to this? Well, I came up with three. The first, and perhaps simplest, is to use two sequences instead of one:

CREATE RULE insert_extend AS
ON INSERT TO extend WHERE NEW.simple__id IS NULL DO INSTEAD (
  INSERT INTO _simple (id, uuid, state, name, description)
  VALUES (NEXTVAL('seq_kinetic'), UUID_V4(), NEW.simple__state, NEW.name,
          NEW.description);

  INSERT INTO _extend (id, uuid, state, simple_id)
  VALUES (NEXTVAL('seq_kinetic_alt'), COALESCE(NEW.uuid, UUID_V4()), NEW.state,
          CURRVAL('seq_kinetic'));
);

This works very well, and if you have separate sequences for each table, this is what you would do. But I want to use just one sequence for every primary key in the database, so as to prevent any possibility of duplicates. I could use two mutually exclusive sequences, one for odd numbers and the other for even numbers:

CREATE SEQUENCE seq_kinetic_odd INCREMENT BY 2;
CREATE SEQUENCE seq_kinetic_odd INCREMENT BY 2 START WITH 2;

But then I have to keep track of which sequence I’m using where. If I just use the “even” sequence for this special case (which may be rare), then I’m essentially throwing out half the numbers in the sequence. And I like things to be somewhat orderly, and the skipping of even or odd values would annoy me when I had to work with the database. Yeah, I’m a freak.

The solution I’ve currently worked out is to create a PL/pgSQL function that can keep track of the sequence numbers ahead of time, and just call it from the RULE:

CREATE FUNCTION insert_extend(NEWROW extend) RETURNS VOID AS $$
    DECLARE
        _first_id  integer := NEXTVAL(''seq_kinetic'');
        _second_id integer := NEXTVAL(''seq_kinetic'');
    BEGIN
    INSERT INTO _simple (id, uuid, state, name, description)
    VALUES (_first_id, UUID_V4(), NEWROW.simple__state, NEWROW.name,
            NEWROW.description);

    INSERT INTO _extend (id, uuid, state, simple_id)
    VALUES (_second_id, COALESCE(NEWROW.uuid, UUID_V4()), NEWROW.state, _first_id);
    END;
$$ LANGUAGE plpgsql VOLATILE;
    
CREATE RULE insert_extend AS
ON INSERT TO extend WHERE NEW.simple__id IS NULL DO INSTEAD (
    SELECT insert_extend(NEW);
);

This approach works pretty nicely, and doesn’t add much more code than my original solution with the ordering problem. I think I’ll keep it.

One other solution is to use a TRIGGER instead of a rule, but in truth, it would amount to nearly the same thing:

CREATE FUNCTION insert_extend() RETURNS trigger AS $$
    DECLARE
        _first_id  integer := NEXTVAL(''seq_kinetic'');
        _second_id integer := NEXTVAL(''seq_kinetic'');
    BEGIN
    INSERT INTO _simple (id, uuid, state, name, description)
    VALUES (_first_id, UUID_V4(), NEW.simple__state, NEW.name, NEW.description);

    INSERT INTO _extend (id, uuid, state, simple_id)
    VALUES (_second_id, COALESCE(NEW.uuid, UUID_V4()), NEW.state, _first_id);
    END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_extend BEFORE UPDATE ON extend
FOR EACH ROW EXECUTE PROCEDURE insert_extend();

Um, but looking at it now (I just now typed it up, I haven’t tested it), I don’t think it’d work, because you can’t put a condition on a rule. On the other hand, I could use it to combine the three rules I have (two conditional and mutually exclusive, one that does nothing) into a single trigger:

CREATE FUNCTION insert_extend() RETURNS trigger AS $$
    DECLARE
        _first_id  integer;
        _second_id integer;
    BEGIN
    IF NEW.simple__id IS NULL THEN
        _first_id  := NEXTVAL(''seq_kinetic'');
        _second_id := NEXTVAL(''seq_kinetic'');

        INSERT INTO _simple (id, uuid, state, name, description)
        VALUES (_first_id, UUID_V4(), NEW.simple__state, NEW.name, NEW.description);

        INSERT INTO _extend (id, uuid, state, simple_id)
        VALUES (_second_id, COALESCE(NEW.uuid, UUID_V4()), NEW.state, _first_id);
    ELSE
        UPDATE _simple
        SET    state = COALESCE(NEW.simple__state, state),
                name  = COALESCE(NEW.name, name),
                description = COALESCE(NEW.description, description)
        WHERE  id = NEW.simple__id;

        INSERT INTO _extend (id, uuid, state, simple_id)
        VALUES (NEXTVAL('seq_kinetic'), COALESCE(NEW.uuid, UUID_V4()),
                NEW.state, NEW.simple__id);
    END IF;
    END;
$$ LANGUAGE plpgsql;

Hrm. That just might be the best way to go, period. Thoughts? Have I missed some other obvious solution?

Looking for the comments? Try the old layout.

Let’s get Relational

Cover of “Database In Depth” by C.J. Date

The idea that the relational model could handle only rather simple kinds of data (like numbers, and strings, and dates, and times) is a huge misconception, and always was. In fact, the term object/relational, as such, is just a piece of marketing hype … As far as I’m concerned, an object/relational system done right would simply be a relational system done right, nothing more and nothing less.

And:

So I think it’s incumbent on people not to say “Tell me the business value of implementing the relational model.” I think they should explain what the business value is of not implementing it. Those who ask “What’s the value of the relational model?” are basically saying “What’s the value of theory?” And I want them to tell me what the value is of not abiding by the theory.

An Interview with Chris Date

Looking for the comments? Try the old layout.

What Advanced SQL Book Should I Buy?

So, what advanced SQL book should I buy? I’ve learned a lot about SQL over the last year or so, but I’m sure that Josh Berkus is tired of being my own personal advanced SQL reference. So I’d like to really learn more about triggers, stored procedures, rules, views, and whatnot, what they’re best used for and when to use them. And other typical database features that I’m not familiar with, of course.

What I don’t need is an introduction to SQL. There are a million of those, and they all have much the same stuff. I want to really get into advanced concepts.

So what’s the best choice? Leave me a comment with your opinion. Thanks!

Looking for the comments? Try the old layout.

Why Bricolage documents have UUIDs

Some time ago, I decided that all objects in Bricolage 2 would have Universally Unique Identifiers, also known as “UUIDs.” A UUID is guaranteed to be universally unique, never to be generated again by the same or any other system now or in the future. As anyone using Bricolage knows, all stories and media already have IDs, so why have UUIDs, as well? How does their purpose differ?

Well, first of all, the existing IDs are not really identifiers. What they are, instead, are primary keys. However, a primary key should ideally be a surrogate key, meaning that it has no other meaning outside of identifying a single database row. Sometimes you can use an “intelligent key,” meaning an attribute of the object being stored (such as a user login), for the primary key. But the problem with intelligent keys is that, should their values ever be changed (say a user’s name changes and company login name conventions dictate that the login must be changed to represent the new name), all foreign key references will be broken. It is therefore easier, and more agile, to use a surrogate key with no inherent meaning to the object with which it is associated.

Now, once you start using an object ID that is actually a surrogate key for something other than identifying a row in a database, you add new meaning to it. At that point, it is no longer a surrogate. In Bricolage, this comes up when users want to use IDs for story URIs. At that point, the ID is no longer just a primary key identifying a database row, but it is also an object identifier. What happens if that identifier changes? Well, in general, it won’t, so you’d be safe to use it for both purposes. But sometimes it does.

When? Some Bricolage users have decided to upgrade to a newer version of Bricolage by setting up the new version on a different server, exporting their data from the old server, and then importing it into the new. This can work reasonably well, but it has what may be an unintended side-effect for those who use the ID in the URI: all objects will get new primary keys when they’re inserted into the new system.

What? you cry! Yes, that’s right. Because the ID is used solely to identify a row in a database, when you insert an existing object into a new database, it gets stored in a new row. It therefore gets a new ID, and your URIs suddenly start to 404. Ouch.

The solution to this problem is to give Bricolage objects a universally unique identifier that can work anywhere, that means nothing other than “this is a unique identifier for this object,” and which are guaranteed not to change when you move an object from one system to another. Happily, the UUID standard exists for just this sort of thing. You are free to use a story’s UUID in its URI without having to worry about it ever changing. IDs may change, but you don’t have to worry about those.

For these reason, the forthcoming Bricolage 1.10.0 has added UUIDs to story and media objects, these being the objects most in need of UUIDs, and they are available for use in URIs. Looking to the future, the Kinetic Platform, currently under development and the platform to which Bricolage 2.0 will be ported, never exposes the primary key IDs at all. There is only the UUID for referencing objects externally. I judge this a very good thing.

Looking for the comments? Try the old layout.

MySQL’s REPLACE Considered Harmful

So we’ve set up a client with an online poll application using MySQL. Polls are created in Bricolage, and when they’re published, rather than writing data to files, the template writes data to the MySQL database. PHP code on the front-end server then uses the database records to manage the polls.

On the recommendation of one of my colleagues, I was using the MySQL REPLACE statement to insert and update poll answers in the database. At first, this seemed like a cool idea. All I had to do was create a unique index on the story_id and ord (for answer order) columns and I was set. Any time someone reordered the answers or changed their wording in Bricolage, the REPLACE statement would change the appropriate records and just do the right thing.

Or so I thought.

Come the day after the launch of the new site, I get a complaint from the customer that the percentage spread between the answers doesn’t add up to 100%. After some investigation, I realized that the poll_results table is using the ID of each question to identify the votes submitted by readers. This makes sense, of course, and is excellent relational practice, but I have overlooked the fact that REPLACE essentially replaces rows every time it is used. This means that even when a poll answer hasn’t changed, it gets a new ID. Yes, that’s right, its primary key value was changing. Yow!

Now we might have caught this earlier, but the database was developed on MySQL 3.23.58 and, as is conventional among MySQL developers, there were no foreign key constraints. So the poll results were still happily pointing to non-existent records. So a poll might appear to have 800 votes, but the percentages might be counted for only 50 votes. Hence the problem with the percentages not adding up to 100% (nowhere near it, in fact).

Fortunately, the production application is on a MySQL 4.1 server, so I made a number of changes to correct this issue:

  • Added foreign key constraints
  • Exploited a little-known (mis)feature of Bricolage to store primary keys for all poll answers (and questions, for that matter)
  • Switched from REPLACE to INSERT, UPDATE, and DELETE statements using the primary keys

I also started using transactions when making all these updates when a poll is published so that changes are always atomic. Now it works beautifully.

But the lesson learned is that REPLACE is a harmful construct. Yes, it was my responsibility to recognize that it would create new rows and therefore new primary keys. But any construct that changes primary keys should be stricken from any database developer’s toolbox. The fact that MySQL convention omits the use of foreign key constraints makes this a particularly serious issue that can appear to have mysterious consequences.

So my advice to you, gentle reader, is don’t use it.

Looking for the comments? Try the old layout.

Enforce Foreign Key Integrity in SQLite with Triggers

After some some Googling and experimentation, I’ve figured out how to enforce foreign key constraints in SQLite. I got most of the code from Cody Pisto’s sqlite_fk utility. I couldn’t get it to work, but the essential code for the triggers was in its fk.c file, so I just borrowed from that (public domain) code to figure it out.

Since I couldn’t find documentation for this elsewhere on the Net (though I’m sure it exists somewhere), I decided to just put an example here. Interested? Read on!

Say you have these two table declarations:

create table foo (
    id INTEGER NOT NULL PRIMARY KEY
);

CREATE TABLE bar (
    id INTEGER NOT NULL PRIMARY KEY,
    foo_id INTEGER NOT NULL
            CONSTRAINT fk_foo_id REFERENCES a(id) ON DELETE CASCADE
);

Table bar has a foreign key reference to the primary key column in the foo table. Although SQLite supports this syntax (as well as named foreign key constraints), it ignores them. So if you want the references enforced, you need to create triggers to do the job. Triggers were added to SQLite version 2.5, so most users can take advantage of this feature. Each constraint must have three triggers: one for INSERTs, one for UPDATESs, and one for DELETESs. The INSERT trigger looks like this:

CREATE TRIGGER fki_bar_foo_id
BEFORE INSERT ON bar
FOR EACH ROW BEGIN 
    SELECT CASE
        WHEN ((SELECT id FROM foo WHERE id = NEW.foo_id) IS NULL)
        THEN RAISE(ABORT, 'insert on table "bar" violates foreign key '
                || 'constraint "fk_foo_id"')
    END;
END;

(You can put the RAISE error string all on one line; I’ve concatenated two lines to keep line lengths reasonable here.) If your foreign key column is not NOT NULL, the trigger’s SELECT CASE clause needs to an extra case:

CREATE TRIGGER fki_bar_foo_id
BEFORE INSERT ON bar
FOR EACH ROW BEGIN 
    SELECT CASE
        WHEN ((new.foo_id IS NOT NULL)
            AND ((SELECT id FROM foo WHERE id = new.foo_id) IS NULL))
        THEN RAISE(ABORT, 'insert on table "bar" violates foreign key '
                || 'constraint "fk_foo_id"')
    END;
END;

The UPDATE statements are almost identical; if your foreign key column is NOT NULL, then do this:

CREATE TRIGGER fku_bar_foo_id
BEFORE UPDATE ON bar
FOR EACH ROW BEGIN 
    SELECT CASE
        WHEN ((SELECT id FROM foo WHERE id = new.foo_id) IS NULL))
        THEN RAISE(ABORT, 'update on table "bar" violates foreign key '
                || 'constraint "fk_foo_id"')
    END;
END;

And if NULLs are allowed, do this:

CREATE TRIGGER fku_bar_foo_id
BEFORE UPDATE ON bar
FOR EACH ROW BEGIN 
    SELECT CASE
        WHEN ((new.foo_id IS NOT NULL)
            AND ((SELECT id FROM foo WHERE id = new.foo_id) IS NULL))
        THEN RAISE(ABORT, 'update on table "bar" violates foreign key '
                || 'constraint "fk_foo_id"')
    END;
END;

The DELETE trigger is, of course, the reverse of the INSERT and UPDATE triggers, in that it applies to the primary key table, rather than the foreign key table. To whit, in our example, it watches for DELETEs on the foo table:

CREATE TRIGGER fkd_bar_foo_id
BEFORE DELETE ON foo
FOR EACH ROW BEGIN 
    SELECT CASE
    WHEN ((SELECT foo_id FROM bar WHERE foo_id = OLD.id) IS NOT NULL)
    THEN RAISE(ABORT, 'delete on table "foo" violates foreign key '
                || ' constraint "fk_foo_id"')
    END;
END;

This trigger will prevent DELETEs on the foo table when there are existing foreign key references in the bar table. This is generally the default behavior in databases with referential integrity enforcement, sometimes specified explicitly as ON DELETE RESTRICT. But sometimes you want the deletes in the primary key table to “cascade” to the foreign key tables. Such is what our example declaration above specifies, and this is the trigger to to the job:

CREATE TRIGGER fkd_bar_foo_id
BEFORE DELETE ON foo
FOR EACH ROW BEGIN 
    DELETE from bar WHERE foo_id = OLD.id;
END;

Pretty simple, eh? The trigger support in SQLite is great for building your own referential integrity checks. Hopefully, these examples will get you started down the path of creating your own.

Looking for the comments? Try the old layout.

How to Determine Your Transaction ID

Today I had reason to find out what PostgreSQL transaction I was in the middle of at any given moment in Bricolage. Why? I wanted to make sure that a single request was generating multiple transactions, instead of the normal one. It’s a long story, but suffice it to say that lengthy transactions were causing deadlocks. Read this if you’re really interested.

Anyway, here’s how to determine your current transaction using DBI. The query will be the same for any client, of course.

my $sth = $dbh->prepare(qq{
    SELECT transaction
    FROM   pg_locks
    WHERE  pid = pg_backend_pid()
            AND transaction IS NOT NULL
    LIMIT  1
});

$sth->execute;
$sth->bind_columns(\my $txid);
$sth->fetch;
print "Transaction: $txid\n";

Looking for the comments? Try the old layout.