Neither NULL nor NOT NULL: An SQL WTF

While working on result set testing functions, I ran into a bit of weirdness when comparing rows between two cursors. I had code that looked more or less like this:

FETCH have INTO rec_have;
FETCH want INTO rec_want;
WHILE rec_have IS NOT NULL OR rec_want IS NOT NULL LOOP
    IF rec_have IS DISTINCT FROM rec_want THEN
        RETURN FALSE;
    END IF;
    FETCH have INTO rec_have;
    FETCH want INTO rec_want;
END LOOP;
RETURN TRUE;

Basically, the idea is to return true if the two cursors return equivalent rows in the same order. However, things started to get weird when any of the rows included a NULL: it seemed that the loop exited as soon as a NULL was encountered, even if there were also non-NULL values in the row. I poked around a bit and discovered, to my astonishment, that such a record is neither NULL nor NOT NULL:

try=# select ROW(1, NULL) IS NULL;
 ?column? 
----------
 f
(1 row)

try=# select ROW(1, NULL) IS NOT NULL;
 ?column? 
----------
 f
(1 row)

I had thought that a value, even a composite value, had to be either NULL or NOT NULL, so I thought it was a bug. I mean, this isn't possible, is it? I dutifully asked on the pgsql-hackers list and was informed, to further astonishment, that this is, in fact, mandated by the SQL standard. WTF? As Jeff says, “The standard is what it is. If it says that some NULLs are red and some NULLs are blue, then we'd probably support it.”

Through the discussion, I learned that a record is considered NULL only if all of its values are NULL, and it's considered NOT NULL only if none of it s values are NULL:

try=# select ROW(NULL, NULL) IS NULL;
 ?column? 
----------
 t
(1 row)

try=# select ROW(1, 1) IS NOT NULL;
 ?column? 
----------
 t

The upshot is that composite values with at least one NULL and at least one NOT NULL value are ambiguous. It's insane, but there you have it.

Jeff thought that you could cheat the standard by moving the NOT in front of the value before checking its NULLness. I changed my code to reflect this, and things got better:

FETCH have INTO rec_have;
FETCH want INTO rec_want;
WHILE NOT rec_have IS NULL OR NOT rec_want IS NULL LOOP
    IF rec_have IS DISTINCT FROM rec_want THEN
        RETURN FALSE;
    END IF;
    FETCH have INTO rec_have;
    FETCH want INTO rec_want;
END LOOP;
RETURN TRUE;

Kind of confusing to read, but at least it's not too ugly. In truth, however, it's still inconsistent: it just makes it so that such records are both NULL and NOT NULL:

try=# select NOT ROW(1, NULL) IS NULL;
 ?column? 
----------
 t
(1 row)

try=# select NOT ROW(1, NULL) IS NOT NULL;
 ?column? 
----------
 t
(1 row)

But it was good enough for me. For a while. But then I started testing the pathological scenario where a row contains only NULLs. I call it pathological because no one should ever have rows with only NULLs. But the truth is that the SQL standard allows it (despite the objections of relational theorists) and, I've little doubt, application developers get such rows all the time.

The problem with such rows is that they are inherently NULL, but, according to the documentation for the use of cursors in PL/pgSQL, when fetching rows from a cursor, “if there is no next row, the target is set to NULL(s).” The upshot is that, because I'm using a WHILE loop to fetch rows from a cursor, and rows with only NULLs are themselves considered NULL, there is no way to tell the difference between a row that contains NULLs and the end of a cursor.

To demonstrate, I sent an example of two functions that process a cursor, one using a plain PL/pgSQL FOR rec IN stuff LOOP, which internally detects the difference between rows full of NULLs and the end of the cursor, and one using the WHILE NOT rec IS NULL LOOP syntax required by the pgTAP testing functions. The output looked like this:

    dob     |     ssn
------------+-------------
 1965-12-31 |
            |
            | 932-45-3456
 1963-03-23 | 123-45-6789
(4 rows)

    dob     | ssn
------------+-----
 1965-12-31 |
(1 row)

The two functions are processing the same query in cursors, but while the FOR loop properly returned all four rows, the WHILE loop stopped when it hit a row with only NULLs. I found this annoying, to say the least. Fortunately, other folks were paying better attention to the docs, pointing out that the special PL/pgSQL FOUND variable does just the trick, being set to TRUE when a row is fetched, even if the row is all NULLs, and false then there are no more rows in the cursor. In fact, had I read two more sentences in the relevant documentation, I would have noticed that it says, “As with SELECT INTO, the special variable FOUND can be checked to see whether a row was obtained or not.” D'oh!

So now my function looks more or less like this:

FETCH have INTO rec_have;
have_found := FOUND;
FETCH want INTO rec_want;
want_found := FOUND;
WHILE have_found OR want_found LOOP
    IF rec_have IS DISTINCT FROM rec_want THEN
        RETURN FALSE;
    END IF;
    FETCH have INTO rec_have;
    have_found := FOUND;
    FETCH want INTO rec_want;
    want_found := FOUND;
END LOOP;
RETURN TRUE;

Yeah, pretty verbose and full of a lot of explicit processing that I can just take for granted in more sane languages, but it does the trick. Don'tcha just love SQL?

That issue behind me, I'll do a bit more hacking on it this week, and hopefully I'll get a release of pgTAP out with the new result set testing support before I leave for vacation early next week.

Backtalk

Aristotle Pagaltzis wrote:

And here I thought that sort of insanity was reserved for MySQL. The mind boggles.

Erik wrote:

Oracle-style syntax would make it much more readable, e.g.

FETCH have INTO rec_have;
FETCH want INTO rec_want;
WHILE have%found OR want%found LOOP
    IF rec_have IS DISTINCT FROM rec_want THEN
        RETURN FALSE;
    END IF;
    FETCH have INTO rec_have;
    FETCH want INTO rec_want;
END LOOP;
RETURN TRUE;

or even

LOOP
    FETCH have INTO rec_have;
    FETCH want INTO rec_want;
    EXIT WHEN have%notfound OR want%notfound
    IF rec_have IS DISTINCT FROM rec_want THEN
        RETURN FALSE;
    END IF;
END LOOP;
RETURN TRUE;

Erik wrote:

I botched my second example a little, but I hope the concept is clear. Oracle cursor attributes are documented here.

Theory wrote:

Some Replies

@Aristotle—

Where the SQL spec is clear but bizarre, the implementation in PostgreSQL will be bizarre. Where the standard is ambiguous or intuitive, PostgreSQL's implementation tends to be intuitive. This is clearly an example of SQL spec pathology.

@Erik—

Oh, yeah, cursor attributes look nice, thanks for the link!

—Theory

Aristotle Pagaltzis wrote:

I never faulted Postgres.

Jeff Davis wrote:

The trick I was suggesting is to detect this specific case by doing something like:

not x is null and not x is not null

which would have to be made into a very ugly compound statement:

(not x is null and not x is not null or x is not null) OR (not y is null and not y is not null or y is not null)

[ might have made a mistake above, that is just too crazy ]

Theory wrote:

@Jeff—

I'm pretty sure I tried that and and it didn't work, because that variation just checks to see if a row is in the ambiguous state (both NULL and NOT NULL). It's the logical equivalent of:

x IS NOT NULL AND x IS NULL

Each variation I tried, aside from FOUND, failed to work properly in all circumstances. :-(

—Theory

Roland Bouman wrote:

Nice post, interesting stuff :)

For the record, the relevant part of the standard is:

ISO/IEC 9075-2:2003 (E) 8.7

studdugie wrote:

Your post is quite apropos to my current book. I'm reading, "Database In Depth", by C.J. Date; and I'm currently on the section where he is discussing the evils of NULL and how it violates the relational model.

I'm nowhere near the end of the book but so far I've been struck by the clarity of the relational model, as explained by Date. So now I'm scratching my head wondering, why is the Standard Query Language so far off from the relational model in such important areas as NULL[ability] and duplicates?

I think it premature to recommend it now because I haven't finished it yet but I plan on doing a full review on my blog when I'm done.