# pgTAP Set-Testing Update

I’ve been thinking more about testing SQL result sets and how to name functions that do such testing, and I’ve started to come to some conclusions. Some of the constraints I’m looking at:

Cursors are required for tests where the order of the results returned is important. It might be best for such functions to create the cursors themselves.

For comparisons where order isn’t important, the results of each SQL statement must be inserted into a temporary table and then the table used for the comparisons. Otherwise, each statement would be executed twice, as is required to calculate symmetric difference. By executing each once and storing the results in a temporary table, we get around this issue (and indeed, this one of the cases that Epic’s

`global()`

function addresses).When order is not important, the most efficient way to compare result sets is with symmetric difference. However, said comparison is a set comparison, meaning that duplicate rows are ignored. So if set A has 3 rows and set B has four, but two of those four are identical, then sets A and B can still be equivalent.

I’m starting to think that I would have two basic result set testing functions,
`set_eq()`

and `bag_eq()`

. The former would do a set comparison, while the
latter would require that duplicate rows be present in both result sets.
Unfortunately, that would mean that it would be difficult for `set_eq`

to have a
variation that tests ordered sets, as symmetric difference ignores relational
ordering. And a simple `bag_eq()`

function would require that the relations be
ordered, as it would iterate over each row in each relation in turn and compare
row to row. But as I pointed out to commenter “@dave0,” bags
are not inherently ordered, so it would be imposing a requirement that’s not
necessarily appropriate.

This is starting to drive me a bit nuts.

I think that there are ways to enforce an ordered comparison on a set and an unordered comparison on a bag, but both would be pretty inefficient. Maybe I should do it anyway, include the appropriate caveats in the documentation, and then improve when feasible in the future. In that case, what I’d be looking at is something like this:

`set_eq( sql, sql )`

- Test for set equivalence of two SQL statements.
`oset_eq( sql, sql )`

- Test for ordered set equivalence of two SQL statements.
`bag_eq( sql, sql )`

- Test for row equivalence of two SQL statements.
`obag_eq( sql, sql )`

- Test for ordered row equivalence of two SQL statements.

The preferred tests would be `set_eq()`

and `obag_eq()`

. If a single word is
passed to any of these functions, it’s assumed to be a prepared statement.
Cursors would be created internally for the functions that require ordered
comparison. The non-ordered versions would create temporary tables to hold the
values and then use those tables for the comparisons. `bag_eq()`

would also
construct cursors on the temporary tables to ensure that rows could be compared
in the same order in which they were generated by the SQL statement.

Interface-wise, perhaps a boolean would be preferred to indicate whether or not to compare the rows in an ordered fashion? That would be:

`set_eq( sql, sql, bool )`

- Test for set equivalence of two SQL statements. The sets must be in the same order if the boolean argument is true.
`bag_eq( sql, sql, bool )`

- Test for row equivalence of two SQL statements. The bags must have their rows in the same order if the boolean argument is true.

I like that there are fewer functions this way, but is it harder to remember what the boolean is for? (It would not be required, and would default to false for both functions). Thoughts?

By the way, I would likely throw in a couple of other resultset-comparing functions:

`set_includes( sql, sql )`

- Test that the set returned by the first statement includes the rows returned by the second statement.
`set_excludes( sql, sql )`

- Test that the set returned by the first statement excludes the rows returned by the second statement.
`bag_includes( sql, sql )`

- Test that the bag returned by the first statement includes the rows returned by the second statement, including duplicates.
`bag_excludes( sql, sql )`

- Test that the bag returned by the first statement excludes the rows returned by the second statement, including duplicates.

Seem useful? Please leave a comment with your thoughts.

Looking for the comments? Try the old layout.