Need Help Naming Result Set Testing Functions

I’ve been thinking more since I posted about testing SQL result sets, and I think I’ve settled on two sets of functions for pgTAP: one that tests two SQL queries (though you will be encouraged to use a prepared statement), and one to test two cursors. I’m thinking of naming them:

  • query_gets()
  • cursor_gets()

I had been planning on *_returns() or *_yields(), but they didn’t feel right. “Returns” implies that I would be passing a query and a data structure (to me at least), and while I want to support that, too, it’s not what I was looking for right now. “Yield,” on the other hand, is more related to set-returning functions in my mind (even if PL/pgSQL doesn’t use that term). Anyway, I like the use of “gets” because it’s short and pretty unambiguous.

These function will compare query results as unordered sets, but I want variants that test ordered sets, as well. I’ve been struggling to come up with a decent name for these variants, but not liking any very well. The obvious ones are:

  • ordered_query_gets()
  • ordered_cursor_gets()

And:

  • sorted_query_gets()
  • sorted_cursor_gets()

But these are kind of long for functions that will be, I believe, used frequently. I could just add a character to get the same idea, in the spirit of sprintf:

  • oquery_gets()
  • ocursor_gets()

Or:

  • squery_gets()
  • scursor_gets()

I think that these are okay, but might be somewhat confusing. I think that the “s” variant probably won’t fly, since for sprintf and friends, the “s” stands for “string.” So I’m leaning towards the “o” variants.

But I’m throwing it out there for the masses to make suggestions: Got any ideas for better function names? Are there some relational terms for ordered sets, for example, that might make more sense? What do you think?

As a side note, I’m also considering:

  • col_is() to compare the result of a single column query to an array or other query. This would need an ordered variant, as well.
  • row_is(), although I have no idea how I’d be able to support passing a row expression to a function, since PostgreSQL doesn’t allow RECORDs to be passed to functions.
  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Comments & Trackbacks

dave0 wrote:

My immediate thought was that something similar to the bag() or set() functions used by Perl's Test::Deep would be useful here. If that's not feasable in pgTAP, perhaps you could just name the comparison methods using _set and _bag suffixes?

Theory wrote:

@dave0—Well, no, because they’re all bags. A set is an unordered collection of things with no duplicates. A bag can have duplicates. SQL always allowed duplicates unless you use DISTINCT. The test functions don't really care about that, though; they just need to know whether or not the order of the results is important.

—Theory

Alex Clay wrote:

What about using the existing "matches" vernacular? If you wanted to built an assertion that handles either queries or cursors, how about:

results_match

Otherwise, maybe:

query_results_match cursor_results_match

A couple other ideas are "points_to" or maybe "finds".

studdugie wrote:

Don't go with shorter yet ambiguous naming. Stick to longer but unambiguous names. Typing a couple more characters never killed any programmer (but if it does none of us where in the room and therefore cannot be held accountable).

I like Alex's *match suggestion.

gabrielle wrote:

+1 for results_match

Theory wrote:

Hrm…I don't think that match is a good choice, as there is already a matches() function that compares a scalar value to a regular expression. But _finds() is pretty nice.

So I'm thinking:

  • query_finds()
  • cursor_finds()
  • sorted_query_finds()
  • sorted_cursor_finds()

I like results_*, but then there's no distinction between functions for queries and those for cursors. Maybe I should just choose one or the other and not offer both? If so, then, if I can't use "match", well maybe just "are"?

  • results_are()
  • sorted_results_are()

Thoughts?

Thanks for the feedback, everyone!

Matthew wrote:

I like what you are doing here!

The naming problem seems to stem from an ambiguity of what you are trying to assert. *_gets suggests an object will be returned, because you are using language typical to OO set_ and get_* methodology. But now we are in the realm of structured data, and the types of assertions are perhaps more varried:

  1. I like *_yields as a means to identify the returned data STRUCTURE, as opposed to the actual data. It smacks of "factory" language and seems appropriately abstract.

    SELECT results_yield( $SUBJECT$
                            SELECT * 
                            FROM get_my_type(x,y,'z')
                            LIMIT 1
                          $SUBJECT$
                        ,
                          "my_type"
                        )
  2. *_intersect might be used to assert that the result set provides data that partially matches a control set. This match would have to be field-to-field for 1+ record from both sets. Order is meaningless here, but the term is unambiguous.

    SELECT results_intersect( $SUBJECT$
                                SELECT * 
                                FROM get_my_stuff(x,y,'z')
                                LIMIT 5
                              $SUBJECT$
                            ,
                            ["control_class" 
                              |
                             $CONTROL$ 
                                SELECT * 
                                FROM my_control_class
                             $CONTROL$ 
                            ]
                        );
  3. *_union speaks loudly of a full match between the test subject and a control set. Order is not even suggested, but the term is unambiguous.

    SELECT results_union( $SUBJECT$
                                SELECT * 
                                FROM get_my_stuff(x,y,'z')
                                LIMIT 5
                          $SUBJECT$
                          ,
                          [ "control_class" 
                             |
                            $$ executable statement $$ 
                          ]
                        );
  4. *_maps_to is a bit verbose, but also unambiguous; *_are / *_is are a pretty bold assertions (but that might be called for!) for asserting that there is a record by record, field by field match between the test subject and the control.

Curious how you could handle passing the control as a parameter: would you allow an executable statement, or simply a table/view name? A third parameter might be used to identify which fields from control to sort on, where a positive value means ASC and a negative means DESC:

SELECT results_map_to( $SUBJECT$ 
                         SELECT * 
                         FROM my_func('a','b','c')
                         ORDER BY 1, 3 DESC, 2
                       $SUBJECT$
                      , my_control_class
                      , ARRAY[1,-3,2]
                     );

VS

SELECT results_map_to( $SUBJECT$ 
                          SELECT * 
                          FROM my_func('a','b','c')
                          ORDER BY 1, 3 DESC, 2
                        $SUBJECT$
                      , $CONTROL$
                         SELECT * 
                         FROM my_control_class
                         ORDER BY 1, 3 DESC, 2 
                        $CONTROL$
                     );

Parting thought (as I am just getting into pgTAP): Have you implemented or planned to implement an assertion for a function's input parameters (IE, to go along with *_yields)? Might be something like:

SELECT input_takes( "my_function"
                  , "my_type"
                  );

and/or

SELECT input_takes( "my_function"
                  , ARRAY[ "overloaded_function_type1"
                         , "overloaded_function_type_2"
                         , "overloaded_function_type_n"
                         ]
                  );

Which implies that you need:

SELECT is_overloaded("my_function");

Thanks!

Theory wrote:

@Matthew—Thanks for the terrific and detailed reply! Let me take each of your points in turn.

  1. I like *_yields too, but I'm already using *_returns for this, as in function_returns(). So if I write a similar function to determine the row type of a query it will likely be called query_returns() or something similar.

  2. Ooh, yeah, *_intersect is definitely useful. I'll put that to use, too. But what's with the "control_class" bit and the array in your exmple there?

  3. To me, *_union does not mean that two result sets match. A union is kind of like a set concatenation: it contains every tuple in every relation it unites. That's all. Doin a but more research, I think that "set equivalence" is actually the term I've been looking for, so I'll likely use _eq or _equal for this purpose.

  4. I'm not sure about *_maps_to. That sounds like a representation of types again.

As for how I'll do things, I'm planning to allow two SQL statements to be passed, or one SQL statement and one (possibly nested) array. And yes, has_function() does what you ask, although I'm not sure what is_overloaded() would be for.

Since I was slow to reply to this comment (was at OS Bridge last week), I've extended the comment period another week to give folks more of a chance to reply. But please do also see my latest thinking on this stuff, as well.

—Theory

Ben Finney wrote:

Hrm…I don't think that match is a good choice, as there is already a matches() function that compares a scalar value to a regular expression.

Why does that pose a problem? It still describes the semantic: "does the result set match this one?"

I'm still in favour of 'foo_matches' of all the proposals here.

Theory wrote:

@Ben—

It works, but it's different from the existing use of *_match, which explicitly requires a regular expression. I want to keep that connection explicit and not overload the term "match" in the API.

Thanks!

Theory

Discussion is now closed.

Powered by KinoSearch