Just a Theory

Black lives matter

Posts about API

RFC: Restful Secondary Key API

I’ve been working on a simple CRUD API at work, with an eye to make a nicely-designed REST interface for managing a single type of resource. It’s not a complicated API, following best practices recommended by Apigee and Microsoft. It features exactly the sorts for APIs you’d expect if you’re familiar with REST, including:

  • POST /users: Create a new user resource
  • GET /users/{uid}: Read a user resource
  • PUT /users/{uid}: Update a user resource
  • DELETE /users/{uid}: Delete a user resource
  • GET /users?{params}: Search for user resources

If you’re familiar with REST, you get the idea.

There is one requirement that proved a bit of design challenge. We will be creating canonical ID for all resources managed by the service, which will function as the primary key. The APIs above reference that key by the {uid} path variable. However, we also need to support fetching a single resource by a number of existing identifiers, including multiple legacy IDs, and natural keys like, sticking to the users example, usernames and email addresses. Unlike the search API, which returns an array of resources, we need a nice single API like GET /users/{uid} that returns a single resource, but for a secondary key. What should it look like?

None of my initial proposals were great (using username as the sample secondary key, though again, we need to support a bunch of these):

  • GET /users?username={username} — consistent with search, but does it return a collection like search or just a single entry like GET /users/{uid}? Would be weird not to return an array or not based on which parameters were used.
  • GET /users/by/username/{username} — bit weird to put a preposition in the URL. Besides, it might conflict with a planned API to fetch subsets of info for a single resource, e.g., GET /users/{uid}/profile, which might return just the profile object.
  • GET /user?username={username} — Too subtle to have the singular rather than plural, but perhaps the most REST-ish.
  • GET /lookup?obj=user&username={username} Use special verb, not very RESTful

I asked around a coding Slack, posting a few possibilities, and friendly API designers suggested some others. We agreed it was an interesting problem, easily solved if there was just one alternate that never conflicts with the primary key ID, such as GET /users/{uid || username}. But of course that’s not the problem we have: there are a bunch of these fields, and they may well overlap!

There was some interest in GET /users/by/username/{username} as an aesthetically-pleasing URL, plus it allows for

  • /by => list of unique fields
  • /by/username/ => list of all usernames?

But again, it runs up against the planned use of subdirectories to return sub-objects of a resource. One other I played around with was: GET /users/user?username={username}: The user sub-path indicates we want just one user much more than /by does, and it’s unlikely we’d ever use user to name an object in a user resource. But still, it overloads the path to mean one thing when it’s user and another when it’s a UID.

Looking back through the options, I realized that what we really want is an API that is identical to GET /users/{uid} in its behaviors and response, just with a different key. So what if we just keep using that, as originally suggested by a colleague as GET /users/{uid || username} but instead of just the raw value, we encode the key name in the URL. Turns out, colons (:) are valid in paths, so I defined this route:

  • GET /users/{key}:{value}: Fetch a single resource by looking up the {key} with the {value}. Supported {key} params are legacy_id, username, email_address, and even uid. This then becomes the canonical “look up a user resource by an ID” API.

The nice thing about this API is that it’s consistent: all keys are treated the same, as long as no key name contains a colon. Best of all, we can keep the original GET /users/{uid} API around as an alias for GET /users/uid:{value}. Or, better, continue to refer to it as the canonical path, since the PUT and DELETE actions map only to it, and document the GET /users/{key}:{value} API as accessing an alias for symlink for GET /users/{uid}. Perhaps return a Location header to the canonical URL, too?

In any event, as far as I can tell this is a unique design, so maybe it’s too weird or not properly RESTful? Would love to know of any other patterns designed to solve the problem of supporting arbitrarily-named secondary unique keys. What do you think?

Custom SQLite Aggregates in Perl

About a year ago, Josh Berkus was reviewing some Bricolage SQL code, looking to optimize it for PostgreSQL. One of the things he noticed was that we were fetching a lot more rows for an object than we needed to. The reason for this is that an object might be associated with one or more groups, and to get back a list of all of the group IDs, we were getting multiple rows. For example, if I wanted to fetch a single story with the ID 10, I might get back rows like this:

SELECT s.id, s.title, grp.id
FROM   story s, member m, grp g
WHERE  s.id = m.story_id
       AND m.grp_id = g.id
       AND s.id = 10;
s.id |        s.title      | grp.id
-----+---------------------+--------
  10 | The Princess Bride  | 23
  10 | The Princess Bride  | 24
  10 | The Princess Bride  | 25
  10 | The Princess Bride  | 26
  10 | The Princess Bride  | 27

Now, that’s a lot of extra data to have to fetch for just a single row to be different; it’s very wasteful, really. So Josh said, “Why don’t you use a custom aggregate for that?” I knew nothing about aggregates, but I did some research, and figured out how to write PostgreSQL custom aggregates in SQL. I wrote a very simple one, called id_list(), that joins up all of the values in a column with an empty space. The aggregate code looks like this:

CREATE   FUNCTION append_id(TEXT, INTEGER)
RETURNS  TEXT AS '
    SELECT CASE WHEN $2 = 0 THEN
                $1
            ELSE
                $1 || '' '' || CAST($2 AS TEXT)
            END;'
LANGUAGE 'sql'
WITH     (ISCACHABLE, ISSTRICT);

CREATE AGGREGATE id_list (
    SFUNC    = append_id,
    BASETYPE = INTEGER,
    STYPE    = TEXT,
    INITCOND = ''
);

Now I was able to vastly simplify the results returned by the query:

SELECT s.id, s.title, id_list(grp.id)
FROM   story s, member m, grp g
WHERE  s.id = m.story_id
       AND m.grp_id = g.id
       AND s.id = 10;
GROUP BY s.id, s.title
s.id |        s.title      | id_list
-----+---------------------+---------------
  10 | The Princess Bride  | 23 24 25 26 27

So then I just had to split the id_list column on the white space and I was ready to go. Cool!

So recently, was thinking about how I might do something similar in SQLite. It turns out that SQLite has a way to add custom aggregates, too, via its sqlite_add_function function. But I don’t know C, and had been wondering for a while how, even if I figured out how to write an aggregate function in C, whether I would have to require users to compile SQLite with my C aggregate in order to get it to work.

However, as a Perl developer, I thought it might be worthwhile to just quickly check the DBD::SQLite docs might have to say on the matter. And it turns out that the ability to add aggregates to SQLite is supported in DBD::SQLite via the create_aggregate custom function. And what’s more, the aggregate can be written in Perl! Whoa! I couldn’t believe that it could be that easy, but a quick test script demonstrated that it is:

#!/usr/bin/perl -w

use strict;

use DBI;

my $dbfile = shift;
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", '', '');

END {
    $dbh->disconnect;
    unlink $dbfile;
};

$dbh->do('CREATE TABLE foo (a int)');
$dbh->do('BEGIN');
$dbh->do('INSERT INTO foo (a) VALUES (?)', undef, $_) for 1..10;
$dbh->do('COMMIT');

# Create a new aggregate.
$dbh->func('joiner', 1, 'My::Join', 'create_aggregate');

my $sel = $dbh->prepare(q{SELECT joiner(a) FROM foo});
$sel->execute;
$sel->bind_columns(\my $val);
print "$val\n" while $sel->fetch;

The first argument to create_aggregate() (itself invoked via the DBI func() method) the name of the aggregate, the second is the number of arguments to the aggregate (use -1 for an unlimited number), and the third is the name of a Perl class that implements the aggregate. That class needs just three methods: new(), an object constructor; step(), called for each aggregate row, and finalize, which must return the value calculated by the aggregate. My simple implementation looks like this:

package My::Join;

sub new { bless [] }
sub step {
    my $self = shift;
    push @$self, @_;
}
sub finalize {
    my $self = shift;
    return join q{ }, @$self;
}

Yep, that’s really it! When I run the script, the output looks like this:

% try foo
1 2 3 4 5 6 7 8 9 10

Keen! I mean, that is just so slick! And it really demonstrates the power of SQLite as an embeddable database, as well. Thanks Matt, for making the SQLite API available to us mere mortal Perl developers!

Looking for the comments? Try the old layout.

bricolage.cc Has a New Server

The Bricolage project has a new server! Thanks to a generous donation from NetStumbler.com, we have a shiny new hosted FreeBSD server. I’ve started building it, and have migrated the Bricolage Web site over today. By tomorrow, DNS records should be updated, and it’ll be rarin’ to go!

While I was at it, I rolled out the new Bricolage API documentation browser. There are now browser available for all of the major stable releases of Bricolage, listed on the old documentation page. The API browser for the current stable release will always be available here. Meanwhile, the documentation will also be generated from the Subversion trunk every morning; you can find those docs here.

We’ll also be able to keep better track of the kind of traffic the site gets thanks to the new stats site, which will also be updated daily.

Enjoy!

Looking for the comments? Try the old layout.