Just a Theory

By David E. Wheeler

Benchmarking PostgreSQL Functions

Update 2006-05-19: I realized that there was a nasty error in my algorithm for determining the runtime of a function: It was only fetching the milliseconds part of the runtime, without adding in seconds and minutes! This led to getting negative runtimes then the milliseconds part of the end time was less than the milliseconds part of the start time. Ugh. But with the help of yain on IRC, I’ve switched to calculating the number of seconds by converting the start and end times to epoch seconds (which have subsecond precision in PostgreSQL, and now things are just dandy. While I was at it, I reorganized the function so that it was a bit easier to read, by constructing the created function in the order it would be executed, and fixed the caching problem, as suggested by Aidan in a comment below.

Following yesterday’s post, Klint Gore sent me some PL/pgSQL code that might be useable as a benchmark function. Today I took that code and ran with it.

The idea was to create a function like the Perl Benchmark module’s timethese() function. In the process, I found, with help from Josh Berkus, that PL/pgSQL’s EXECUTE statement has quite a lot of overhead, and the amount of overhead per call is pretty random. The overhead resulted in pretty inaccurate benchmark numbers, unfortunately.

At Josh’s suggestion, I rewrote the function to just test each function inline, rather than passing the function code as parameters. This time, the results were dead on. So then I refactored the original benchmark function to create its own benchmark function, inlining all of the code, and then call that function. Almost higher order PL/pgSQL! Again the results were just right, and so now I present it to you:

create type _benchmark as (
    code      text,
    runtime   real,
    corrected real

RETURNS SETOF _benchmark AS $$
    code TEXT := '';
    a    _benchmark;
    -- Start building the custom benchmarking function.
    code := $_$
        RETURNS SETOF _benchmark AS $__$
            s TIMESTAMP;
            e TIMESTAMP;
            a RECORD;
            d numeric;
            res numeric;
            ret _benchmark;
            -- Create control.
            s := timeofday();
            FOR a IN SELECT TRUE FROM generate_series( 1, $_$ || n || $_$ )
            END LOOP;
            e := timeofday();
            d := extract(epoch from e) - extract(epoch from s);
            ret := ROW( '[Control]', d, 0 );
            RETURN NEXT ret;
    -- Append the code to bench each function call.
    FOR i IN array_lower(funcs,1) .. array_upper(funcs, 1) LOOP
        code := code || '
            s := timeofday();
            FOR a IN SELECT ' || funcs[i] || ' FROM generate_series( 1, '
                || n || $__$ ) LOOP
            END LOOP;
            e := timeofday();
            res := extract(epoch from e) - extract(epoch from s);
            ret := ROW(
                $__$ || quote_literal(funcs[i]) || $__$,
                res - d
            RETURN NEXT ret;

    -- Create the function.
    execute code || $_$
        $__$ language plpgsql;

    -- Now execute the function.
    FOR a IN EXECUTE 'SELECT * FROM _bench(' || n || ')' LOOP
        RETURN NEXT a;

    -- Drop the function.
    DROP FUNCTION _bench(integer);
$$ language 'plpgsql';

You call the function like this:

try=# select * from benchmark(10000, ARRAY[
try(#     'ean_substr(''036000291452'')',
try(#     'ean_byte(''036000291452'')',
try(#     'ean_c(''036000291452'')'
try(# ]);
            code            | runtime   | corrected 
    [Control]                  | 0.0237451 |          0
    ean_substr('036000291452') |  0.497734 |   0.473989
    ean_byte(  '036000291452') |  0.394456 |   0.370711
    ean_c(     '036000291452') | 0.0277281 | 0.00398302
(4 rows)

Pretty slick, eh? The only downside was that, when the DROP FUNCTION line was not commented out, the function would run once, and then, the next time, I’d get this error:

ERROR:  cache lookup failed for function 17323
CONTEXT:  PL/pgSQL function "benchmark" line 49 at for over select rows

I have no idea why. So I just leave the function and let the CREATE OR REPLACE take care of it.

Looking for the comments? Try the old layout.