Just a Theory

Trans rights are human rights

Posts about PGXN

Mini Summit One

Great turnout and discussion for the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28. Thank you!

The talk, “State of the Extension Ecosystem”, was followed by 15 minutes or so of super interesting discussion. Here are the relevant links:

For posterity, I listened through my droning and tried to capture the general outline, posted here along with interspersed chat history and some relevant links. Apologies in advance for any inaccuracies or missed nuance; I’m happy to update these notes with your corrections.

And now, to the notes!

Introduction

  • Introduced myself, first Mini Summit, six leading up to the in-person summit on May 28 at PGConf.dev in Vancouver, Canada.

  • Thought I would get things started and provide a bit of history of extensions and context for what’s next.

Presentation

  • Postgres has a long history of extensibility, originally using pure SQL or shared preload libraries. Used by a few early adopters, perhaps a couple dozen, including …

  • Explicit extension support added in Postgres 9.1 by Dimitri Fontaine, with PGXS, CREATE EXTENSION, and pg_dump & pg_restore support.

  • Example pair--1.0.sql:

    -- complain if script is sourced in psql and not CREATE EXTENSION
    \echo Use "CREATE EXTENSION pair" to load this file. \quit
    
    CREATE TYPE pair AS ( k text, v text );
    
    CREATE FUNCTION pair(text, text)
    RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;';
    
    CREATE OPERATOR ~> (LEFTARG = text, RIGHTARG = text, FUNCTION = pair);
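    
    -- Hypothetical usage, not on the original slide: the ~> operator
    -- constructs a pair from two text values.
    SELECT 'foo' ~> 'bar';  -- returns (foo,bar)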
    
  • Bagel makes an appearance.

  • Example pair.control:

    # pair extension
    comment = 'A key/value pair data type'
    default_version = '1.0'
    module_pathname = '$libdir/pair'
    relocatable = true
    
  • Example Makefile:

    EXTENSION    = pair
    MODULEDIR    = $(EXTENSION)
    DOCS         = README.md
    DATA         = sql/pair--1.0.sql
    TESTS        = test/sql/base.sql
    REGRESS      = base
    REGRESS_OPTS = --inputdir=test
    MODULES      = src/pair
    PG_CONFIG   ?= pg_config
    
    PGXS := $(shell $(PG_CONFIG) --pgxs)
    include $(PGXS)
    
  • Build and Install:

    $ make
    make: Nothing to be done for `all'.
    
    $ make install
    mkdir -p '/pgsql/share/extension'
    mkdir -p '/pgsql/share/pair'
    mkdir -p '/pgsql/share/doc/pair'
    install -c -m 644 pair.control '/pgsql/share/extension/'
    install -c -m 644 sql/pair--1.0.sql  '/pgsql/share/pair/'
    install -c -m 644 README.md '/pgsql/share/doc/pair/'
    
    $ make installcheck
    # +++ regress install-check in  +++
    # using postmaster on Unix socket, default port
    ok 1         - base                                       15 ms
    1..1
    # All 1 tests passed.
    
  • CREATE EXTENSION:

    $ psql -d try -c 'CREATE EXTENSION pair'
    CREATE EXTENSION
    
    $ pg_dump -d try
    --
    -- Name: pair; Type: EXTENSION; Schema: -; Owner: -
    --
    CREATE EXTENSION IF NOT EXISTS pair WITH SCHEMA public;
    
    --
    -- Name: EXTENSION pair; Type: COMMENT; Schema: -; Owner:
    --
    COMMENT ON EXTENSION pair IS 'A key/value pair data type';
    
  • Many of us saw opportunity in this new feature.

    PostgreSQL today is not merely a database, it’s an application development platform.

    — Me, 2010

  • Proposed to build PGXN. Raised funds to build it in late 2010. Launched the site in April 2011; Daniele Varrazzo released the CLI, and Dickson Guedes released the dev CLI.

  • Problems PGXN set out to solve:

    • Source code distribution with user registration and namespacing
    • Discovery: Search, read docs, browse tags
    • Installation: CLI to compile and install using PGXS or Configure
  • PGXN Components:

  • Problems out of scope for PGXN:

    • Binary packaging and distribution
      • Defer to apt/yum
    • Developer tooling (though dev utils helped)
    • Build tooling
      • Defer to core (PGXS)
  • PGXN Shortcomings:

    • Little development since 2012
    • Search limitations
    • Source of Record
      • Minority of available extensions on PGXN
      • Releases uneven or neglected

    In classic SDLC fashion, PGXN POC shipped as an MVP and was neglected.

    — Me, Just Now

  • Been peripheral to Postgres extensions for the last 10-12 years, but some things have happened.

  • Non-Core extension counts:

  • Daniele asks about that last source, which is just a list in a gist.

  • Joe Nelson links to the gist in Zoom chat. It is not his list, contrary to my off-the-cuff guess.

  • Why haven’t extensions taken off?

  • Lost Opportunities

    • No one canonical source to discover and install extensions
    • Difficult to find and discover extensions without canonical registry
    • Most extensions are under-documented and difficult to understand
    • They are also hard to configure and install; most people don’t want or need a compiler
    • The maturity of extensions can be difficult to gauge; it’s not systematized, and each must be independently researched
      • David Christensen in Chat “attention economy/awareness, NIH, etc”
      • Jeremy S in chat: “Maybe some people don’t know they are using extensions (I think that’s possible to some degree)”
    • There is no comprehensive binary packaging
    • Centralized source distribution is insufficient (even if it were complete)
      • jubilee in chat: Trust aspect?
      • David Johnson in chat: To seem legit you need to touch the repo at least annually to ensure it works on the newest major release. Even if you just do your compile and update the readme.
      • I mention using pgxn-utils and GitHub workflows to ensure my extensions continue working
    • There is insufficient developer tooling; pgxn-utils is not well-maintained and doesn’t build on recent Rubies, but pgrx has a lot of Rust-oriented tooling
      • Eric in chat: ❤️
      • jubilee in chat: 🦀 mentioned!
  • Filling the Gaps

  • dbdev: “The Database Package Manager for Trusted Language Extensions”: Includes only TLEs, no binary extensions

  • trunk: “A Postgres Extension Registry”: Binary distribution of curated extensions, desires to be comprehensive and cross-platform

  • pgxman: “npm for PostgreSQL”: Binary Apt package distribution of curated extensions packaged with pgxman; desires to be comprehensive and cross-platform

  • Emphases: Ease of Use. Screenshot from pgxman:

    $ curl -sfL https://install.pgx.sh | sh -
    👏🎉 pgxman successfully installed
    $ pgxman install pgvector
    The following Debian packages will be installed:
    postgresql-14-pgxman-pgvector=0.5.1
    Do you want to continue? [Y/n] y
    pgvector has been successfully installed.
    
    • Daniele in chat: “Missing a “curl | sudo sh” there…. 👀”
    • Greg Mullane (CrunchyData) [he/him] in chat: “Really not a fan of that “pipe curl stuff from internet into sh” system.”
    • Jeremy S in chat: “Someone recently reprimanded me for putting curl | psql in an extension README. From a security perspective it probably sets a better example to do curl >file.sql … psql file.sql (encourage users not to run from Internet but read/review first)”
    • jubilee in chat: “apt/yum install is just a better UI over curl | sh :^)”
    • Jeremy S in chat: “Yes and once you’re to that point there’s already more supply chain verification happening”
    • Jeremy S in chat: “It’s usually just the initial bootstrap into any system, if the setup wasn’t already in your distro”
  • Emphases: Platform neutrality. Screenshot from trunk:

    Architecture x86-64
    Operating system Debian/Ubuntu
  • Emphases: Stats. Screenshot from dbdev:

    Downloads


    20 all time downloads
    0 downloads in last 30 days
    1 download in last 90 days
    0 downloads in last 180 days

  • Emphases: Curation. Screenshot from trunk:

    Featured 7
    Analytics 13
    Auditing / Logging 7
    Data Change Capture 6
    Connectors 27
    Data / Transformations 49
    • Damien Clochard in chat: gtg, see you later guys !
  • MVPs

    • trunk: Manual integration, Currently Debian-only

    • pgxman: Form-based submission, Currently Apt-only

    • dbdev: TLEs only, CLI publishing

    • David Christensen in chat: “go has a pretty good extensions infra, imho, wrt discovery/docs, etc. also has the benefit of the package names being the URL to access it, which is a nice convention.”

  • New Opportunities Today

    What are the community opportunities for the extension ecosystem?

    Some ideas:

    • Improved dev tools: More than pgxn-utils and pgrx
    • Canonical registry: All publicly-available extensions in one place
    • Easy publishing: auto-discovery or CI/CD pipeline publishing
    • Continuous Delivery: CI/CD pipeline publishing
    • File-free installation: TLEs
    • Documentation: Something like Go docs or Rust docs
    • File management: Put all the files for an extension in one directory
    • Improved metadata
      • Library Dependencies: utilities used by extensions
      • Platform Dependencies: system packages
      • Build pipelines: PGXS, pgrx, make, cpan, pypi, etc.
      • Artifacts: Binaries built on release
      • Classification: Curated in addition to tags
      • Extension Types: Extensions, apps, background workers, loadable libraries
    • Derived Services
      • Binary Packaging: Distributed binaries for many platforms
      • Ratings & Reviews: Stars, thumbs, comments
      • Aggregated Stats: Repository stats, etc.
      • Smoke Testing: Matrix of multi-platform test results
      • Security Scanning: Reporting vulnerabilities
      • Badging & Curation: Third-party classification, badging various statuses
  • Extension Ecosystem Summit

    Collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

  • 🏔️ Your Summit Organizers

  • Devrim Gunduz in chat: Thanks David!

  • Schedule:

    • March 6: David Wheeler, PGXN: “State of the Extension Ecosystem”
    • March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI”
    • April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions”
    • April 17: Jonathan Katz: “TLE Vision and Specifics”
    • May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod”
    • May 15: (Placeholder) David Wheeler, PGXN: “Metadata for All: Enabling discovery, packaging, and community”
  • Ultimately want to talk about what’s important to you, the members of the community, to make extensions successful.

Discussion

  • Eric: I’m Eric Ridge, one of the developers behind pgrx. As you’re going through this process of building a modern extension ecosystem, let us know what we can do on the Rust side to help make your lives easier; we’re happy to help any way we can.

  • Steven Miller in chat:

    These are some areas of interest we noticed building Tembo

    Binary packaging / distribution:

    • Variable installation location
    • System dependencies / uncommon system dependencies or versions
    • Chip specific instructions (e.g. vector compiled with avx512)
    • Extension-specific file types / extra data files (e.g. anonymizer .csv data)

    Turning on extensions automatically

    • Different ways to enable extensions
    • Does it need load (shared_preload_libraries, session_… etc)?
    • Does it use create extension framework?
    • Does it require a specific schema?
    • What about turning on in multiple DBs at the same time in the same cluster, with background worker?
    • Disabling, what data will be lost?
    • Validating safety / user feedback on upgrade?

    In cloud / SaaS:

    • Installing + enabling extensions quickly, without restart
    • Persisting extension files
    • Extension-specific files (e.g. libraries) versus postgres’ libraries
    • Updating
    • Troubleshooting crashes / core dumps

    Anyone else have similar problems / tips?

  • Steven Miller: These were just things I noted during the presentation. Curious if these are interesting to others on the call.

  • Daniele in chat: “Regards binary distributions, python wheels might be a useful reference.”

  • Steven Miller: That’s a good point! What do people think of the idea to just install extensions onto servers, not packages, persisted on disk next to PGDATA so they go into a single persistent volume, with the rest managed by an immutable container?

  • Daniele: Had an experience at Crunchy where we had to replace an image to get an extension. Looked for a feature to have a sidecar or a volume with the extension.

  • Steven Miller: Didn’t have a separate directory just for extensions; it’s just pg_config --libdir for everything. Had to persist the entire directory, including those files from the base build, their internal files. Would have been nice to have a separate directory, extra-libdir or extra-sharedir, something like that.

  • Yurii Rashkovskii: I was working on a patch to do exactly that, but haven’t completed it. Was going to introduce additional directories to search for this stuff.

  • Steven Miller: That would be really awesome.

  • Jeremy S in chat: “Advantage of that is that a single image can be shared among systems with different needs”

  • Eric in chat: “Thoughts around “enterprise repositories” that could be self-hosted and disconnected from the internet?”

    • Ian Stanton in chat: “I’ll touch on this in the next talk, it’s crossed our minds when building the Trunk registry”
  • Steven Miller: I think that’s a great idea.

  • Bagel reappears.

  • David Wheeler: PGXN was originally designed so anyone could run the Manager and their own root mirror, and maybe rsync from the community one. Don’t know that anyone ever did; it’s a little complicated and most people don’t want to work with Perl. [Chuckles]. Definitely think there’s space for that. If you work with Java or Go or maybe Rust, there are lots of products like Artifactory that provide internal registries. Could be a cool use case for Postgres extensions.

  • David Christensen in chat: “something that could support extension batches; like groups of related extensions that could be installed in bulk or loaded in bulk (so could accommodate the shared systems with different individual extension needs, but could be shared_preload_library configured)”

  • Steven Miller in chat: “Sounds familiar”

  • Greg Mullane (CrunchyData) [he/him] in chat: “All these items remind me of CPAN. We should see what things it (and other similar systems) get right and wrong. I’ve learned from CPAN that ratings, reviews, and badging are going to be very difficult.”

    • David Christensen in chat: “I’d assumed at the time that it was largely lifted (at least in philosophy/design) from CPAN. 🙂”
    • David Wheeler (he/him) in chat: “yes”
  • Jeremy S: I think this is mostly focused on developers, but I’ve had recent experience where multiple people in the past few months, new to Postgres, were trying to understand extensions. They install a version and then see there are like 15 versions installed, so they’re confused. It goes back to the install file. There’s a bit of a UX angle here: there are sharp edges for people trying to make sense of extensions, and the flexibility makes it hard to understand. Some fixes might be nice guides, some architectural explanations of PGXS, or improvements to the design. Related, not the main topic, but it’s good to keep in mind the UX of end users and of devs who build their businesses on Postgres but are not Postgres developers.

  • David Wheeler: Yeah, all the files can be confusing, which is why I think trunk and pgxman are trying to simplify: just run this command and then you have it.

  • Steven Miller in chat: “I really agree with what Jeremy is saying. Right now PGXN and Trunk are taking an approach like “whatever an extension could do / how it may work, it should work on this registry”. But I think more standards / “what is a normal extension” would make the UX much easier.”

  • Jeremy S: Even with that the available extensions view is still there. Some of that is just schema management, and that’s how core is doing schema management.

  • Steven Miller in chat: I exactly agree about the concern about multiple extensions within a package. Also version and name mismatches

  • David Wheeler: And not everything is an extension; sometimes you have a utility like pg_top that you just want to use. Extensions I think were a tremendous contribution to Postgres itself, but a lot of it was wrangling the existing system for building Postgres to make it work for extensions. Could be very interesting, though quite long term — and I know Dimitri has tried this multiple times — to build a proper package management system within Postgres itself, to ease a lot of that pain and burden.

  • Tobias Bussmann in chat: “Thank you for the great overview and for taking this topic further! Unfortunately, I’ll not be able to join at pgConf.dev but will follow whatever results this will lead to. As a package maintainer, I am constantly looking for an easy way to allow users to use extensions without having to package everything ;)”

  • Steven Miller in chat: “Like auto explain for example right. i.e. a LOAD only “extension””

  • Yurii Rashkovskii: An interesting topic: what extensions are capable of doing and how they can be more self-contained. Like Steven was saying in chat: how easy is it to load and unload extensions? Example: you want an extension to hook into a part of Postgres: executor, planner, etc. How do you go about enabling them? How do you unload them, introspect them, list their hooks?

    The Omni extension provides a list of all hooks, and when you remove an extension it removes the hooks the extension provides, though this is still not part of core. Hooks are one of the greatest ways to expand the functionality of Postgres; they allow us to experiment with Postgres before committing to the full cycle of getting a patch into Postgres, and let us get features to users today to try. If something makes a lot of sense and people want it, it’s time to commit to the process of getting a patch committed. But if we don’t have this venue for getting extensions in, our ability to try things is limited.

  • jubilee in chat: Hmm. It seems my audio is not working.

  • David Wheeler: The next session is two weeks from today: Ian Stanton is going to talk about “Building Trunk: A Postgres Extension Registry and CLI”. Will be interesting because a number of people have decided to build a binary packaging system for extensions, just to air out what the challenges were, what problems they wanted to solve, what problems remain, and where they want to take it in the future.

  • Jeremy S in chat: “Bagel clearly has something to say”

  • David Wheeler: jubilee, I see your audio issues; do you just want to type your question into chat? We can also discuss things in the #extensions channel on the Postgres Slack.

  • David Wheeler: Thank you all for coming!

  • jubilee in chat: “I had a question which is about: Does Postgres actually support docs for extensions? Like, a lot of people don’t really WANT to read a README. Can you get docstrings for a function in psql?”

  • Ian Stanton in chat: “Thank you David!”

  • jubilee in chat: And if not, why not?

Post Presentation Discussion

From Slack:

  • David Wheeler: I see now that “jubilee” left their question in the Zoom chat.
  • David Wheeler: The closest attempt at this I’ve seen is pg_readme, which will generate Markdown for an extension from comments in the catalog and write it out to a file.
  • David G. Johnson: The comment on command adds in database comments that psql describe commands should display.
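
As a sketch of David’s point, comments live in the catalog and surface in psql describe output. A hypothetical example, reusing the pair extension from the presentation above:

-- Store a docstring for the function in the catalog:
COMMENT ON FUNCTION pair(text, text) IS 'Construct a pair from two text values.';
-- psql's \df+ pair then shows the comment in its Description column.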

Also on Slack, Greg Sabino Mullane started a longish thread on the things we want to do and build.

Talk: State of the Extension Ecosystem

Photo of the summit of Mount Hood

Update: 2024-03-06: Slides and video linked below.

A quick reminder that I’ll be giving a brief talk on the “State of the Extension Ecosystem” on Wednesday at noon US Eastern / 17:00 UTC. This talk is the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28.

I plan to give a brief history of Postgres extension tools and distribution, the challenges encountered, recent developments, and opportunities for the future. It should take about 30 minutes, followed by discussion. Following this pattern for all the talks in the series, I hope to set up some engaging discussions and to surface significant topics ahead of the summit.

Join us! If you need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

Update: 2024-03-06: Great turnout and discussion, thank you! Links:

Extension Ecosystem Summit 2024

Logo for PGConf.dev

I’m pleased to announce that some pals and I have organized and will host the (first annual?) Extension Ecosystem Summit at PGConf.dev in Vancouver (and more, see below) on May 28:

Enabling comprehensive indexing, discovery, and binary distribution.

Participants will collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

Going to PGConf? Select it as an “Additional Option” when you register, or update your registration if you’ve already registered. Hope to see you there!


Photo of the summit of Mount Hood

Extension Ecosystem Mini-Summit

But if you can’t make it, that’s okay, because in the lead-up to the Summit we’re hosting a series of six virtual gatherings, the Postgres Extension Ecosystem Mini-Summit.

Join us for an hour or so every other Wednesday starting March 6 to hear contributors to a variety of community and commercial extension initiatives outline the problems they want to solve, their attempts to do so, challenges discovered along the way, and dreams for an ideal extension ecosystem in the future. Tentative speaker lineup:

  • March 6: David Wheeler, PGXN: “State of the Extension Ecosystem”
  • March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI”
  • April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions”
  • April 17: Jonathan Katz: “TLE Vision and Specifics”
  • May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod”
  • May 15: (Placeholder) David Wheeler, PGXN: “Metadata for All: Enabling discovery, packaging, and community”

Hit the event page for details. Many thanks to my co-organizers Jeremy Schneider, David Christensen, Keith Fiske, and Devrim Gündüz, as well as the PGConf.dev organizers for making this all happen!

Update: 2024-03-06: Updated the talk schedule.

The History and Future of Extension Versioning

Every software distribution system deals with versioning. Early in the design of PGXN, I decided to require semantic versions (SemVer), a clearly-defined and widely-adopted version standard, even in its pre-1.0 specification. I implemented the semver data type that would properly sort semantic versions, later ported to C by Sam Vilain and eventually updated to semver 2.0.0.

As I’ve been thinking through the jobs and tools for the Postgres extension ecosystem, I wanted to revisit this decision, the context in which it was made, and survey the field for other options. Maybe a “PGXN v2” should do something different?

But first that context, starting with Postgres itself.

PostgreSQL Extension Version Standard

From the introduction of extensions in PostgreSQL 9.1, the project side-stepped the need for version standardization and enforcement by requiring extension authors to adopt a file naming convention instead. For example, an extension named “pair” must have a file with its name, two dashes, then the version as listed in its control file, like so:

pair--1.1.sql

As long as the file name is correct and the version part byte-compatible with the control file entry, CREATE EXTENSION will find it. To upgrade an extension, the author must provide a second file with the extension name, the old version, and the new version, all delimited by double dashes. For example, to upgrade our “pair” extension to version 1.2, the author supplies all the SQL commands necessary to upgrade it in this file:

pair--1.1--1.2.sql
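
The file contains ordinary SQL. A hypothetical pair--1.1--1.2.sql might hold something like this (an invented example, not from the actual pair extension):

-- Hypothetical upgrade script, run by ALTER EXTENSION pair UPDATE TO '1.2':
CREATE FUNCTION pair_key(pair) RETURNS text LANGUAGE SQL AS 'SELECT ($1).k';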

This pattern avoids the whole question of version standards, ordering for upgrades or downgrades, and all the rest: extension authors have full responsibility to name their files correctly.

PGXN Versions

SemVer simplified a number of issues for PGXN in ways that the PostgreSQL extension versioning did not (without having to re-implement the core’s file naming code). PGXN wants all metadata for an extension in its META.json file, and not to derive it from other sources that could change over time.

Following the CPAN model, PGXN also required that extension releases never decrease the version.1 The well-defined sortability of semantic versions made this validation trivial. PGXN later relaxed enforcement to allow updates to previously-released versions. SemVer’s clearly specified sorting made this change possible, as major.minor.patch precedence intuitively compares from left to right.

In other words, if one had previously released version 1.2.2, then released 1.3.0, a follow-up 1.2.3 is allowed, increasing the 1.2.x branch version, but not, say, 1.2.1, which decreases the 1.2.x branch version.
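
A quick illustration using the semver data type mentioned above, assuming the extension is installed:

-- The 1.2.x maintenance release sorts between its branch predecessor
-- and the later minor release:
SELECT v FROM (VALUES
    ('1.3.0'::semver),
    ('1.2.2'::semver),
    ('1.2.3'::semver)
) AS releases(v)
ORDER BY v;
-- Returns: 1.2.2, 1.2.3, 1.3.0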

Overall, semantic versions have been great for clarity of versioning of PGXN extensions. The one bit of conflict comes from extensions that use some other version standard in the control file, usually a two-part x.y version not allowed by SemVer, which requires x.y.z (or, more specifically, major.minor.patch).

But such versions are usually compatible with SemVer, and because PGXN cares only about the contents of the META.json, they’re free to use their own versions in the control file, just as long as the META.json file uses SemVers.

For example, the recent nominatim_fdw v1.0.0 release, which of course lists "version": "1.0.0" in its META.json file, sticks to its preferred default_version = '1.0' in its control file. The extension author simply appends .0 to create a valid SemVer from their preferred version, and as long as they never use any other patch number, it remains compatible.
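
Side by side, the pattern looks something like this (reconstructed from the description above, not copied verbatim from the project):

# In nominatim_fdw.control: the author's preferred two-part version.
default_version = '1.0'
# While META.json records the SemVer-compliant version:
#   "version": "1.0.0"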

Versioning Alternatives

Surveying the versioning landscape in 2024 yields a number of approaches. Might we prefer an alternative for future extensions distribution? Let’s look at the possibilities.

Ad Hoc Versions

As described above, the Postgres file naming convention allows ad hoc versions. As far as I can tell, so does the R Project’s CRAN. This approach seems fine for systems that don’t need to follow version changes themselves, but much trickier for systems that do. If I want to install the latest version of an extension, how does the installer know what that latest version is?

The answer is that the extension author must always release them in the proper order. But if someone releases 1.3.1 of an extension, and then 1.2.1, well then 1.2.1 is the latest, isn’t it? It could get confusing pretty quickly.

Seems better to require some system, so that download and install clients can get the latest version — or the latest maintenance version of an earlier release if they need it.

User Choice

Quite a few registries allow users to choose their own versioning standards, but generally with some very specific recommendations to prevent confusion for users.

  • Python Packaging is fairly liberal in the versions it allows, but strongly recommends semantic versioning or calendar versioning (more on that below).
  • CPAN (Perl) is also fairly liberal, due to its long history of module distribution, but currently requires “Decimal versions”, which are evaluated as floating-point numbers, or dotted integer versions, which require three dot-separated positive integers and must begin with the letter v (see the sketch after this list).
  • RubyGems does not enforce a versioning policy, but warns that “using an ‘irrational’ policy will only be a disservice to those in the community who use your gems.” The project therefore urges developers to follow SemVer.
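
To illustrate the CPAN schemes, a worked example of my own (not taken from the CPAN docs):

# Decimal versions evaluate as floating-point numbers:
#   1.002003 < 1.00201     (equivalent to v1.2.3 < v1.2.10)
# Dotted-integer versions express the same ordering explicitly:
#   v1.2.3 < v1.2.10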

These three venerable projects date from an earlier period of registration and distribution, and have made concessions to times when no policies existed. Their solutions either try to cover as many legacy examples as possible while recommending better patterns going forward (Python, Perl), or simply make recommendations and punt responsibility to developers.

SemVer

More recently-designed registries avoid this problem by requiring some level of versioning standard from their inception. Nearly all use SemVer, including:

  • Go Modules, where “Each version starts with the letter v, followed by a semantic version.”
  • Cargo (Rust), which “uses SemVer for specifying version numbers. This establishes a common convention for what is compatible between different versions of a package.”
  • npm, where the “version must be parseable by node-semver, which is bundled with npm as a dependency.”

CalVer

CalVer eschews context-free incrementing integers in favor of semantically-meaningful versions, at least for some subset of a version string. In other words: make the version date-based. CalVer-versioned projects usually include the year and sometimes the month. Some examples:

  • Ubuntu uses YY.0M.MICRO, e.g., 23.04, released in April 2023, and 23.10.1, released in October 2023
  • Twisted uses YY.MM.MICRO, e.g., 22.4.0, released in April 2022

Ultimately, adoption of a CalVer format is more a choice about embedding calendar-based meaning into a version than about standardizing a specific format. One can of course use CalVer semantics in a semantic version, as in the Twisted example, which is fully-SemVer compliant.

In other words, adoption of CalVer need not necessitate rejection of SemVer.

Package Managers

What about package managers, like RPM and Apt? Some canonical examples:

  • RPM packages use the format:

    <name>-<version>-<release>.<architecture>
    

    Here <version> is the upstream version, but RPM practices a reasonable (if baroque) version comparison of all its parts. But it does not impose a standard on upstream packages, since they of course vary tremendously between communities and projects.

  • Apt packages use a similar format:

    [epoch:]upstream_version[-debian_revision]
    

    Again, upstream_version is the version of the upstream package, and not enforced by Apt.

  • APK (Alpine Linux) packages use the format

    {digit}{.digit}...{letter}{_suf{#}}...{-r#}
    

    I believe that {digit}{.digit}...{letter} is the upstream package version.
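
For concreteness, here is how a hypothetical pair 1.2.3 package might be labeled under each scheme (invented examples, not real packages):

pair-1.2.3-1.x86_64     # RPM: <name>-<version>-<release>.<architecture>
pair_1.2.3-1            # Apt: upstream_version-debian_revision
pair-1.2.3-r1           # APK: upstream version with an -r# release suffix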

This pattern makes perfect sense for registries that repackage software from dozens of upstream sources that may or may not have their own policies. But a system that defines the standard for a specific ecosystem, like Rust or PostgreSQL, need not maintain that flexibility.

Recommendation

Given this survey, I’m inclined to recommend that the PostgreSQL community follow the PGXN (and Go, and Rust, and npm) precedent and continue to rely on and require semantic versions for extension distribution. It’s not perfect, given the contrast with the core’s lax version requirements. CalVer partisans can still use it, though with fewer formatting options (SemVer forbids leading zeros, as in the Ubuntu 23.04 example).

But with its continuing adoption, and especially its requirement by more recent, widely-used registries, and capacity to support date-based semantics for those who desire it, I think it continues to make the most sense.

Wrong!

I’m probably wrong. I’m often mistaken in one way or another, on the details or the conclusion. Please tell me how I’ve messed up! Find me on the #extensions channel on the Postgres Slack or ping me on Mastodon.


  1. Why? Because every module on CPAN has one and only one entry in the index file. Ricardo Signes explains↩︎

Extension Ecosystem Jobs to be Done

Over on the Tembo blog I’ve published a thinking-through of what all the jobs to be done of the ideal Postgres extension ecosystem might be:

These challenges and the interest and energy put into exploring new solutions make clear that the time has come to revisit the whole idea of the PostgreSQL extension ecosystem: to work through the jobs to be done, specify the tools to do those jobs, and outline a plan for the broader Postgres community to design and build them.

Future posts will dream up the tools and make the plan; today we begin with the jobs.

🎬 Let’s get started.

Presentation: Introduction to the PGXN Architecture

As I started digging into the jobs and tools for the Postgres extension ecosystem as part of my new gig, I realized that most people have little knowledge of the PGXN architecture. I learned a lot designing PGXN and its services, and am quite pleased with where it ended up, warts and all. So I thought it worthwhile to put together a brief presentation on the fundamental design principles (static REST file API), inter-related services (root mirror, manager, API, site), and tools (CLI, CI/CD).

Yesterday, the Tembo blog published the presentation, including the video and slides, along with a high-level architecture diagram. I hope it’s a useful point of reference for the Postgres community as we look to better distribute extensions in the future.

Contemplating Decentralized Extension Publishing

TL;DR

As I think through the future of the Postgres extension ecosystem as a key part of the new job, I wanted to understand how Go decentralized publishing works. In this post I work it out, and think through how we might do something similar for Postgres extension publishing. It covers the Go architecture, namespacing challenges, and PGXS abuse; then experiments with URL-based namespacing and ponders reorganizing installed extension files; and closes with a high-level design for making it work now and in the future.

It is, admittedly, a lot, mainly written for my own edification and for the information of my fellow extension-releasing travelers.

I find it fascinating and learned a ton. Maybe you will too! But feel free to skip this post if you’re less interested in the details of the journey and want to wait for more decisive posts once I’ve reached the destination.

Introduction

Most language registries require developers to take some step to make releases. Many automate the process in CI/CD pipelines, but it requires some amount of effort on the developer’s part:

  • Register for an account
  • Learn how to format things to publish a release
  • Remember to publish again for every new version
  • Create a pipeline to automate publishing (e.g., a GitHub workflow)

Decentralized Publishing

Go decentralized publishing has revised this pattern: it does not require user registration or authentication to publish a module to pkg.go.dev. Rather, Go developers simply tag the source repository, and the first time someone refers to the tag in Go tools, the Go module index will include it.

For example, publishing v1.2.1 of a module in the github.com/golang/example repository takes just three commands:

git tag v1.2.1 -sm 'Tag v1.2.1'
git push --tags
go list -m github.com/golang/example@v1.2.1

After a few minutes, the module will show up in the index and then on pkg.go.dev. Anyone can run go get -u github.com/golang/example to get the latest version. Go developers rest easy in the knowledge that they’re getting the exact module they need thanks to the global checksum database, which Go uses “in many situations to detect misbehavior by proxies or origin servers”.

This design requires go get to understand multiple source code management systems: it supports Git, Subversion, Mercurial, Bazaar, and Fossil.1 It also needs the go.mod metadata file to live in the project defining the package.

But that’s really it. From the developer’s perspective it could not be easier to publish a module, because it’s a natural extension of the module development tooling and workflow of committing, tagging, and fetching code.

Decentralized Extension Publishing

Could we publish Postgres extensions in such a decentralized pattern? It might look something like this:

  • The developer places a metadata file in the proper location (control file, META.json, Cargo.toml, whatever — standard TBD)
  • To publish a release, the developer tags the repository and calls some sort of indexing service hook (perhaps from a tag-triggered release workflow; see the sketch after this list)
  • The indexing service validates the extension and adds it to the index
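
As a sketch of that second step, a tag-triggered workflow might notify a hypothetical indexing service; the host, path, and payload here are all invented for illustration:

git tag v1.2.1 -sm 'Tag v1.2.1'
git push --tags
# Hypothetical endpoint; no such service exists today.
curl -X POST https://index.example.org/v1/index \
     --data '{"repo": "https://github.com/carol/hash", "tag": "v1.2.1"}'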

Note that there is no registration required. It simply trusts the source code repository. It also avoids name collision: github.com/bob/hash is distinct from github.com/carol/hash.

This design does raise challenges for clients, whether they’re compiling extensions on a production system or building binary packages for distribution: they have to support various version control systems to pull the code (though starting with Git is a decent 90% solution).

Namespacing

Then there are name conflicts. Perhaps github.com/bob/hash and github.com/carol/hash both create an extension named hash. By the current control file format, the script directory and module path can use any name, but in all likelihood they use these defaults:

directory = 'extension'
module_pathname = '$libdir/hash'

Meaning .sql files will be installed in the Postgres share/extension subdirectory — along with all the other installed extensions — and library files will be installed in the library directory along with all other libraries. Something like this:

pgsql
├── lib
│   └── hash.so
└── share
    ├── extension
    │   ├── hash.control
    │   └── hash--1.0.0.sql
    └── doc
        └── hash.md

If both projects include, say, hash.control, hash--1.0.0.sql, and hash.so, the files from one will stomp all over the files of the other.

Installer Abuse

Go avoids this issue by using the domain and path from each package’s repository in its directory structure. For example, here’s a list of modules from google.golang.org repositories:

$ ls -1 ~/go/pkg/mod/google.golang.org
api@v0.134.0
api@v0.152.0
appengine@v1.6.7
genproto
genproto@v0.0.0-20230731193218-e0aa005b6bdf
grpc@v1.57.0
grpc@v1.59.0
protobuf@v1.30.0
protobuf@v1.31.0
protobuf@v1.32.0

The ~/go/pkg/mod directory has subdirectories for each VCS host name, and each then subdirectories for package paths. For the github.com/bob/hash example, the files would all live in ~/go/pkg/mod/github.com/bob/hash.

Could a Postgres extension build tool follow a similar distributed pattern by renaming the control file and installation files and directories to something specific for each, say github.com+bob+hash and github.com+carol+hash? That is, using the repository host name and path, but replacing the slashes in the path with some other character that wouldn’t create subdirectories — because PostgreSQL won’t find control files in subdirectories. The control file entries for github.com/carol/hash would look like this:

directory = 'github.com+carol+hash'
module_pathname = '$libdir/github.com+carol+hash'

Since PostgreSQL expects the control file to have the same name as the extension, and for SQL scripts to start with that name, the files would have to be named like so:

hash
├── Makefile
├── github.com+carol+hash.control
└── sql
    └── github.com+carol+hash--1.0.0.sql

And the Makefile contents:

EXTENSION  = github.com+carol+hash
MODULEDIR  = $(EXTENSION)
DATA       = sql/$(EXTENSION)--1.0.0.sql
PG_CONFIG ?= pg_config

PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

In other words, the extension name is the full repository host name and path, and the Makefile MODULEDIR variable tells PGXS to put all the SQL and documentation files into a directory named github.com+carol+hash — preventing them from conflicting with any other extension.

Finally, the github.com+carol+hash.control file — so named because it must have the same name as the extension — contains:

default_version = '1.0.0'
relocatable = true
directory = 'github.com+carol+hash'
module_pathname = '$libdir/github.com+carol+hash'

Note the directory parameter, which must match MODULEDIR from the Makefile, so that CREATE EXTENSION can find the SQL files. Meanwhile, module_pathname ensures that the library file has a unique name — the same as the long extension name — again to avoid conflicts with other projects.

That unsightly naming extends to SQL: using the URL format could get to be a mouthful:

CREATE EXTENSION "github.com+carol+hash";

Which is do-able, but some new SQL syntax might be useful, perhaps something like:

CREATE EXTENSION hash FROM "github.com+carol+hash";

Or, if we’re gonna really go for it, use slashes after all!

CREATE EXTENSION hash FROM "github.com/carol/hash";

Want to use both extensions but they have conflicting objects (e.g., both create a “hash” data type)? Put them into separate schemas (assuming relocatable = true in the control file):

CREATE EXTENSION hash FROM "github.com/carol/hash" WITH SCHEMA carol;
CREATE EXTENSION hash FROM "github.com/bob/hash" WITH SCHEMA bob;
CREATE TABLE try (
    h1 carol.hash,
    h2 bob.hash
);

Of course it would be nice if PostgreSQL added support for something like Oracle packages, but using schemas in the meantime may be sufficient.

Clearly we’re getting into changes to the PostgreSQL core, so put that aside and we can just use long names for creating, modifying, and dropping extensions, but not necessarily otherwise:

CREATE EXTENSION "github.com+carol+hash" WITH SCHEMA carol;
CREATE EXTENSION "github.com+bob+hash" WITH SCHEMA bob;
CREATE EXTENSION "gitlab.com+barack+kicker_type";
CREATE TABLE try (
    h1 carol.hash,
    h2 bob.hash,
    kt kicker
);

Namespacing Experiment

To confirm that this approach might work, I committed 24134fd and pushed it in the namespace-experiment branch of the semver extension. This commit changes the extension name from semver to github.com+theory+pg-semver, and follows the above steps to ensure that its files are installed with that name.

Abusing the Postgres extension installation infrastructure like this does work, but suffers from a number of drawbacks, including:

  • The extension name is super long, as before, but now so too are the files in the repository (as opposed to the installer renaming them on install). The shared library file has to have the long name, so therefore does the .c source file. The SQL files must all start with github.com+theory+pg-semver, although I skipped that bit in this commit; instead the Makefile generates just one from sql/semver.sql.
  • Any previous installation of the semver type would remain unchanged, with no upgrade path. Changing an extension’s name isn’t a great idea.

I could probably script renaming and modifying file contents like this and make it part of the build process, but it starts to get complicated. We could also modify installers to make the changes, but there are a bunch of moving parts they would have to compensate for, and given how dynamic this can be (e.g., the semver Makefile reads the extension name from META.json), we would rapidly enter the territory of edge case whac-a-mole. I suspect it’s simply too error-prone.

Proposal: Update Postgres Extension Packaging

Perhaps the Go directory pattern could inspire a similar model in Postgres, eliminating the namespace issue by teaching the Postgres extension infrastructure to include all but one of the files for an extension in a single directory. In other words, rather than files distributed like so for semver:

pgsql
├── lib
│   └── semver.so
└── share
    ├── extension
    │   ├── semver.control
    │   ├── semver--0.32.1.sql
    │   └── semver--0.32.0--0.32.1.sql
    └── doc
        └── semver.md

Make it more like this:

pgsql
└── share
    └── extension
        └── github.com
            └── theory
                └── pg-semver
                    ├── extension.control
                    ├── lib
                    │   └── semver.so
                    ├── sql
                    │   ├── semver--0.32.1.sql
                    │   └── semver--0.32.0--0.32.1.sql
                    └── doc
                        └── semver.md

Or perhaps:

pgsql
└── share
    └── extension
        └── github.com
            └── theory
                └── pg-semver
                    ├── extension.control
                    ├── semver.so
                    ├── semver--0.32.1.sql
                    ├── semver--0.32.0--0.32.1.sql
                    └── semver.md

The idea is to copy the files exactly as they’re stored or compiled in the repository. Meanwhile, the new semver.name file — the only relevant file stored outside the extension module directory — simply points to that path:

github.com/theory/pg-semver

Then for CREATE EXTENSION semver, Postgres reads semver.name and knows where to find all the files to load the extension.

This configuration would require updates to the control file, now named extension.control, to record the full package name and appropriate locations. Add:

name = 'semver'
package = 'github.com/theory/pg-semver'

This pattern could also allow aliasing. Say we try to install a different semver extension from github.com/example/semver. This is in its extension.control file:

name = 'semver'
package = 'github.com/example/pg-semver'

The installer detects that semver.name already exists for a different package and raises an error. The user could then give it a different name by running something like:

make install ALIAS_EXTENSION_NAME=semver2

This would add semver2.name right next to semver.name, and its contents would be github.com/example/pg-semver, where all of its files are installed. This would allow CREATE EXTENSION semver2 to load it without issue (assuming no object conflicts, hopefully resolved by relocate-ability).

I realize a lot of extensions with libraries could wreak some havoc on the library resolver having to search so many library directories, but perhaps there’s some way around that as well? Curious what techniques experienced C developers might have adopted.

Back to Decentralized Publishing

An updated installed extension file structure would be nice, and is surely worth a discussion, but even if it shipped in Postgres 20, we need an updated extension ecosystem today, to work well with all supported versions of Postgres. So let’s return to the idea of decentralized publishing without such changes.

I can think of two pieces that’d be required to get Go-style decentralized extension publishing to work with the current infrastructure.

Module Uniqueness

The first is to specify a new metadata field to be unique for the entire index, and which would contain the repository path. Call it module, after Go (a single Git repository can have multiple modules). In PGXN Meta Spec-style JSON it’d look something like this:

{
    "module": "github.com/theory/pg-semver",
    "version": "0.32.1",
    "provides": {
        "semver": {
            "abstract": "A semantic version data type"
        }
    }
}

Switch from the PGXN-style uniqueness on the distribution name (usually the name of the extension) and let the module be globally unique. This would allow another party to release an extension with the same name. Even a fork where only the module is changed:

{
    "module": "github.com/example/pg-semver",
    "version": "0.32.1",
    "provides": {
        "semver": {
            "abstract": "A semantic version data type"
        }
    }
}

Both would be indexed and appear under the module name, and both would be find-able by the provided extension name, semver.

Where that name must still be unique is in a given install. In other words, while github.com/theory/pg-semver and github.com/example/pg-semver both exist in the index, the semver extension can be installed from only one of them in a given Postgres system, where the extension name semver defines its uniqueness.

This pattern would allow for much more duplication of ideas while preserving the existing per-cluster namespacing. It also allows for a future Postgres release that supports something like the flexible per-cluster packaging as described above.2

Extension Toolchain App

The second piece is an extension management application that understands all this stuff and makes it possible. It would empower both extension development workflows — including testing, metadata management, and releasing — and extension user workflows — finding, downloading, building, and installing.

Stealing from Go, imagine a developer making a release with something like this:

git tag v1.2.1 -sm 'Tag v1.2.1'
git push --tags
pgmod list -m github.com/theory/pg-semver@v1.2.1

The creatively named pgmod tells the registry to index the new version directly from its Git repository. Thereafter anyone can find it and install it with:

  • pgmod get github.com/theory/pg-semver@v1.2.1 — installs the specified version
  • pgmod get github.com/theory/pg-semver — installs the latest version
  • pgmod get semver — installs the latest version or shows a list of matching modules to select from

Any of these would fail if the cluster already has an extension named semver with a different module name. But with something like the updated extension installation locations in a future version of Postgres, that limitation could be loosened.

Challenges

Every new idea comes with challenges, and this little thought experiment is no exception. Some that immediately occur to me:

  • Not every extension can be installed directly from its repository. Perhaps the metadata could include a download link for a tarball with the results of any pre-release execution?
  • Adoption of a new CLI could be tricky. It would be useful to include the functionality in existing tools people already use, like pgrx.
  • Updating the uniqueness constraint in existing systems like PGXN might be a challenge. Most record the repository info in the resources META.json object, so it would be do-able to adapt into a new META format, either on PGXN itself or in a new registry, should we choose to build one.
  • Getting everyone to standardize on standardized versioning tags might take some effort. Go had the benefit of controlling its entire toolchain, while Postgres extension versioning and release management has been all over the place. However PGXN long ago standardized on semantic versioning and those who have released extensions on PGXN have had few issues (one can still use other version formats in the control file, for better or worse).
  • Some PGXN distributions have shipped different versions of extensions in a single release, or the same version as in other releases. The release version of the overall package (repository, really) would have to become canonical.

I’m sure there are more, I just thought of these offhand. What have you thought of? Post ’em if you got ’em in the #extensions channel on the Postgres Slack, or give me a holler on Mastodon or via email.


  1. Or does it? Yes, it does. Although the Go CLI downloads most public modules from a module proxy server like proxy.golang.org, it still must know how to download modules from a version control system when a proxy is not available. ↩︎

  2. Assuming, of course, that if and when the Postgres core adopts more bundled packaging that they’d use the same naming convention as we have in the broader ecosystem. Not a perfectly safe assumption, but given the Go precedent and wide adoption of host/path-based projects, it seems sound. ↩︎

PGXN Tools v1.4

Over on the PGXN Blog I’ve posted a brief update on recent bug fixes and improvements to the pgxn-tools Docker image, which is used fairly widely these days to test, bundle, and release Postgres extensions to PGXN. This fix is especially important for Git repositories:

v1.4.1 fixes an issue where git archive was never actually used to build a release zip archive. This changed at some point without noticing due to the introduction of the safe.directory configuration in recent versions of Git. Inside the container the directory was never trusted, and the pgxn-bundle command caught the error, decided it wasn’t working with a Git repository, and used the zip command, instead.
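
The usual workaround for this class of problem (my assumption of the shape of the fix; the actual v1.4.1 change may differ) is to mark the checkout as trusted inside the container:

# Tell Git to trust the mounted working directory so git archive runs:
git config --global --add safe.directory "$PWD"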

I also posted a gist listing PGXN distributions with a .git directory.

PGXN Challenges

PGXN Gear

Last week, I informally shared Extension Ecosystem: Jobs and Tools with colleagues in the #extensions channel on the Postgres Slack. The document surveys the jobs to be done by the ideal Postgres extension ecosystem and suggests the tools and services required to do those jobs — without reference to existing extension registries and packaging systems.

The last section enumerates some questions we need to ponder and answer. The first one on the list is:

What will PGXN’s role be in this ideal extension ecosystem?

The PostgreSQL Extension Network, or PGXN, is the original extension distribution system, created 2010–11. It has been a moderate success, but as we in the Postgres community imagine the ideal extension distribution future, it’s worthwhile to also critically examine existing tools like PGXN, both to inform the project and to realistically determine their roles in that future.

With that in mind, I here jot down some thoughts on the challenges with PGXN.

PGXN Challenges

PGXN sets a lot of precedents, particularly in its decoupling of the registry from the APIs and services that depend on it. It’s not an all-in-one thing, and designed for maximum distributed dissemination via rsync and static JSON files.

But there are a number of challenges with PGXN as it currently stands; a sampling:

  • PGXN has not comprehensively indexed all public PostgreSQL extensions. While it indexes more extensions than any other registry, it falls far short of all known extensions. To be a truly canonical registry, we need to make it as simple as possible for developers to register their extensions. (More thoughts on that topic in a forthcoming post.)

  • In that vein, releasing extensions is largely a manual process. The pgxn-tools Docker image has improved the situation, allowing developers to create relatively simple GitHub workflows to automatically test and release extensions. Still, it requires intention and work by extension developers. The more seamless we can make publishing extensions the better. (More thoughts on that topic in a forthcoming post.)

  • It’s written in Perl, and therefore doesn’t feel modern or easily accessible to other developers. It’s also a challenge to build and distribute the Perl services, though Docker images could mitigate this issue. Adopting a modern compiled language like Go or Rust might increase community credibility and attract more contributions.

  • Similarly, pgxnclient is written in Python and the pgxn-utils developer tools in Ruby, increasing the universe of knowledge and skill required for developers to maintain all the tools. They’re also more difficult to distribute than compiled tools would be. Modern cross-compilable languages like Go and Rust once again simplify distribution and are well-suited to building both web services and CLIs (but not, perhaps native UX applications — but then neither are dynamic languages like Ruby and Python).

  • The PGXN Search API uses the Apache Lucy search engine library, a project that retired in 2018. Moreover, the feature never worked very well, thanks to the decision to expose separate search indexes for different objects — and requiring the user to select which to search. People often can’t find what they need because the selected index doesn’t contain it. Worse, the default index on the site is “Documentation”, on the surface a good choice. But most extensions include no documentation other than the README, which appears in the “Distribution” index, not “Documentation”. Fundamentally the search API and UX needs to be completely re-architected and -implemented.

  • PGXN uses its own very simple identity management and basic authentication. It would be better to have tighter community identity, perhaps through the PostgreSQL community account.

Given these issues, should we continue building on PGXN, rewrite some or all of its components, or abandon it for new services? The answer may come as a natural result of designing the overall extension ecosystem architecture or from the motivations of community consensus. But perhaps not. In the end, we’ll need a clear answer to the question.

What are your thoughts? Hit us up in the #extensions channel on the Postgres Slack, or give me a holler on Mastodon or via email. We expect to start building in earnest in February, so now’s the time!

Automate Postgres Extension Releases on GitHub and PGXN

Back in June, I wrote about testing Postgres extensions on multiple versions of Postgres using GitHub Actions. The pattern relies on the Docker image pgxn/pgxn-tools, which contains scripts to build and run any version of PostgreSQL, install additional dependencies, build, test, bundle, and release an extension. I’ve since updated it to support testing on the latest development release of Postgres, meaning one can test on any major version from 8.4 to (currently) 14. I’ve also created GitHub workflows for all of my PGXN extensions (except for pgTAP, which is complicated). I’m quite happy with it.

But I was never quite satisfied with the release process. Quite a number of Postgres extensions also release on GitHub; indeed, Paul Ramsey told me straight up that he did not want to manually upload extensions like pgsql-http and PostGIS to PGXN, but wanted PGXN to automatically pull them in when they were published on GitHub. It’s pretty cool that newer packaging systems like pkg.go.dev auto-index any packages on GitHub. Adding such a feature to PGXN would be an interesting exercise.

But since I’m low on TUITs for such a significant undertaking, I decided instead to work out how to automatically publish a release on GitHub and PGXN via GitHub Actions. After experimenting for a few months, I’ve worked out a straightforward method that should meet the needs of most projects. I’ve proven the pattern via the pair extension’s release.yml, which successfully published the v0.1.7 release today on both GitHub and PGXN. With that success, I updated the pgxn/pgxn-tools documentation with a starter example. It looks like this:

 1  name: Release
 2  on:
 3    push:
 4      tags:
 5        - 'v*' # Push events matching v1.0, v20.15.10, etc.
 6  jobs:
 7    release:
 8      name: Release on GitHub and PGXN
 9      runs-on: ubuntu-latest
10      container: pgxn/pgxn-tools
11      env:
12        # Required to create GitHub release and upload the bundle.
13        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
14      steps:
15      - name: Check out the repo
16        uses: actions/checkout@v3
17      - name: Bundle the Release
18        id: bundle
19        run: pgxn-bundle
20      - name: Release on PGXN
21        env:
22          # Required to release on PGXN.
23          PGXN_USERNAME: ${{ secrets.PGXN_USERNAME }}
24          PGXN_PASSWORD: ${{ secrets.PGXN_PASSWORD }}
25        run: pgxn-release
26      - name: Create GitHub Release
27        id: release
28        uses: actions/create-release@v1
29        with:
30          tag_name: ${{ github.ref }}
31          release_name: Release ${{ github.ref }}
32          body: |
33            Changes in this Release
34            - First Change
35            - Second Change
36      - name: Upload Release Asset
37        uses: actions/upload-release-asset@v1
38        with:
39          # Reference the upload URL and bundle name from previous steps.
40          upload_url: ${{ steps.release.outputs.upload_url }}
41          asset_path: ./${{ steps.bundle.outputs.bundle }}
42          asset_name: ${{ steps.bundle.outputs.bundle }}
43          asset_content_type: application/zip

Here’s how it works:

  • Lines 4-5 trigger the workflow only when a tag starting with the letter v is pushed to the repository. This follows the common convention of tagging releases with version numbers, such as v0.1.7 or v4.6.0-dev. This assumes that the tag represents the commit for the release.

  • Line 10 specifies that the job run in the pgxn/pgxn-tools container, where we have our tools for building and releasing extensions.

  • Line 13 passes the GITHUB_TOKEN variable into the container. This is the GitHub access token that’s automatically created for every workflow run. It lets us call the GitHub API via actions later in the workflow.

  • Step “Bundle the Release”, on lines 17-19, validates the extension META.json file and creates the release zip file. It does so by simply reading the distribution name and version from the META.json file and archiving the Git repo into a zip file. If your process for creating a release file is more complicated, you can do it yourself here (see the sketch after this list); just be sure to include an id for the step, and emit a line of text so that later actions know what file to release. The output should be appended to the $GITHUB_OUTPUT file like this, with $filename representing the name of the release file, usually $extension-$version.zip:

    echo bundle=$filename >> $GITHUB_OUTPUT
    
  • Step “Release on PGXN”, on lines 20-25, releases the extension on PGXN. We take this step first because it’s the strictest, and therefore the most likely to fail. If it fails, we don’t end up with an orphan GitHub release to clean up once we’ve fixed things for PGXN.

  • With the success of a PGXN release, step “Create GitHub Release”, on lines 26-35, uses the GitHub create-release action to create a release corresponding to the tag. Note the inclusion of id: release, which will be referenced below. You’ll want to customize the body of the release; for the pair extension, I added a simple make target to generate a file, then pass it via the body_path config:

    - name: Generate Release Changes
      run: make latest-changes.md
    - name: Create GitHub Release
      id: release
      uses: actions/create-release@v1
      with:
        tag_name: ${{ github.ref }}
        release_name: Release ${{ github.ref }}
        body_path: latest-changes.md
    
  • Step “Upload Release Asset”, on lines 36-43, adds the release file to the GitHub release, using output of the release step to specify the URL to upload to, and the output of the bundle step to know what file to upload.
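
As promised, here’s a sketch of what a more complicated custom bundling step might look like. Everything in it is hypothetical: the my_extension name, the use of jq to read the version from META.json (assuming jq is available in the container), and the git archive invocation. The only firm requirement is the final line, which writes the bundle name to $GITHUB_OUTPUT:

    - name: Bundle the Release
      id: bundle
      run: |
        # Hypothetical custom bundling: read the version from META.json
        # and archive the tagged commit into a zip file.
        version=$(jq -r .version META.json)
        filename=my_extension-$version.zip
        git archive --format zip --prefix "my_extension-$version/" -o "$filename" HEAD
        # Tell later steps which file to release.
        echo bundle=$filename >> "$GITHUB_OUTPUT"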

Lotta steps, but works nicely. I only wish I could require that the testing workflow finish before doing a release, but I generally tag a release once it has been thoroughly tested in previous commits, so I think it’s acceptable.

Now if you’ll excuse me, I’m off to add this workflow to my other PGXN extensions.

Test Postgres Extensions With GitHub Actions

I first heard about GitHub Actions a couple years ago, but fully embraced them only in the last few weeks. Part of the challenge has been the paucity of simple but realistic examples, and quite a lot of complicated-looking JavaScript-based actions that seem like overkill. But through trial-and-error, I figured out enough to update my Postgres extensions projects to automatically test on multiple versions of Postgres, as well as to bundle and release them on PGXN. The first draft of that effort is pgxn/pgxn-tools1, a Docker image with scripts to build and run any version of PostgreSQL between 8.4 and 12, install additional dependencies, build, test, bundle, and release an extension.

Here’s how I’ve put it to use in a GitHub workflow for semver, the Semantic Version data type:

 1  name: CI
 2  on: [push, pull_request]
 3  jobs:
 4    test:
 5      strategy:
 6        matrix:
 7          pg: [12, 11, 10, 9.6, 9.5, 9.4, 9.3, 9.2]
 8      name: 🐘 PostgreSQL ${{ matrix.pg }}
 9      runs-on: ubuntu-latest
10      container: pgxn/pgxn-tools
11      steps:
12        - run: pg-start ${{ matrix.pg }}
13        - uses: actions/checkout@v3
14        - run: pg-build-test

The important bits are in the jobs.test object. Under strategy.matrix, which defines the build matrix, the pg array defines each version to be tested. The job will run once for each version, which can be referenced via ${{ matrix.pg }} elsewhere in the job. Line 10 has the job run in a pgxn/pgxn-tools container, where the steps run. They are:

  • Line 12: Install and start the specified version of PostgreSQL
  • Line 13: Clone the semver repository
  • Line 14: Build and test the extension

The intent here is to cover the vast majority of cases for testing Postgres extensions, where a project uses a PGXS Makefile. The pg-build-test script does just that.

A few notes on the scripts included in pgxn/pgxn-tools:

  • pg-start installs, initializes, and starts the specified version of Postgres. If you need other dependencies, simply list their Debian package names after the Postgres version.

  • pgxn is a client for PGXN itself. You can use it to install other dependencies required to test your extension.

  • pg-build-test simply builds, installs, and tests a PostgreSQL extension or other code in the current directory. Effectively the equivalent of make && make install && make installcheck.

  • pgxn-bundle validates the PGXN META.json file, reads the distribution name and version, and bundles up the project into a zip file for release to PGXN.

  • pgxn-release uploads a release zip file to PGXN.
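
Putting the first three together, the steps of a typical test job might look something like this (the extra Debian package is a hypothetical dependency, and semver stands in for any prerequisite extension):

    steps:
      - run: pg-start 12 libxml2-dev  # Postgres 12 plus an extra build dependency
      - uses: actions/checkout@v3
      - run: pgxn install semver      # install a prerequisite extension from PGXN
      - run: pg-build-test            # build, install, and test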

In short, use the first three utilities to handle dependencies and test your extension, and the last two to release it on PGXN. Simply set GitHub secrets with your PGXN credentials, pass them in environment variables named PGXN_USERNAME and PGXN_PASSWORD, and the script will handle the rest. Here’s how a release job might look:

15    release:
16      name: Release on PGXN
17      # Release on push to main when the test job succeeds.
18      needs: test
19      if: github.ref == 'refs/heads/main' && github.event_name == 'push' && needs.test.result == 'success'
20      runs-on: ubuntu-latest
21      container:
22        image: pgxn/pgxn-tools
23        env:
24          PGXN_USERNAME: ${{ secrets.PGXN_USERNAME }}
25          PGXN_PASSWORD: ${{ secrets.PGXN_PASSWORD }}
26      steps:
27        - name: Check out the repo
28          uses: actions/checkout@v3
29        - name: Bundle the Release
30          run: pgxn-bundle
31        - name: Release on PGXN
32          run: pgxn-release

Note that lines 18-19 require that the test job defined above pass, and ensure the job runs only on a push event to the main branch, where we push final releases. We set PGXN_USERNAME and PGXN_PASSWORD from the secrets of the same name, and then, in lines 27-32, check out the project, bundle it into a zip file, and release it on PGXN.

There are a few more features of the image, so read the docs for the details. As a first cut at PGXN CI/CD tools, I think it’s fairly robust. Still, as I gain experience and build and release more extensions in the coming year, I expect to work out integration with publishing GitHub releases, and perhaps build and publish relevant actions on the GitHub Marketplace.


  1. Not a great name, I know, will probably change as I learn more. ↩︎

iovationeering

Since June, as part of my work for PGX, I’ve been doing on-site full-time consulting for iovation here in Portland. iovation is in the business of deterring online fraud via device identification and reputation. Given the nature of that business, a whole lot of data arrives every day, and I’ve been developing PostgreSQL-based solutions to help get a handle on it. The work has been truly engaging, and a whole hell of a lot of fun. And there are some really great, very smart people at iovation, whom I very much like and respect.

iovation

So much so, in fact, that I decided to accept their offer of a full time position as “Senior Data Architect.” I started on Monday.

I know, crazy, right? They’d actually been talking to me about it for a long time. In our initial contact close to two years ago, as I sought to land them as a PGX client, they told me they wanted to hire someone, and was I interested. I said “no.” I said “no” through four months of contracting this summer and fall, until one day last month I said to myself, “wait, why don’t I want this job?” I had been on automatic, habitually insisting I wasn’t interested in a W2 position. And with good reason. Aside from 15 months as CTO at values of n (during which time I worked relatively independently anyway), I’ve been an independent consultant since I founded Kineticode in November of 2001. Yeah. Ten Years.

Don’t get me wrong, those ten years have been great! Not only have I been able to support myself doing the things I love—and learned a ton in the process—but I’ve managed to write a lot of great code. Hell, I will be continuing as an associate with PGX, though with greatly reduced responsibilities. And someday I may go indy again. But in the meantime, the challenges, opportunities, and culture at iovation are just too good to pass up. I’m loving the work I’m doing there, and expect to learn a lot over the next few years.

Kineticode

So what, you might ask, does this mean for Kineticode, the company I founded to offer support, consulting, and training services for Bricolage CMS? The truth is that Kineticode has only a few technical support customers left; virtually all of my work for the last two years has been through PGX. So I’ve decided to shut Kineticode down. I’m shifting the Bricolage tech support offerings over to PGX and having Kineticode’s customers move there as their contracts come up for renewal. They can expect the same great service as always. Better even, as there are 10 associates in PGX and, lately, only me at Kineticode. Since Kineticode itself is losing its raison d’être, it’s going away.

PGX

I intend to remain involved in the various open-source projects I work on. I still function as the benevolent dictator of Bricolage CMS, though other folks have stepped up their involvement quite a lot in the last few years. And I expect to keep improving PGXN and DesignScene as time allows (I’ve actually been putting some effort into both in the last few weeks; watch for PGXN and Lunar/Theory announcements in the coming weeks and months!). And, in fact, I expect that a fair amount of the work I do at iovation will lead to blog posts, conference presentations, and more open-source code.

This is going to be a blast. Interested in a front-row seat? Follow me on Twitter.


PGXN Blog and Twitterstream

I created the PGXN Blog yesterday. Tune in there for news and announcements. I’ll also be posting status reports once development gets underway, so that all you fans out there can follow my progress. Once the site is done (or at 1.0 anyway), the blog will be used for announcements, discussion of support issues, etc. So tune in!

Oh, and I created a PGXN Twitterstream, too. You should follow it! New blog posts will be tweeted, and once the site gets going, new uploads will be tweeted, too. Check it out!


PGXN Development Project

I’m pleased to announce the launch of the PGXN development project. I’ve written a detailed specification and pushed it through general approval on pgsql-hackers. I’ve also drawn up a detailed project plan and estimated things at a highly reduced PostgreSQL Experts rate to come up with a fundraising goal: $25,000. And now, thanks to founding contributions from myYearbook.com and PostgreSQL Experts, we have started the fundraising phase of the project.

So what’s this all about? PGXN, the PostgreSQL Extension Network, is modeled on CPAN, the Perl community’s archive of “all things Perl.” PGXN will provide four major pieces of infrastructure to the PostgreSQL community:

  • An upload and distribution infrastructure for extension developers

  • A centralized index and API of distribution metadata

  • A website for searching extensions and perusing their documentation

  • A command-line client for downloading, testing, and installing extensions

I’ve been wanting to start this project for a long time, but given my need to pay the bills, it didn’t seem like I’d ever be able to find the time for it. Then Josh Berkus suggested that we try to get community interest and raise money for me to have the time to work on it. So I jumped on that, putting in the hours needed to get general approval from the core PostgreSQL developers and to create a reasonable project plan and web site. And thanks to MyYearbook’s and PGX’s backing, I’m really excited about it. I hope to start on it in August.

If you’d like to contribute, first: Thank You! The PGXN site has a Google Checkout widget that makes it easy to make a donation. If you’d rather pay by some other means (checks are great for us!), drop me a line and we’ll work something out. We have a few levels of contribution as well, including permanent linkage on the PGXN site for your organization, as well as the usual t-shirts and launch party invitations.
