Just a Theory

Trans rights are human rights

Mini Summit Two

We had such thoughtful and engaged discussion at this week’s Postgres Extension Ecosystem Mini-Summit! I did learn that one has to reserve a spot for each mini-summit individually, however. Eventbrite sends reminders for each one you sign up for, not all of them.

To reserve a spot and be reminded for forthcoming meetings, hit the Eventbrite page and select a date and hit “Reserve a Spot” for each date you’d like to attend.

Back to this week’s meetup. My colleague Ian Stanton of Tembo gave a great talk, “Building Trunk: A Postgres Extension Registry and CLI”, that provided background on the motivations and problems that inspired the creation of trunk, a binary packaging system for Postgres extensions.

The presentation was followed by 35+ minutes of questions, discussion, and brainstorming, which I’ve summarized below. But first, links!

Now down to business.

Introduction

  • I opened the meeting and introduced Ian Stanton.

Presentation

  • Ian introduced himself and trunk, “an extension registry and CLI supporting Tembo Cloud. Wants to tell a story, starting with backstory.

  • Tembo founded November 2022, provide managed Postgres solution called Tembo Cloud. Idea is Postgres can be used for so many different things through the power of extensions, so built use-case optimized “stacks” flavors of Postgres powered by extensions and configurations. Super proud of them, including Message Queue, for which we build an open-source extension.

  • Envisioned ability to install any extension, including user-provided extensions. Knew we’d need an extension management solution. So we built it.

  • It’s called trunk, an extension registry and CLI, an open-source app for the community that hosts binary packages for extensions, and powers Tembo Cloud’s extension management.

  • Q1 2023 had build Tembo CLoud v1 with all extensions bundled in containers. But wanted way to install them on the fly, ideally with installable packages. Explored the ecosystem for tool we could use.

  • PGXN first we found. Love it, backed by the community, been around since 2011, but hosted source code, not binaries. Also little development since 2012.

  • Apt and Yum repositories are community-backed and are binaries, just what we wanted, but smaller subset of extensions relative to the 1000s available. Thought it would be too time-consuming to add them all through the community process.

    • *Steven Miller: in chat: “From Steven Miller : Also with apt packaging, it requires to install to a specific path, but we needed to customize the install path based on what pg_config shows for share lib and package lib dir. That way we could persist extension installations on tembo cloud”
  • Weighed pros and cons of building one. Pros:

    • Full control over integration with Tembo Cloud
    • Binary distribution
    • We could build new features quickly
    • We could publish new extensions quickly

    Cons:

    • How will the community react?
    • Recreating the wheel?
  • Expected to publish 2–3 extension a day, only do-able with a solution we built.

  • Want to build something meaningful for Tembo Cloud and the community.

  • Astronomer Registry for Airflow: Built by Astronomer to find modules for Airflow, very well received by the community.

  • PGXN, Apt, and Yum repos: Wanted to take the best of them and build on it.

  • crates.io: Wanted a similar great experience for Postgres extensions.

  • Vision boiled down to discoverability, categories, ratings system, certification, and indexing of cloud provider support.

  • Want to package any extension, whether SQL, C/SQL, or pgrx.

  • Simple experience, like cargo publish and cargo install cargo-pgrx.

    Eric in chat: “❤️”

  • Hopes and Dreams: had idea people would magically show up, contribute to the code, and publish their extensions. Wanted to support multiple platforms, architectures, and Postgres versions, and for it to be a one-stop shop for Postgres extensions.

  • How it works.

  • CLI and Registry, written in Rust, uses Docker to build extensions. Packages named <trunk-project-name>-<version>-<pg-version>.tar.gz. Published with trunk publish and installed with trunk install, putting all the files in the right places.

    • Steven Miller in chat: “The approach to use docker for building has been nice. It allows for cross-compile, for example, building for any platform docker supports with the —platform flag”
  • Registry stores metadata and service web site and API, and uses S3 bucket for the tar-gzip files.

  • Example building semver extension:

    Create Trunk bundle:
    bitcode/src/semver/src/semver.bc
    bitcode/src/semver.index.bc
    semver.so
    licenses/LICENSE
    extension/semver--0.10.0--0.11.0.sql
    extension/semver--0.11.0--0.12.0.sql
    extension/semver--0.12.0--0.13.0.sql
    extension/semver--0.13.0--0.15.0.sql
    extension/semver--0.15.0--0.16.0.sql
    extension/semver--0.16.0--0.17.0.sql
    extension/semver--0.17.0--0.20.0.sql
    extension/semver--0.2.1--0.2.4.sql
    extension/semver--0.2.4--0.3.0.sql
    extension/semver--0.20.0--0.21.0.sql
    extension/semver--0.21.0--0.22.0.sql
    extension/semver--0.22.0--0.30.0.sql
    extension/semver--0.3.0--0.4.0.sql
    extension/semver--0.30.0--0.31.0.sql
    extension/semver--0.31.0--0.31.1.sql
    extension/semver--0.31.1--0.31.2.sql
    extension/semver--0.31.2--0.32.0.sql
    extension/semver--0.32.1.sql
    extension/semver--0.5.0--0.10.0.sql
    extension/semver--unpackaged--0.2.1. sql
    extension/semver.control
    extension/semver.sql
    manifest. json
    Packaged to •/. trunk/pg_semver-0.32.1-pg15.tar.gz
    

    Package up SQL files, control file, SO files, bitcode files into gzip file.

  • Once it’s published, API surfaces all this information:

    [
      {
        "name": "pg_semver",
        "description": "A semantic version data type for PostgreSQL.",
        "documentation_link": "https://github.com/theory/pg-semver",
        "repository_link": "https://github.com/theory/pg-semver",
        "version": "0.32.0",
        "postgres_versions": [
          15
        ],
        "extensions": [
          {
            "extension_name": "semver",
            "version": "0.32.0",
            "trunk_project_name": "pg_semver",
            "dependencies_extension_names": null,
            "loadable_libraries": null,
            "configurations": null,
            "control_file": {
              "absent": false,
              "content": ""
            }
          }
        ],
        "downloads": [
          {
            "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg15-0.32.0.tar.gz",
            "pg_version": 15,
            "platform": "linux/amd64",
            "sha256": "016249a3aeec1dc431fe14b2cb3c252b76f07133ea5954e2372f1a9f2178091b"
          }
        ]
      },
      {
        "name": "pg_semver",
        "description": "A semantic version data type for PostgreSQL.",
        "documentation_link": "https://github.com/theory/pg-semver",
        "repository_link": "https://github.com/theory/pg-semver",
        "version": "0.32.1",
        "postgres_versions": [
          15,
          14,
          16
        ],
        "extensions": [
          {
            "extension_name": "semver",
            "version": "0.32.1",
            "trunk_project_name": "pg_semver",
            "dependencies_extension_names": null,
            "loadable_libraries": null,
            "configurations": null,
            "control_file": {
              "absent": false,
              "content": "# semver extension\ncomment = 'Semantic version data type'\ndefault_version = '0.32.1'\nmodule_pathname = '$libdir/semver'\nrelocatable = true\n"
            }
          }
        ],
        "downloads": [
          {
            "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg14-0.32.1.tar.gz",
            "pg_version": 14,
            "platform": "linux/amd64",
            "sha256": "f412cfb4722eac32a38dbcc7cd4201d95f07fd88b7abc623cd84c77aecc8d4bb"
          },
          {
            "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg15-0.32.1.tar.gz",
            "pg_version": 15,
            "platform": "linux/amd64",
            "sha256": "9213771ffc44fb5a88726770f88fd13e62118b0f861e23271c3eeee427a23be9"
          },
          {
            "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg16-0.32.1.tar.gz",
            "pg_version": 16,
            "platform": "linux/amd64",
            "sha256": "8ffe4fa491f13a1764580d274e9f9909af4461aacbeb15857ab2fa235b152117"
          }
        ]
      }
    ]
    

    Includes different tar-gzip files for different versions of Postgres, the contents of the control file, dependencies; loadable libraries and configurations; and the one extension in this package — some can have many like PostGIS. Then Postgres version support and some other metadata.

  • What it looks like on the web site, includes README contents, data from the last slide, install command, etc.

  • This is what installation looks like:

    $ trunk install pg_semver
    Using pkglibdir: "/usr/lib/postgresql/16/lib"
    Using sharedir: "/usr/share/postgresql/16"
    Using Postgres version: 16
    info: Downloading from: https://cdb-plat-usel-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg16-0.32.1.tar.gz
    info: Dependent extensions to be installed: []
    info: Installing pg_semver 0.32.1
    [+] bitcode/src/semver/src/semver.bc => /usr/lib/postgresql/16/lib
    [+] bitcode/src/semver. index.bc => /usr/lib/postgresql/16/lib
    [+] semver.so => /usr/lib/postgresql/16/lib
    info: Skipping license file licenses/LICENSE
    [+] extension/semver--0.10.0--0.11.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.11.0--0.12.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.12.0--0.13.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.13.0--0.15.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.15.0--0.16.0.sql = /usr/share/postgresql/16
    [+] extension/semver--0.16.0--0.17.0.sql => /us/share/postgresql/16
    [+] extension/semver--0.17.0--0.20.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.2.1--0.2.4.sql => /usr/share/postgresql/16
    [+] extension/semver--0.2.4--0.3.0.sql > /us/share/postgresql/16
    [+] extension/semver--0.20.0--0.21.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.21.0--0.22.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.22.0--0.30.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.3.0--0.4.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.30.0--0.31.0.sql = /usr/share/postgresql/16
    [+] extension/semver--0.31.0--0.31.1.sql => /usr/share/postgresql/16
    [+] extension/semver--0.31.1--0.31.2.sql => /usr/share/postgresql/16
    [+] extension/semver--0.31.2--0.32.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.32.1.sql => /usr/share/postgresql/16
    [+] extension/semver--0.5.0--0.10.0.sql => /usr/share/postgresql/16
    [+] extension/semver--unpackaged--0.2.1.sql => /usr/share/postgresql/16
    [+] extension/semver.control => /usr/share/postgresql/16
    [+] extension/semver.sql => /usr/share/postgresql/16
    
    ***************************
    * POST INSTALLATION STEPS *
    ***************************
    
    Install the following system-level dependencies:
            On systems using apt:
                libc6
    
    Enable the extension with:
           CREATE EXTENSION IF NOT EXISTS semver CASCADE;
    

    CLI pulls down the tar-gzip, unpacks it, and puts the files in the right places and tells the users what other commands are needed to enable the extension.

  • Pause to take a sip of water.

    • David Wheeler (he/him) in chat: “STAY HYDRATED PEOPLE!”
  • State of the project. Trunk powers extension management for Tembo Cloud, 200 extensions on the platform, install and enable on the fly. Tembo Cloud likely trunk’s #1 user.

  • Get lots of site traffic, especially around categorization, addresses the discoverability problem set set out to solve.

    • Jeremy S in chat: “Interested in insights from site traffic - you mentioned that ‘categorization’ was popular - any other things that traffic patterns seem to suggest you might have done really well, or clearly is needed?”
  • But pretty minimal community involvement, out fault for not involving the community early on.

  • Did we solve the problem?

    • For Tembo Cloud: yes! Trunk is core component of the Tembo Cloud platform that lest us offer high number of extensions.
    • For the community: no! But helped bring more awareness to the opportunities to improve the ecosystem as a community.
    • Saw other solutions arise around the same time, including dbdev and pgxman, and Yurri at Omnigres is working on something as well. Huge opportunity to solve this together.
    • Steven Miller in chat: “I think it is very nice way to install other extensions via an extension how dbdev works”
    • David Wheeler (he/him) in chat: “GRANT!”
    • Grant Holly in chat: “Oh hi”
  • Lessons Learned

    • It’s a really hard problem to solve! As you add more layers of complexity, like different architectures, versions of Postgres, it gets harder and harder.

      • Steven Miller in chat, Replying to “The approach to use …”: “The downside for this approach includes missing chip-specific instruction support, for example AVX512, which optimizes performance on some extensions. However if you are building with docker on the same architecture as the host, then it still includes these instructions.”

      • David Wheeler (he/him) in chat, Replying to “The approach to use …” “Also presumably no support for building for non-Linux platforms, yes?”

    • The extension ecosystem truly is the wild west, not really best practices around building, versioning, and releasing, and when you’re collecting and housing them, it makes things difficult. A huge opportunity for us to come up with those standards and share them with the community.

    • Community involvement is crucial, wish we’d done it better early on, that’s why we’re all here today! Solution to build together doesn’t happen if we don’t tackle it as a community.

    • Similarly, wish we’d reached out to folks like David and Devrim early on, to get more insight from them and bring the community into the project from the beginning

  • The future of trunk

    • Registry and CLI will continue to serve Tembo Cloud

    • Has paved the way for binary packaging and distribution in PGXN v2 that David is spearheading, will at least will inform and perhaps be the basis for that part of the project.

  • That’s all, thank you, back to you, David!

Discussion

  • David Wheeler (he/him): Thanks for history an context, Ian! Questions or discussion topics? Some comments in the thread from Steven and Tobias.

    • Tobias Bussmann: in chat: speaking of paths: it would be super helpful if postgresql would support loading extensions from additional paths beside the $SHAREDIR/extension and $PKGLIBDIR directories. At least following directory symlinks within…

    • Steven Miller in chat, Replying to “The approach to use …”: I tried to make it work for Mac, for example, but the docker support didn’t work basically. I think it might work for Windows, since they have better container support. However I didn’t try that yet.

  • David Wheeler (he/him): Reads Tobias’s comment. You can specify a subdirectory in the sharedir and maybe the moduledir? But it’s a little hinky right now. Steve, do you want to talk about the us of Docker to build images?

  • Steven Miller: Yeah, I’d love to. To Tobias’s point, agree, on Tembo Cloud, we have a persistent directory where wer’re sintalling extensions, but because there is no way for an extra sharedir or package dir, we’re persisting all of the library files, including Postgres core. Not ideal, especially for upgrades.

    Approach for building ind Docker: been nice, do the build ina Dockerfile, start the container, then install and compare the difference between layers and zip up all the new files. Great for cross-compile but, not working for mac or other systems. Will need a fallback option to do a local build.

    • Jeremy S: in chat, Replying to “speaking of paths: i…”: Exactly same point was also mentioned just this morning on slack by Matthias
  • David Wheeler (he/him): Makes sense, thanks. What other bits do you feel like could be useful for packaging binaries at a community level?

  • Steven Miller: Sometimes we install binaries with trunk, but then difficult to know what has been installed. Nothing like apt where there is a history of what is installed or uninstall. Would be nice to do something like trunk list and see everything that has been installed. Also, future should be not just install but management, including turning extensions on, and there are a lot of ways to turn them on.

  • Ian Stanton: uninstall would be useful, too.

  • David Wheeler (he/him): Other questions about trunk or challenges to binary distribution it brings?

    • *Tobias Bussmann in chat, Replying to “speaking of paths: i…”: this would allow an immutable PostgreSQL base package and still allow to install extensions on top. This is esp. important if you need to have singned packages like on macOS

    • *nils in chat, Replying to “speaking of paths: i…”: Guess there is some prior art in how search_path in pg work, or the PATH in unix’s.

      Should be doable to allow to specify some kind of ordered search path, where Postgres will look for extensions. That way, Postgres can protect it’s own libs to no be overwritten by external libs, but allow for loading them from extra paths.

  • Yurri: There is CREATE EXTENSION and other extensions like logical decoding plugins. Does trunk handle them?

  • Steven Miller: We think of it as four types extensions into 2x2 matrix: 1. Does it require CREATE EXTENSION true or false; and 2. Does it have a loadable library true or false. The false/false category is output plugins; The true/true category, e.g. pg_partman, pg_cron; CREATE EXTENSION false and loadable library true, e.g., autoexplain, just a library, no upgrade concerns; and then CREATE EXTENSION true and loadable library false is the default case.

  • Ian Stanton: Steven wrote a blog on this.

    • Eric in chat: Does pgrx make the process of building easier or harder and is there anything we can do today to make rust extension building better?

    • Jason Petersen in chat: Yeah, it sounds like we need some sort of system database like apt has; would enable management, uninstall, version list, whether upgrades are available, etc

  • Yurri: That would be great. What other modules are there without extensions, like autoexplain?

  • Ian Stanton: auth delay is another, base backup to shell, considered parts of postgres, but we have trouble categorizing them. There are 10-15 I’ve come across.

  • Yurri: ARe these categories on Tembo, can you click a button?

  • Ian Stanton: Not a category, but would be a good one to add.

  • Yurri: Did you say output plugins are handled with Tembo:

  • Steven Miller: YOu can install them with trunk, yes.

  • Yurri: And you have the build pipeline that will work without plugins too, yeah, cool.

  • David Wheeler (he/him): Tobias, did you want to say more about the path issues?

  • Tobias Bussmann: Sure! We are building the Postgres.app, distribution for macOS, working different from Linux systems. We distribute some extensions directly, but also allow building and installing extensions on it. Works nicely, even with pgxn client, but it’s built within the application, which breaks the code signature.

    We always have to fight against a breaking system to allow that. Possible, but would be much cleaner to specify an extra directory where extensions could be loaded, and we could distribute packages with binary extensions that the user could download and install separately from the Postgres.app.

  • David Wheeler (he/him): You’re not suggesting a different directory for every extension with a module, but just another path in the search path that’s not subject to the signature verification.

  • Tobias Bussmann: Yes, that would be an option, but with a flexible system could use one per extension or just specify a second directory. Contrib extensions sometimes seen as part of Postgres, and they’re all stuffed in the same directory with third party extensions, which gets confusing and hard to manage.

    • Steven Miller in chat: In the previous extensions mini summit, Yuri mentioned that he was working on a patch to postgres for extra libdir, extra share dir, but I have not been tracking this one
  • nils: That’s what I was saying in chat, there is prior art in Postgres and Unix systems where you can specify a search path in postgres for a list of schemas, and in Unix the path is to find binaries. Give me a delimited list of directories on my system. Could be super user only, where they can specify where they’re installed, and we can go through the list ot find an extension.

  • David Wheeler (he/him): I might be imagining this, but I seem to recall there was a proposal to have extensions in their own directories, which would be nice for packaging, but then every time you add one you have to add another directory to the list and there is some fear the lookup time could be too long.

    • Jeremy S in chat, replying to “speaking of paths: i…”: (Or like LD_LIBRARY_PATH )

    • David Wheeler (he/him) in chat, replying to “speaking of paths: i…”: LD_LIBRARY_PATH is all but dead on macOS

  • Jason Petersen: If it happens at startup I can’t imagine that being a concern. If the list changes you reboot. It’s not gonna be a performance problem, I hope.

  • *David Wheeler (he/him): Or HUP it if you don’t want downtime.

  • Jason Petersen: Sure, but it doesn’t need to be on every command.

  • *David Wheeler (he/him): Eric, do you want to pose your question about pgrx?

  • Eric: Sure. Wanted to know, were there stumbling blocks to get pgrx support built into trunk, and does it make things easy or difficult? Different from C path, are there things we could do to make things easier today?

  • Ian Stanton: Yeah, I think the issue is mostly on our end. We have a separate image for each version of pgrx, and keeping up with the releases is challenging. We need to rethink our image building strategy. Shouldn’t be one image for each version of pgrx. That’s the biggest thing I’ve noticed, mostly on our side.

  • *David Wheeler (he/him): Because you need the install the version of pgrx that the extension requires before you do the build, and that’s just too slow?

  • Ian Stanton: Could be too slow. We’ve known about this problem for some time, just hasn’t been addressed yet.

  • Eric: Okay, maybe we can talk about it offline one day, be happy to chat. I think we’re close to being able to have the CLI, cargo-pgrx, be a different version than whatever version the extension uses.

  • Ian Stanton: That would be super useful!

  • Eric: Yeah, I think we’re close to being at that point, if not there already. We can talk about that offline.

  • Ian Stanton: Nice! We’ll reach out in Discord.

  • *David Wheeler (he/him): Other comments or questions, or people who have worked on other kinds of binary registry things, would love to hear more from other perspectives. Devrim is going to talk about the Yum repository next week [ed. correction: in two weeks].

  • Steven Miller in chat: Daniele last time mentioned Pip is good example of mixing source and binary distributions

  • Eric: I have a random question related to this. In the past and recent history, has hackers talked about some way of storing extension in the database rather than relying on the file system?

  • *David Wheeler (he/him): Yes! In this long thread from 2011 [ed. Correction: 2013] Dimitri was proposing a “unit”, a placeholder name, where the object would be stored in the database. Very long thread, I didn’t read the whole thing, lot of security challenges with it. If it needs a shared object library loading having to be written to the file system it’s just not going to happen. I don’t know whether that’d be required or not.

    Dimitri also worked on a project called pginstall where you could install extensions from the database like dbdev, but not just TLEs, but anything. The idea is a build farm would build binaries and the function in the database would go to the registry and pull down the binaries and put them in the right places on the file system.

    There were a lot of interesting ideas floating around, but because of the legacy of the PGXS stuff, it has always been a bit of a struggle to decide not to use it, to support something not just on the machine, but do something over libpq or in SQL. Lot of talk, not a lot of action.

    • *Tobias Bussmann in chat in response to “In the previous ex…”: still searching on hacker for it. Meanwhile I found: https://commitfest.postgresql.org/5/170/

    • Steven Miller in chat: That approach is very awesome (install via extension)

  • Eric: I can see why it would take some time to sort it all out. One thing to require super user privileges to create an extension, but also having root on the box itself? Yeah.

  • Yurri: TLE plugs into that a little bit for a non-shared object. Not exactly storing it in the database, but does provide a SQL based/function method of installing from inside the database, but only for trusted languages, not shared objects.

  • *David Wheeler (he/him): dbdev install does download it from database.dev and stores it in the database, and has hooks into the CREATE EXTENSION command and pulls it out of its own catalog. Was a similar model with pginstall, but with binary support, too.

  • Yurri: Back to trunk. When you start building, and have to deal with binaries, pgxn you can put the source up there, but I want to get to the whole matrix of all the different versions. Every extension author does it a little different. Some extensions have versions for Postgres 15, another for 14, some have the same version across all the majors, sometimes an extension works for some majors and others. Has trunk expanded to other Postgres versions to support the whole exploding matrix of stuff that does and doesn’t work, 5-6 majors, gets to be a large matrix, a lot to keep track of. How’s that working out for the builds and managing that matrix.

    • Steven Miller in chat: Dimensions I think are:
      • pg version
      • architecture
      • chip-specific instructions (edge case for native builds?)
  • Steven Miller in chat: We just announced support for 14 and 16

    David Wheeler (he/him) in chat, replying to “Dimensions I think a…”: OS, OS version

  • Steven Miller in chat,: Replying to “Dimensions I think a…”: Ah right

  • Ian Stanton: Steven do you want to take that one?

  • Steven Miller: Oh yeah. We’ve started toe-dipping on this one. Started with Tembo Cloud’s platform, but have no released Postgres 14 and 16, and also trunk has built-in support for other architectures, such as arm, or whatever the Docker --platform flag supports. We looked at mac builds, not working yet, might work for Windows, which ahs better container support, but I don’t know, and also there is an edge case for pg_vector especially, which compiles to include ship-specific instructions for AVX512, which helps with vector. So that’s another dimension to consider.

  • Yurri: Part of the idea behind this forum is to see if we can chart a path forward, maybe not solve everything. What can we solve, how can we make something a little better for Postgres at large?

    • Eric in chat: Even as a Mac user I don’t know the answer to this… what’s the common Postgres package there? Postgres dot app, homebrew, something else?

    • David Wheeler (he/him) in chat: pgenv! (self-promotion)

    • Eric in chat: I assume folks don’t use macOS in prod but developers are important too

    • nils in chat, Replying to “Even as a Mac user I…”:

      $ git clone ..
      $ ./configure
      $ make
      $ make install
      

      At least that is what I do 😄

  • Steven Miller: In my opinion, the way to approach it is to know all the dimensions you need, and in the metadata API say which binaries are available. Then get through it with testing and badging If we let things get built, to what extent is it tested and used? That can help. Daniele was in the previous call, said we could look to Pip and Wheel files for inspiration, and Adam on our team has said the same. This is something that has some binary and some source, and falls back on doing the build when it needs to.

  • *David Wheeler (he/him): I’ve been thinking about this quite a bit lately. Can see needing to take advantage of multiple platforms available through GitHub workflow nodes or the community’s build farm, which has a vast array of different architectures and platforms to build stuff. There are precedents!

    I imagine a system where, when something is published on PGXN, another system is notified and queues it up to all its build farm members to build binaries, ideally without full paths like trunk, and making them available for those platforms. Building out that infrastructure will take a fair bit of effort, I think. With cross-compiling is available it might be…doable? But most modules and for SQL and maybe Rust or Go extensions, but a challenge for C extensions.

    This is a problem I’d like us to solve in the next year or two.

    • Steven Miller in chat, replying to “I assume folks don’t…”: Yeah exactly, like trunk install after brew install postgres

    • Tobias Bussmann in chat, replying to “Even as a Mac user…”: this seems to be quite spread. There are also people that prefer docker based installs

    • Eric in chat: pgrx supports cross compilation

      With a caveat or two!

    • Eric in chat, replying to “Even as a Mac user I…” @nils same. For v9.3 though 16!

  • *David Wheeler (he/him): What else? Reading the comments.

  • Yurri: I think maybe that PGXN JSON file, I know you’ve been spending time on it, David, including the proposal on namespacing a few days ago. That feels like it could be helpful to be part of this. IF it could be something we could center around… The first time I wanted to put an extension on PGXN, it took me a long time to figure out that JSON file. I didn’t find the blog post that goes through it in nice detail till like two weeks after. If I’d found it sooner I could have skipped so many things I tried to figure out on my own.

    If we can center around that file, it’ll draw more attention to it, more links back to it, more examples people blog about here and there, it helps going forward. The trick is getting it right not being this massive thing no one can figure out, or has too many options, but hits all the points we need.

    • nils in chat, replying to “Even as a Mac user I…”: Well, mostly for extension, for Postgres I rely on David’s pgenv

    • *Eric * in chat, replying to “Even as a Mac user I…”: @Tobias Bussmann hmm. Makes it difficult to get an extension installed.

  • *David Wheeler (he/him): I’ve been thinking about this a lot, drafted a doc some of my colleagues at Tembo have read over and I hope to publish soon [ed. Note: now published], thinking through what a v2 of the PGXN Meta Spec might include. I think we should extend with list of external libraries required, or the architectures it supports, or it’s a loadable library or an app that doesn’t even go into the database.

    I would like soon to draft an actual revision of the spec, and document it well but also turn it into a JSON Schema document so we can automate publishing it and verification in the same place. I also imagine building an eventual replacement or evolution of the PGXN client or trunk client or some client that you can use to manage that thing. I think pgrx does that, adding metadata via the client rather than parse and understand the whole file.

    I’m with you it could get really complicated, but I’m not sure I see an alternative other than building good tooling to minimize the pain.

  • Ian Stanton: I think automatically pulling that information when it’s readily available would be super helpful. We use it as an app to just take care of things for people.

  • *David Wheeler (he/him): Right, and then if we’re successful in getting it done it’s getting people to take up the tools and start using them. There’s only so much we can infer. I can tell how to do a build if there’s a Makefile or a configure file or a cargo.toml, but that doesn’t reveal what libraries are required. This is why there’s a lot of hand-tuning of RPM and Apt spec files.

    • Steven Miller in chat: We are calling this “system dependencies”

      Ssl and glibc the main ones 🙂

    • Jason Petersen in chat: And sometimes the package names aren’t even 1—1 mappings

    • Eric in chat: Ha! Try relying on elasticsearch as a runtime dependency! 😞

  • Yurri: That’s another thing to touch on. A lot of extensions are just a thin layer of glue between Postgres and some OSS library that someone else maintains. But the trick, when you want to build a Yum package, the dependency has a different name than the rest of the RedHat ecosystem vs. the Debian ecosystem. So part of what Devrim has to do to maintain the RPM packages is manually sort all that out, because you can’t automatically… libc! It’s called glibc in RedHat and just libc in Debian, and every package has slightly different names. Do how do you manage that in trunk? Do you pull the source for any dependencies? Does your Docker image…I don’t know how this is working.

    • David Wheeler (he/him) in chat: I want to build a json schema validation extension in Rust using https://github.com/Stranger6667/jsonschema-rs or something

    • Tobias Bussmann in chat, replying to “Ha! Try relying o…”: or V8 🤯

  • Ian Stanton: Two sides to that one is build time dependencies, and there there are runtime dependencies. I just dropped an example for some random extension. Tthe way we’ve been building this is to write out a Dockerfile that can include build time dependencies. [hunts for link…]

  • Ian Stanton: We specify them all there. But for runtime, we don’t know what’s required until we test the thing. We have stuff in our CI pipelines to install and enable the extension to see if it works. If it doesn’t, it will report a missing dependency. Then we know we need to add it to our Postgres images. Not the best flow for finding these dependencies. Steven, want to add anything more to the build time dependency piece?

    • David Wheeler (he/him) in chat, replying to “Ha! Try relying on …”: Next version of plv8 released on PGXN will have v8 bundled
  • Steven Miller: A lot share the same ones, SSL and glibc, so we just build with the same versions we run on Tembo Cloud. In the metadata we list all system dependencies, that’s what we build towards, and include them in the Docker image. If you pick a different stack, like the Machine Learning stack, it has all the Python stuff in the base image. We don’t really love this, but this is something where Python wheel might inspire us, becaus it has packaging and system dependencies.

    • Eric in chat, replying to “I want to build a js…”: I feel like I’ve seen one already?

    • David Wheeler (he/him) in chat, replying to “I want to build a js…”: GIMME

  • Yurri: If you really want to od this right, just like in the RPM repositories, you have to know what the dependencies are. David, I’m curious, what your thoughts are, if this is to be done right, there has to be a way to indicate dependencies in the META.json file, but then I’m talking about Debian and RedHat, but what about Mac? Windows doesn’t really have a packaging system. There are BSDs, other places Postgres can run, probably have to narrow the scope a bit to solve something.

  • Tobias Bussmann in chat, responding to “Ha! Try relying o…” Sounds promising, but for which architectures? I have good hope for pljs as replacement for plv8

  • Ian Stanton in chat: https://github.com/tembo-io/trunk/blob/d199346/contrib/fuzzystrmatch/Trunk.toml#L13

  • David Wheeler (he/him): Fortunately there are only around 1100 extensions in the world, a relatively low barrier at this point. Some of these other things have thousands or millions of extensions.

  • Yurri: I guess when you put it that way! But I wasn’t going to go through all 1000 of them one-at-a-time.

  • David Wheeler (he/him): No. I posted about this on Ivory a few weeks ago [ed. correction: he means on Mastodon]: how does one do this in a platform-neutral way. There are some emerging standards where people are trying to figure this stuff out. One is called purl, where you specify dependencies by packing URLs, or “purls”, and then it’s up to the installing client to resolve them vai whatever the packaging system it depends on.

    I would assume on Windows we’d have to say “it works great as long as you use Chocolatey” or something like that. But it’s certainly a difficult problem. I’m looking forward to your talk about your unique approach to solving it, Yurrii [ed. note: that’s the May 1 mini-summit], that’s going to be super interesting.

  • David G. Johnston: Ultimately you just crowd sourcing. If we just say “this is what we call this thing in PostgreSQL world”, then if people need to compile it on Chocolatey on Windows, they figure it out and contribute it. Or on Debian or RedHat. Just facilitate crowd-sourcing, metadata in a database.

  • David Wheeler (he/him): My initial idea was a global registry that people contribute to just by editing files in a GitHub repository.

  • David G. Johnston: HashiCorp has to have something like that already, there’s stuff out there, no need to reinvent the wheel. This is a global problem if we open-source it we can solve it.

  • David Wheeler (he/him): Right. Really appreciate everyone coming. Great discussion, I appreciate it. In two weeks, Devrim Gündüz is going to talk about the Yum Community Repository and the challenges of RPMifying extensions. I had this idea of automating adding extensions to the Yum and Apt repositories, an Devrim is a little skeptical. So super look forward to his perspective on this stuff. Two weeks from today at noon [ed.: America/New_York]. Thanks for coming!

    • Eric in chat: Thanks a ton! This is exciting stuff.

    • Tobias Bussmann in chat: Thanks all!

    • Grant Holly in chat: Thanks everyone. Great discussion

    • Jeremy S: in chat: Thanks david

    • Steven Miller in chat: Thanks all! Cya next time

    • Jeremy S in chat: Isn’t bagel supposed to come for the end

    • Ian Stanton in chat: Thanks all :)

RFC: PGXN Metadata Sketch

Update 2024-03-22: To compliment high-level discussions on the #extensions channel on the Postgres Slack, I also created a pull request to make it easer directly comment to this post point by point. Keep the feedback coming, and thank you!

Update 2024-03-25: Clarified the definition of “Release” and made “Maintainers” plural. Thanks to Matthias van de Meent for the suggestions!


This post proposes a new metadata standard for extension packaging, distribution, and delivery, building on the PGXN Meta Spec to address its shortcomings and emerging use cases 12 years on. The goals include:

  • Don’t break the existing standard, but add to it
  • Add explicit support for different types of Postgres extensions, such as background workers and loadable modules
  • Add additional metadata for curation and automated binary compilation
  • Add hooks for registry and third-party-contributed metadata, including binary distribution options and stats & reports.

There are two high level categories of metadata, split into two separate files:

  1. Package Metadata provided by extension package authors
  2. Registry Metadata aggregated by the root registry from various sources, including data derived from the extension source code or package metadata, but also trusted third parties such as packagers, smoke testers, security scanners, and more.

Following community discussion of this proposal, the Package Metadata will lead to a draft for PGXN Meta Spec version 2.0.0, while the Registry Metadata will guide the design and implementation of the Root Registry APIs required to provide it.

Terms

Definition of terms as used in this document, because every language and distribution system uses terms differently. These are ours.

  • Extension: a software component that extends the capabilities of a PostgreSQL database or cluster. Extensions may be CREATE EXTENSION extensions, background workers, command-line apps, loadable modules, shared libraries, and more.

  • Package: A collection of extensions that are released, versioned, and distributed together. Packages may be downloaded directly from version control repositories or in archive files generated by a release tag.

  • Package Path: Identifies a package, declared in the package metadata file. A package path should describe both what the package does and where to find it. Typically, a package path consists of a repository root path — the directory that contains the metadata file — and a directory within the repository.

  • Repository: The version control repository where the package is developed, referenced as package path or URL.

  • Repository root path: The portion of the package path that corresponds to the root directory of the version control repository where the package is developed, and which contains the metadata file. Most package are defined in their repository’s root directory, so this is usually the entire path. For example, github.com/example/pgtap is the repository root path for the package of the same name.

  • Source Distribution: The contents of a single package bundled together with package metadata into distributable archive file, usually named with the last part of the package path or the main extension, a dash, and the version, e.g., pgtap-1.14.3.zip.

  • Binary Distribution: A binary software distribution of a package, compiled and packaged for a particular OS version, platform, and hardware architecture. Examples include deb, RPM, and wheel.

  • Release: A single version of the package made available to the public on PGXN, expressed as the package path, an at sign, and the semver. Example: github.com/theory/pgtap@v1.14.3.

Package Metadata

The extension author provides this data as part of a release submitted to the root registry. The registry itself will not change this data in any way, but provide it as-is.

Essentials

Essential information about the package itself, including its path, name, version, list of authors, license, etc. Ecosystem applications use this data for indexing, naming conventions, and display information.

Fields (all required):

  • Package: A package path uniquely identifying the package (usually the source repo host name and path; see this RFC)
  • Name: The name of the extension package, usually the same as the main extension, not unique but conflicts are resolved by escalation to package
  • Version: The version of the package, expressed as a SemVer
  • Abstract: A short description of the purpose of the package
  • Description: A longer description of the package, answering the question “what is this thing and what value is it?”
  • Maintainers: List of maintainers, each an object with name and either email or url (or both)
  • License: An SPDX 2.3 license expression comprised of one or more licenses from the SPDX License List
  • Spec: Information about the metadata spec itself, a version and URL

Example:

{
  "package": "github.com/example/pg-pair",
  "name": "pair",
  "version": "1.1.0",
  "abstract": "A key/value pair data type",
  "description": "Adds a key/value pair data type to Postgres, useful for specifying arbitrary key/value function parameters.",
  "maintainers": [
    {
        "name": "Naomi Nagata",
        "email": "naomi@example.com",
        "url": "https://example.com/roci/naomi"
    }
  ],
  "license": "PostgreSQL OR MIT",
  "spec": {
    "version": "2.0.0",
    "url": "https://pgxn.org/meta/spec.txt"
  }
}

Other possibilities:

  • License File: An alternative to license for nonstandard licenses (from cargo)
  • Contributors: List of people who have contributed to the package
  • Emeriti: List of former maintainers and contributors

PGXN Compatibility:

  • Overloads the string representation of the spec v1 field, but it should be able to parse either one, especially with the spec version to differentiate.
  • Uniqueness of package and relationship to name
  • PGXN has maintainer; is it really worth pluralizing it? Maybe we don’t need it at all, given repository and issue links.

Artifacts

A list of links and checksums for downloading the extension package in one or more formats, including source code, binaries, system packages, and more. Apps use this information (along with registry metadata) to determine the best option for installing an extension on a particular system.

This information is optional, and provided in addition to the source code available from the root registry itself. The idea is that, on release, the release manager creates artifacts in one or more locations (such as GitHub releases), and it’s useful to link to them from the registry. Binary builds may be unlikely today, but leave the option open for now by allowing arbitrary text in the type field.

Each item must have:

  • Type: The type of artifact: Source, binary, RPM, etc.
  • URL: Link the the artifact
  • SHA: A SHA checksum

Each URL must properly resolve and the checksum must match.

[
  {
    "type": "source",
    "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0.zip",
    "sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004"
  },
  {
    "type": "binary",
    "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-amd64.tar.gz",
    "sha1": "12d9bc5cfb6bc3c453627eac69511f48be63cfc0"
  },
  {
    "type": "binary",
    "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-arm64.tar.gz",
    "sha1": "787dc39137f7d1510a33ab0a1b8905cd5f3f72d1"
  }
]

PGXN Compatibility:

New object, no conflicts.

Resources

External information about the package, mostly links, including source code repository, bug reporting, documentation, badges, funding, etc. Apps use this data for links, of course, but also full text indexing, documentation rendering, and displaying useful information about the extension.

Fields:

  • Homepage: URL for the package’s home page
  • Issues: URL to issues
  • Documentation: URL to documentation
  • Support: URL to support
  • Repository: Source code repository URL
  • Badges: Array of badge link objects
  • Funding: Array of funding objects
{
  "homepage": "https://pair.example.com",
  "issues": "https://github.com/example/pair/issues",
  "documentation": "https://pair.example.com/docs",
  "support": "https://github.com/example/pair/discussions",
  "repository": "https://github.com/example/pair",
  "badges": [
    {
      "alt": "Test Status",
      "src": "https://test.packages.postgresql.org/github.com/example/pair.svg"
    }
  ],
  "funding": [
    {
      "type" : "individual",
      "url" : "http://example.com/donate"
    },
    {
      "type" : "patreon",
      "url" : "https://www.patreon.com/example"
    }
  ]
}

PGXN Compatibility:

  • issues replaces bugtracker and is just a URL. Can support either
  • repository is just a URL instead of an object. Can support either but with the package key referencing the repository path perhaps we don’t need to change it.

Contents

A description of what’s included in the package. Often a package consists of multiple extensions, such as PostGIS, which includes postgis, postgis_tiger_geocoder, address_standardizer, and more.

Furthermore, some extensions are not CREATE EXTENSION-type extensions at all, such as background workers, command-line apps, loadable modules, shared libraries, and more. Each should be listed along with documentation links where they differ from the package overall (or are simply more specific).

Fields:

  • Extensions: Object describing CREATE EXTENSION extensions. Keys are extension names, fields are objects with the fields sql, doc, abstract, control (same as control file, but can use the file itself), tle (meaning it can be used as a trusted language extension), and preload (to indicate whether its libraries need to be loaded in advance via shared_preload_libraries, session_preload_libraries, or local_preload_libraries).
  • Workers: Object describing background workers. Keys are worker names, fields are objects with the fields bin, doc, and abstract.
  • Apps: Object describing applications, command-line or otherwise. Keys are app names, fields are objects with the fields bin, doc, and abstract.
  • Modules: Object describing loadable modules that can be loaded into Postgres (not necessary for extensions that include libraries). Keys are module names, fields are objects with the fields lib (without file extension), doc, abstract, and preload.
  • Libraries: Other libraries that may ship in the package and need to be installed but are not loadable modules, such as a dynamic library used by an app. Keys are library names, fields are objects with the fields lib (without file extension) or dir (for a directory of files), doc, and abstract.
{
  "extensions": {
    "pair": {
      "sql": "sql/pair.sql",
      "doc": "doc/pair.md",
      "abstract": "A key/value pair data type",
      "preload": "session",
      "tle": true,
      "control": {
        "directory": "pair",
        "module_pathname": "$libdir/pair",
        "relocatable": true
      }
    }
  },
  "workers": {
    "pair_pruner": {
      "bin": "bin/pair_pruner",
      "doc": "doc/pair_pruner.md",
      "abstract": "A worker to periodically prune pairs"
    }
  },
  "apps": {
    "pair_rand": {
      "bin": "bin/pair_rand",
      "doc": "doc/pair_rand.md",
      "abstract": "Command to generate random pairs of strings"
    }
  },
  "modules": {
    "lib_pair": {
      "lib": "lib/lib_pair",
      "doc": "doc/lib_pair.md",
      "abstract": "A library hooking function calls to convert pairs to named parameters",
      "load": "shared_preload_libraries"
    }
  },
  "libraries": {
    "ruby_pair": {
      "dir": "lib/gems",
      "abstract": "Ruby libraries required to run the extension"
    }
  }
}

Other options:

  • Allow or require an array of docs?
  • Specify support for specific doc formats (markdown, manpage, plain text, MkDocs, etc.)?
  • Support glob patterns?
  • Add libraries as sub-key for workers and apps, e.g. for a Ruby app that requires a directory of gems?

PGXN Compatibility:

Significantly different from the existing provides object, so key this one under contents or manifest and continue to support both.

Dependencies

A list of external dependencies required to configure, build, test, install, and run the extensions in the package. These include not only other extension packages, but also external libraries and system dependencies. And let’s not forget the versions of Postgres required, as well as any OS and version dependencies (e.g, does it work on Windows? FreeBSD? What versions?) and architectures (arm64, amd64, etc.)

Fields:

  • platforms: Platforms supported by the package, defined as OS and architecture. Assume all platforms if none are listed.

  • postgres: Versions of PostgreSQL required by the package, specified as a version range, as well as compiled features (--with-* configure options)

  • pipeline: The build pipeline required to build the extension. Options: pgxs, meson (?) pgrx, gnu (./configure), gem, cpan, pip, go, etc.

  • prereqs: Object defining external dependencies required for different phases:

    • configure: Dependencies to configure the package (e.g., items required for ./configure or make to work)
    • build: Dependencies to build the package (in addition to run dependencies)
    • test: Dependencies to test the package (in addition to build and run dependencies)
    • run: Dependencies to run the package
    • develop: Dependencies to develop the package (in addition to all other phase dependencies)

    Each points to an object with at least one of these keys:

    • requires: Required to use the package
    • recommends: Not required, but recommended as a best practice
    • suggests: Not required, but nice to have
    • conflicts: Package will not work with these items

    These, in turn, contain at least one of these keys:

    • package: Other packages, expressed by their package paths, pointing to a version expression
    • external: External dependencies not included with the package
    • contrib: List of Postgres contrib or development packages
  • variations: A list of dependency variations, each with two fields:

    • where: Specifies a configuration that, when matched, includes the dependencies. For example, to specify alternate dependencies for Linux arm64, it would be "platforms": { "linux": "arm64" }. For a specific version of Postgres, it would be something like "postgres": { "version": ">= 16, < 17" }. [This likely needs some deep thinking through.]
    • dependencies: An object containing of the other key structures in this object, defining the additional dependencies for environments that match the where field.

The packages and external keys point to objects in which the keys are dependency names and the values are version range expressions (empty string means any version). The postgres.version field is also a version range expression.

The contrib key points to an array of Postgres-supplied apps and extensions. Versions are not necessary because they effectively have the same version as a supported Postgres version.

Example:

{
  "postgres": {
    "version": ">= 12, < 17",
    "with": [ "xml", "uuid", "perl" ]
  },
  "pipeline": "pgrx",
  "platforms": {
    "linux": [ "amd64", "arm64" ],
    "darwin": [ "amd64", "arm64" ],
    "windows": [ "amd64" ],
    "freebsd": [ "amd64" ]
  },
  "dependencies": {
    "configure": {
      "requires": {
        "external": { "cargo-pgrx": "" }
      }
    },
    "build": {
      "requires": {
        "external": {
          "curl": "",
          "awk": "",
          "sed": "",
          "perl": "5.20"
        }
      },
      "recommends": {
        "external": {
          "jq": "",
          "perl": "5.38"
        }
      }
    },
    "test": {
      "requires": {
        "contrib": [ "pg_regress", "plpgsql", "plperl" ],
        "packages": {
          "github.com/theory/pgtap": "1.1.0"
        }
      }
    },
    "run": {
      "requires": {
        "contrib": [ "plperl" ],
        "packages": {
          "github.com/theory/hostname": ""
        }
      }
    }
  }
}

Notes:

The external field is the tricky one. How does one provide a canonical dependency for a third-party library or app that may have loads of different names on different platforms, different packaging systems (apt vs rpm), and even different versions of them, or split apart for development and runtime use? Some possibilities:

  • Use purl and let downstream services handle the conversion
  • Create a rule set that allows one to predict dependency name changes across ecosystems, similar to repology-rules
  • Just use Repology?

Other notes:

  • Decide on a version range standards to adopt. Start with the PGXN version range, pep-0508, npm ranges, and Gem::Version.
  • Is pipeline really necessary, given configure requirements? I think so, because it tells the client the preferred build system to use, in case it can’t detect it for some reason.

PGXN Compatibility:

Significantly differs from the existing prereqs object, so this one is keyed under dependencies and we can support either.

Classification

Classification metadata lets the extension developer associate additional information to improve discovery, such as key words. It might also allow selections from a curated list of extension classifications, such as the category slugs supported for the cargo categories field.

Ecosystem apps use this data to organize extensions under key words or categories, making it easier for users to find extensions that are frequently used together or for specific workloads or tasks.

Example:

{
  "tags": [
    "testing",
    "pair",
    "parameter"
  ],
  "categories": [
    "Machine Learning"
  ]
}

PGXN Compatibility:

Conflicts with the existing root-level tags array, but we can support either one.

Example: PGML Extension

A hand-coded example that attempts to capture all the complexity of the pgml extension:

{
  "package": "github.com/postgresml/postgresml/pgml-extension",
  "name": "pgml",
  "version": "2.8.2",
  "abstract": "pgml: Created by the PostgresML team",
  "maintainers": [
    {
      "name": "the PostgresML team",
      "url": "https://github.com/postgresml/postgresml/"
    }
  ],
  "license": "MIT",
  "spec": {
    "version": "2.0.0",
    "url": "https://pgxn.org/meta/spec.txt"
  },
  "artifacts": [
    {
      "type": "source zip",
      "url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.zip",
      "sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004"
    },
    {
      "type": "source tgz",
      "url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.tar.gz",
      "sha256": "845f28339c6159ac32daccea1cd17b386ea083c3e60bb8d58fb737725afe7eb5"
    }
  ],
  "resources": {
    "homepage": "https://postgresml.org/",
    "issues": "https://github.com/postgresml/postgresml/issues",
    "documentation": "https://postgresml.org/docs/",
    "support": "https://discord.com/invite/DmyJP3qJ7U",
    "repository": "https://github.com/postgresml/postgresml",
    "badges": [
      {
        "alt": "Tests Passing",
        "src": "https://github.com/postgresml/postgresml/actions/workflows/ci.yml/badge.svg"
      },
      {
        "alt": "Chat Activity",
        "src": "https://camo.githubusercontent.com/1988e7aaaa6a0c8ddcf880faacf1ec3263e23fac4aeff4710922082fde84442a/68747470733a2f2f696d672e736869656c64732e696f2f646973636f72642f31303133383638323433303336393330303939",
        "link": "https://discord.gg/DmyJP3qJ7U"
      }
    ]
  },
  "manifest": {
    "extensions": {
      "pgml": {
        "sql": "sql/pgml--2.8.2.sql",
        "preload": "shared",
        "tle": false,
        "control": {
          "comment": "pgml:  Created by the PostgresML team",
          "default_version": "2.8.2",
          "module_pathname": "$libdir/pgml",
          "relocatable": false,
          "superuser": true,
          "schema": "pgml",
          "trusted": true
        }
      }
    }
  },
  "dependencies": {
    "postgres": {
      "version": ">= 15, < 16"
    },
    "pipeline": "pgxs",
    "platforms": {
      "linux": [
        "amd64",
        "arm64"
      ],
      "darwin": [
        "amd64",
        "arm64"
      ]
    },
    "dependencies": {
      "configure": {
        "requires": {
          "external": {
            "cargo-pgrx": "",
            "bison": "",
            "cmake": "",
            "flex": "",
            "libclang-dev": "",
            "libopenblas-dev": "",
            "libpython3-dev": "",
            "libreadline-dev": "",
            "libssl-dev": "",
            "pkg-config": ""
          }
        }
      },
      "run": {
        "requires": {
          "external": {
            "libopenblas": "",
            "libpython3": "",
            "libreadline": "",
            "libssl": "",
            "python3": ""
          }
        },
        "recommends": {
          "external": {
            "python(pyarrow)": "=11.0.0",
            "python(catboost)": "",
            "python(lightgbm)": "",
            "python(torch)": "",
            "python(torchaudio)": "",
            "python(torchvision)": "",
            "python(xgboost)": "",
            "python(accelerate)": "",
            "python(bitsandbytes)": "",
            "python(ctransformers)": "",
            "python(huggingface-hub)": "",
            "python(deepspeed)": "",
            "python(einops)": "",
            "python(optimum)": "",
            "python(peft)": "",
            "python(tokenizers)": "",
            "python(transformers)": "",
            "python(transformers-stream-generator)": "",
            "python(InstructorEmbedding)": "",
            "python(sentence-transformers)": "",
            "python(rouge)": "",
            "python(sacrebleu)": "",
            "python(sacremoses)": "",
            "python(datasets)": "",
            "python(orjson)": "",
            "python(langchain)": ""
          }
        }
      }
    },
    "variations": [
      {
        "where": {
          "platforms": {
            "linux": []
          }
        },
        "dependencies": {
          "prereqs": {
            "run": {
              "recommends": {
                "external": {
                  "python(auto-gptq)": "",
                  "python(xformers)": ""
                }
              }
            }
          }
        }
      }
    ]
  },
  "classification": {
    "tags": [
      "machine learning",
      "ml",
      "transformers"
    ],
    "categories": [
      "Machine Learning"
    ]
  }
}

Note that PostgresML also provides a dashboard app, but it does not ship with the extension, so it’s not listed here. It could have its own package metadata and be separately distributed.

For the Python dependencies, in the absence of a neutral standard for specifying package, here they’re specified using a syntax borrowed from RPM to make clear that they’re Python dependencies. A couple of those dependencies are Linux-only, so listed in variations where the OS is Linux.

Registry Metadata

The second metadata format is registry metadata. It starts by with a copy the package metadata, but builds from there, adding information (such as the artifact link, SHA, and cryptographic signature from the registry itself) and sections (defined below).

Providing a separate metadata format enables broader community collaboration to augment the metadata for extension packages.

Ideally an ecosystem of trusted services will be developed by various parties, who will report back to the root registry as the source of record for all metadata about extension packages. For example, a binary distribution service like apt.postgresql.org or trunk will submit links and checksums for binary packages and perhaps periodic download stats.

Broadly speaking, there are three typologies for registry metadata: Distributions, Reports, and Classifications.

Distributions

A list of links and checksums for downloading binary distributions for the extension package. The root registry page for the package can therefore list all the ways to install an extension and expand the list as new binary registries add it.

Along with the Artifacts package metadata, the distribution metadata allows installer tools to select the best option to install based on the OS, Postgres version, and cluster configuration (e.g., install the community Apt package if the cluster was installed from the community Apt registry).

Each item must have:

  • Registry: The name of the distribution registry, unique
  • Type: The type of distribution: Apt, RPM, TLE, etc.
  • URL: Link to the the registry’s page for the distribution
  • Command: In lieu of a URL and SHA it can provide instructions to install the distribution, e.g., apt-get install postgresql16-pair.

Each distribution registry must be trusted by the root registry to submit this information to add to the metadata, and have its own page with additional contextual information: its home page, docs, basic instructions, whatever. A README of sorts.

Example:

[
  {
    "registry": "trunk.pgxn.org",
    "type": "trunk",
    "command": "pgxn install github.com/example/pair"
  },
  {
    "registry": "apt.postgresql.org",
    "type": "apt",
    "command": "apt-get install postgresql16-pair"
  },
  {
    "registry": "yum.postgresql.org",
    "type": "rpm",
    "command": "yum install postgresql16-pair"
  },
  {
    "registry": "pgt.dev",
    "type": "apt",
    "url": "https://pgt.dev/extensions/pair",
    "command": "trunk install pair"
  },
  {
    "registry": "pgxman.com",
    "type": "apt",
    "url": "https://pgxman.com/x/pair",
    "command": "pgxman install pair"
  },
  {
    "registry": "database.dev",
    "type": "sql",
    "url": "https://database.dev/example/pair",
    "command": "SELECT dbdev.install('example-pair');"
  },
]

Other Options:

  • Information about installing from a registry that doesn’t offer URLs, like Homebrew, where one just needs to know what the formula is called. Maybe the same should apply to Apt?

Metrics and Reports

This object includes data such as number of downloads, build and test status on various Postgres/OS/version combinations, binary packaging distributions, test coverage, security certification, vulnerability detection, quality metrics and user ratings, and more. Some of these bits of data might include badges, e.g., for ratings or security assessments.

In the broader ecosystem, it would be the responsibility of the root registry to ensure that data comes only from trusted sources. However this data will be publicly readable, allowing any downstream applications to extend and publish metrics and reports with their own information.

Example:

{
    "aggregates": {
      "downloads": 32
    },
    "sources": {
      "smoke-tester.example.com": {
        "stats": {
          "passes": 32,
          "fails": 8,
          "unknown": 0
        },
        "link": "https://smoke-tester.example.com/extension/pair",
        "badge": "https://smoke-tester.example.com/extension/pair.svg"
      },
      "pgt.dev": {
        "downloads": 12,
        "tags": ["two", "kv"],
        "categories": ["Analytics"]
      },
      "pgxman.com": {
        "downloads": 20
      },
      "ratings.example.com": {
        "stats": {
          "ratings": 324,
          "average": 3.2,
          "comments": 13
        },
        "link": "https://ratings.example.com/extension/pair",
        "badge": "https://ratings.example.com/extension/pair.svg"
      },
      "stats.example.com": {
        "contributors": 2,
        "stars": 14,
        "last_active": "2024-01-12",
      },
      "security-scan.example.com": {
        "advisories": {
          "high": 2,
          "moderate": 11,
          "low": 1,
          "cves": 0
        },
        "link": "https://security-scan.example.com/extension/pair",
        "badge": "https://security-scan.example.com/extension/pair.svg"
      }
    }
}

Notes:

  • The aggregates section aggregates results from multiple sources, for example summing all downloads or averaging ratings. The list of items to aggregate could evolve regularly.

  • Each key in sources identifies a trusted downstream source of information. Each would have its own schema describing its objects and their meaning, along with URI templates to link to. For example, stats.example.com might have these templates:

    {
      "contributors": {
        "alt": "List of people who have contributed to the {name} project",
        "badge": "https://stats.example.com/{ package }/contributors.svg",
        "uri": "https://stats.example.com/{ package }/contributors"
      },
      "stars": {
        "alt": "Star ratings for the {name} project",
        "badge": "https://stats.example.com/{ package }/stars.svg",
        "uri": "https://stats.example.com/{ package }/stars"
      }
    }
    
  • Might be worth adopting more of an API of multiple files, rather than one big file. Example: Homebrew analytics. Note also its support for date ranges for analytics (30/90/365 days, stats when directly installed vs. installed as a dependency).

Extension Registry Namespacing RFC

A few weeks ago I brainstormed about decentralized Postgres extension publishing, inspired in part by an examination of Go decentralized publishing. It was…a lot. I’ve been deeply pondering the future of PGXN and the broader extension ecosystem, and want to start to nail down some decisions. To that end, I’d like to propose an update to extension namespacing.

Status Quo

There are currently three ways in which an extension is considered unique:

  1. Only one extension can have a given name within a single Postgres cluster. Names are defined by the name of the control file. It is therefore not possible to have two extensions with the same name in the same Postgres cluster.
  2. PGXN follows this pattern: Only one extension can have a given name in the PGXN registry. The first person to release an extension then “owns” its name, and no one else can release an extension with the same name.1 I think dbdev follows the same pattern.
  3. Other registries like trunk and pgxman define an extension by the distribution name, at least for the purposes of selecting a binary to install. Thus when you trunk install postgis, you get all of the extensions included, as you’d expect, while trunk install address_standardizer wouldn’t work at all. In the few places that trunk supports installation by extension name, it prompts the user to use the appropriate package name if there’s a conflict.

A Modest Proposal

I’d like to propose the following changes to the PGXN Meta Spec to start to move away from extension uniqueness in the broader extension ecosystem and more toward package name.

  • Add a new field, call it module_path, project_path, project_uri, that, if present, uniquely identifies an extension project and all of its parts. It should be to a Go-style module path (or URI) that identifies the project repository path where a META.json file lives.
  • Retain the provides object where keys identify extensions, but those keys will no longer be globally unique to the registry. In other words, the combination of module_path and extension name uniquely identifies an extension, including an empty module_path.

How it Works

Some examples. Let’s say there is an existing extension named pair, included in the distribution named pg_pair:

{
  "name": "pg_pair",
  "version": "1.2.3",
  "provides": {
    "pair": {
      "file": "pair.sql",
      "version": "1.2.0"
    }
  }
}

The extension name pair is unique, and pgxn install pair will download the pg_pair v1.2.3 bundle and compile and install pair v1.2.0.

Now someone else comes along and wants to make their own pair with this metadata:

{
  "name": "my_pair",
  "version": "0.2.3",
  "provides": {
    "pair": {
      "file": "pair.sql",
      "version": "0.2.3"
    }
  }
}

Just like today, this upload would be rejected, because there is already a registered pair extension. Under my proposal, they can disambiguate by providing a module_path:

{
  "name": "my_pair",
  "module_path": "github/example/pair",
  "version": "0.2.3",
  "provides": {
    "pair": {
      "file": "pair.sql",
      "version": "0.2.3"
    }
  }
}

This upload would be allowed. With these two releases, someone attempting to install pair would see something like this:

$ pgxn install pair
ERROR: Duplicate extension name “pair”. Install one of these instead:
       * pgxn.org/dist/pair
       * github/example/pair

Note the the module path pgxn.org/dist/pair in the the first option. This is the default module path for distributions without a module path.2 But now the user can select the proper one to install:

$ pgxn install pgxn.org/dist/pair
INFO: latest version: pgxn.org/dist/pair@1.2.3
INFO: building extension
INFO: installing extension
INFO: done!

Furthermore, the PGXN client will prevent the user from later installing a conflicting extension. The failure would look something like:

$ pgxn install github/example/pair
INFO: latest version: pgxn.org/dist/pair@0.2.3
ERROR: Cannot install extension “pair” from pgxn.org/dist/pair:
ERROR: A conflicting extension named “pair” is already installed
ERROR: from pgxn.org/dist/pair

Features with Benefits

I see a number of benefits to this change:

  • Compatibility with the v1 metadata spec, so that no data migration or distribution indexing is required.
  • It loosens up extension namespacing (or name registration, if you prefer) while adding additional metadata to help users evaluate the quality of an extension. For example, does it come from a well-known developer? You can see it right in the module path.
  • It creates a pattern to eventually allow auto-indexing of extensions. For example, if you run pgxn install github.com/example/pew, and PGXN doesn’t have it, it can look for a META.json file in that repository and, if it exists, and there’s a semver release tag, it could try to index it and let the user install it. There are ownership issues to be worked out, but it has possibilities.
  • It preserves the Postgres core concept of extension identity while putting in place a well-established (by Go modules and widespread use of URIs in general) that the Postgres core could eventually adopt to allow more flexible extension namespacing.

Request for Comments

What do you think? Good idea? Terrible idea? Please hit me with your thoughts on Mastodon, or via the #extensions channel on the Postgres Slack. I’d like to get this decision (and a few others, stay tuned!) nailed down soon and start development, so don’t hesitate? I need your help to prevent me from making a huge mistake.


  1. Unless the owner would like to share ownership with someone else, in which case they can email me to request that another user be granted “co-ownership”. They can also request to transfer ownership to another user, after which the original owner will no longer be able to release the extension. ↩︎

  2. Or, if the META.json file has a repository resource with a URL, PGXN could index it as the implied module path. Or, failing that, maybe it should fall back on the distribution name instead of a pgxn.org path, and prompt with pg_pair/pair↩︎

Talk: Building Trunk

Line drawing of a steamer trunk with three diamond stars above it.

Update: 2024-03-22: Slides and video linked below.

This week, my fellow Tembonaut Ian Stanton will present the extension ecosystem mini-summit talk, “Building Trunk: A Postgres Extension Registry and CLI”. We felt it important to get some insight from a couple of the recently-developed Postgres extension registries: what problems they set out to solve, how the were built and operate, successes addressing their issues, and what issues remain, both for the projects and the ecosystem overall. Ian plans to give us the low-down on trunk.

Join us! Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

Update: 2024-03-22: Such a great discussion, was great! Missed it? I got you. Links:

Mini Summit One

Great turnout and discussion for the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28. Thank you!

The talk, “State of the Extension Ecosystem”, was followed by 15 minutes or so of super interesting discussion. Here are the relevant links:

For posterity, I listened through my droning and tried to capture the general outline, posted here along with interspersed chat history and some relevant links. Apologies in advance for any inaccuracies or missed nuance; i’m happy to update these notes with your corrections.

And now, to the notes!

Introduction

  • Introduced myself, first Mini Summit, six leading up to the in-person summit on May 28 at PGConf.dev in Vancouver, Canada.

  • Thought I would get it things started, provide a bit of history of extensions and context for what’s next.

Presentation

  • Postgres has a long history of extensibility, originally using pure SQL or shared preload libraries. Used by a few early adopters, perhaps a couple dozen, including …

  • Explicit extension support added in Postgres 9.1 by Dimitri Fontaine, with PGXS, CREATE EXTENSION, and pg_dump & pg_restore support.

  • Example pair--1.0.0.sql:

    -- complain if script is sourced in psql and not CREATE EXTENSION
    \echo Use "CREATE EXTENSION pair" to load this file. \quit
    
    CREATE TYPE pair AS ( k text, v text );
    
    CREATE FUNCTION pair(text, text)
    RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;';
    
    CREATE OPERATOR ~> (LEFTARG = text, RIGHTARG = text, FUNCTION = pair);
    
  • Bagel makes an appearance.

  • Example pair.control:

    # pair extension
    comment = 'A key/value pair data type'
    default_version = '1.0'
    module_pathname = '$libdir/pair'
    relocatable = true
    
  • Example Makefile:

    EXTENSION    = pair
    MODULEDIR    = $(EXTENSION)
    DOCS         = README.md
    DATA         = sql/pair--1.0.sql
    TESTS        = test/sql/base.sql
    REGRESS      = base
    REGRESS_OPTS = --inputdir=test
    MODULES      = src/pair
    PG_CONFIG   ?= pg_config
    
    PGXS := $(shell $(PG_CONFIG) --pgxs)
    include $(PGXS)
    
  • Build and Install:

    $ make
    make: Nothing to be done for `all'.
    
    $ make install
    mkdir -p '/pgsql/share/extension'
    mkdir -p '/pgsql/share/pair'
    mkdir -p '/pgsql/share/doc/pair'
    install -c -m 644 pair.control '/pgsql/share/extension/'
    install -c -m 644 sql/pair--1.0.sql  '/pgsql/share/pair/'
    install -c -m 644 README.md '/pgsql/share/doc/pair/'
    
    $ make installcheck
    # +++ regress install-check in  +++
    # using postmaster on Unix socket, default port
    ok 1         - base                                       15 ms
    1..1
    # All 1 tests passed.
    
  • CREATE EXTENSION:

    $ psql -d try -c 'CREATE EXTENSION pair'
    CREATE EXTENSION
    
    $ pg_dump -d try
    --
    -- Name: pair; Type: EXTENSION; Schema: -; Owner: -
    --
    CREATE EXTENSION IF NOT EXISTS pair WITH SCHEMA public;
    
    --
    -- Name: EXTENSION pair; Type: COMMENT; Schema: -; Owner:
    --
    COMMENT ON EXTENSION pair IS 'A key/value pair data type';
    
  • Many of us saw opportunity in this new feature.

    PostgreSQL today is not merely a database, it’s an application development platform.

    — Me, 2010

  • Proposed to build PGXN. Raised funds to build it in late 2010. Launched site April 2011; Daniele Varrazzo released CLI, and Dickson Guedes released the dev CLI.

  • Problems PGXN set out to solve:

    • Source code distribution with user registration and namespacing
    • Discovery: Search, read docs, brows tags
    • Installation: CLI to compile and install using PGXS or Configure
  • PGXN Components:

  • Problems out of scope for PGXN:

    • Binary packaging and distribution
      • Defer to apt/yum
    • Developer tooling (though dev utils helped)
    • Build tooling
      • Defer to core (PGXS)
  • PGXN Shortcomings:

    • Little development since 2012
    • Search limitations
    • Source of Record
      • Minority of available extensions on PGXN
      • Releases uneven or neglected

    In classic SDLC fashion, PGXN POC shipped as an MVP and was neglected.

    — Me, Just Now

  • Been peripheral to Postgres extensions for the last 10-12 years, but some things have happened.

  • Non-Core extension counts:

  • Daniele asks about that last source, which is just a list in a gist.

  • Joe Nelson links to the gist in Zoom chat. It is not his list, contrary to my off-the-cuff guess

  • Why haven’t extensions taken off?

  • Lost Opportunities

    • No one canonical source to discover and install extensions
    • Difficult to find and discover extensions without canonical registry
    • Most extensions are under-documented and difficult to understand
    • They are also hard to configure and install; most people don’t want or need a compiler
    • The maturity of extensions can be difficult to gauge, not systematized, must each be independently researched
      • David Christensen in Chat “attention economy/awareness, NIH, etc”
      • Jeremy S in chat: “Maybe some people don’t know they are using extensions (I think that’s possible to some degree)”
    • There is no comprehensive binary packaging
    • Centralized source distribution is insufficient (even if it were complete)
      • jubilee in chat: Trust aspect?
      • David Johnson in chat: To seem legit you need to touch the repo at least annually to ensure it works on the newest major release. Even if you just do your compile and update the readme.
      • I mention using pgxn-utils and GitHub workflows to ensure my extensions continue working
    • There is insufficient developer tooling; pgxn-utils not well-maintained, don’t build on recent Rubies, but pgrx has a lot of Rust-oriented tooling
      • Eric in chat: ❤️
      • jubilee in chat: 🦀 mentioned!
  • Filling the Gaps

  • dbdev: “The Database Package Manager for Trusted Language Extensions”: Includes only TLEs, no binary extensions

  • trunk: “A Postgres Extension Registry”: Binary distribution of curated extensions, desires to be comprehensive and cross-platform

  • pgxman: “npm for PostgreSQL”: Binary Apt package distribution of curated extensions packaged with, desires to be comprehensive and cross-platform

  • Emphases: Ease of Use. Screenshot from pgxman:

    $ curl -sfL https://install.pgx.sh | sh -
    👏🎉 pgxman successfully installed
    $ pgxman install pgvector
    The following Debian packages will be installed:
    postgresql-14-pgxman-pgvector=0.5.1
    Do you want to continue? [Y/n] y
    pgvector has been successfully installed.
    
    • Daniele in chat: “Missing a “curl | sudo sh” there…. 👀”
    • Greg Mullane (CrunchyData) [he/him] in chat: “Really not a fan of that “pipe curl stuff from internet into sh” system.”
    • Jeremy S in chat: “Someone recently reprimanded me for putting curl | psql in an extension README. From a security perspective it probably sets a better example to do curl >file.sql … psql file.sql (encourage users not to run from Internet but read/review first)” * jubilee in chat: “apt/yum install is just a better UI over curl | sh :^)”
    • Jeremy S in chat: “Yes and once you’re to that point there’s already more supply chain verification happening”
    • Jeremy S in chat: “It’s usually just the initial bootstrap into any system, if the setup wasn’t already in your distro”
  • Emphases: Platform neutrality. Screenshot from trunk:

    Architecture x86-64
    Operating system Debian/Ubuntu
  • Emphases: Stats. Screenshot from dbdev:

    Downloads


    20 all time downloads 0 downloads in last 30 days 1 download in last 90 days 0 downloads in last 180 days

  • Emphases: Curation. Screenshot from trunk:

    Featured 7
    Analytics 13
    Auditing / Logging 7
    Data Change Capture 6
    Connectors 27
    Data / Transformations 49
    • Damien Clochard in chat: gtg, see you later guys !
  • MVPs

    • trunk: Manual integration, Currently Debian-only

    • pgxman: Form-based submission, Currently Apt-only

    • dbdev: TLEs only, CLI publishing

    • David Christensen in chat: “go has a pretty good extensions infra, imho, wrt discovery/docs, etc. also has the benefit of the package names being the URL to access it, which is a nice convention.”

  • New Opportunities Today

    What are the community opportunities for the extension ecosystem?

    Some ideas:

    • Improved dev tools: More than pgxn-utils and pgrx
    • Canonical registry: All publicly-available extensions in one pac3
    • Easy publishing: auto-discovery or CI/CD pipeline publishing
    • Continuous Delivery: CI/CD pipeline publishing
    • File-free installation: TLEs
    • Documentation: Something like Go docs or Rust docs
    • File management: Put all the files for an extension in one directory
    • Improved metadata
      • Library Dependencies: utilities used by extensions
      • Platform Dependencies: system packages
      • Build pipelines: PGXS, pgrx, make, cpan, pypi, etc.
      • Artifacts: Binaries build on release
      • Classification: Curated in addition to tags
      • Extension Types: Extensions, apps, background workers, loadable libraries
    • Derived Services
      • Binary Packaging: Distributed binaries for many platforms
      • Ratings & Reviews: Stars, thumbs, comments
      • Aggregated Stats: Repository stats, etc.
      • Smoke Testing: Matrix of multi-platform test results
      • Security Scanning: Reporting vulnerabilities
      • Badging & Curation: Third-party classification, badging various statuses
  • Extension Ecosystem Summit

    Collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

  • 🏔️ Your Summit Organizers

  • Devrim Gunduz in chat: Thanks David!

  • Schedule:

    • March 6: David Wheeler, PGXN: “State of the Extension Ecosystem”
    • March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI”
    • April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions”
    • April 17: Jonathan Katz: “TLE Vision and Specifics”
    • May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod”
    • May 15: (Placeholder) David Wheeler, PGXN: “Metadata for All: Enabling discovery, packaging, and community”
  • Ultimately want to talk about what’s important to you, the members of the community to make extensions successful.

Discussion

  • Eric: I’m Eric Ridge, one of the developers behind pgrx, as you’re going through this process of building a modern extension ecosystem, let us know what we can do on the Rust side to help make your lives easier, we’re happy to help any way we can.

  • Steven Miller in chat:

    These are some areas of interest we noticed building Tembo

    Binary packaging / distribution:

    • Variable installation location
    • System dependencies / uncommon system dependencies or versions
    • Chip specific instructions (e.g. vector compiled with avx512)
    • Extension-specific file types / extra data files (e.g. anonymizer .csv data)

    Turning on extensions automatically

    • Different ways to enable extensions
    • does it need load (shared_preload_libraries, session_… etc)?
    • Does it use create extension framework?
    • Does it require a specific schema?
    • What about turning on in multiple DBs at the same time in the same cluster, with background worker?
    • Disabling, what data will be lost?
    • Validating safety / user feedback on upgrade?

    In cloud / SaaS:

    • Installing + enabling extensions quickly, without restart
    • Persisting extension files
    • Extension-specific files (e.g. libraries) versus postgres’ libraries
    • Updating
    • Troubleshooting crashes / core dumps

    Anyone else have similar problems / tips?

  • Steven Miller: These were just things I noted during the presentation. Curious if these are interesting to others on the call.

  • Daniele in chat: “Regards binary distributions, python wheels might be a useful reference.”

  • Steven Miller: That’s good point! What do people think of idea to just install extensions onto servers, not packages, persisted on the disk, next to PGDATA so they go into a single persistent volume, and the rest is managed by an immutable container.

  • Daniele: Had experience on Crunchy where we had to replace an image to get an extension. Looked for feature to have a sidecar or a volume with the extension.

  • Steven Miller: Didn’t have a separate directory just for extensions, it’s just pg_config --libdir fore everything. Had to persist entire directory, including those files form the base build, their internal files. Would have been nice to have a separate directory, extra-libdr or extra-sharedir, something like that.

  • Yurii Rashkovskii: I was working on a patch to do exactly that, but haven’t completed it. Was going to introduce additional directories to search for this stuff.

  • Steven Miller: That would be really awesome.

  • Jeremy S in chat: “Advantage of that is that a single image can be shared among systems with different needs”

  • Eric in chat: “Thoughts around “enterprise repositories” that could be self-hosted and disconnected from the internet?”

    • Ian Stanton in chat: “I’ll touch on this in the next talk, it’s crossed our minds when building the Trunk registry”
  • Steven Miller: I think that’s a great idea.

  • Bagel reappears.

  • David Wheeler: PGXN originally designed so anyone could run Manager and their own root mirror, and maybe rsync from the community one. Don’t know that anyone ever did, it’s a little complicated and most people don’t want to work with Perl. [Chuckles]. Definitely think there’s space for that. If you work with Java or Go or maybe Rust, lots of orgs like Artifactory that provide internal registries. Could be cool use case for Postgres extensions.

  • David Christensen in chat: “something that could support extension batches; like groups of related extensions that could be installed in bulk or loaded in bulk (so could accommodate the shared systems with different individual extension needs, but could be shared_preload_library configured)”

  • “Steven Miller” in chat: “Sounds familiar”

  • Greg Mullane (CrunchyData) [he/him] in chat: “All these items remind me of CPAN. We should see what things it (and other similar systems) get right and wrong. I’ve learned from CPAN that ratings, reviews, and badging are going to be very difficult.”

    • David Christensen in chat: “I’d assumed at the time that it was largely lifted (at least in philosophy/design) from CPAN. 🙂”
    • David Wheeler (he/him) in chat: “yes”
  • Jeremy S: I think this is mostly focused on developers, but I had recent experience where multiple people in the past few months, new to Postgres, are trying to understand extensions. They install a version and then see there are like 15 versions installed, so confused. Goes back to the install file. Bit of UX angle where there are sharp edges where people trying to make sense of extensions, the flexibility makes it hard to understand. Some might be some nice guides, some architectural things explaining PGXS, or improvements to make to the design. Related, not main topic, but good to keep end user UX and devs building on Postgres but not Postgres developers, who run their businesses.

  • David Wheeler: Yeah all the files can be confusing, which is why I think trunk and pgxman trying to simplify: Just run this command and then you have it.

  • Steven Miller in chat: “I really agree with what Jeremy is saying. Right now PGXN and Trunk are taking and approach like “whatever an extension could do / how it may work, it should work on this registry”. But I think more standards / “what is a normal extension” would make the UX much easier.”

  • Jeremy S: Even with that the available extensions view is still there. Some of that is just schema management, and that’s how core is doing schema management.

  • Steven Miller in chat: I exactly agree about the concern about multiple extensions within a package. Also version and name mismatches

  • David Wheeler: And not everything is an extension, you just want to work, or your extension is just utility like pg_top you just want to use. Extensions I think were a tremendous contribution to Postgres itself, but a lot of it was wrangling the existing system for building Postgres itself to make it work for that. Could be very interesting, though quite long term — and I know Dimitri has tried this multiple times — to build a proper package management system within Postgres itself, to eas a lot of that pain and burden.

  • Tobias Bussmann in chat: “Thank you for the great overview and for taking this topic further! Unfortunately, I’ll not be able to join at pgConf.dev but will follow whatever results this will lead to. As a package maintainer, I am constantly looking in a easy way to allow users to use extensions without having to package everything ;)”

  • Steven Miller in chat: “Like auto explain for example right. i.e. a LOAD only “extension””

  • *Yurii Rashkovskii: An interesting topic, what extensions are capable of doing and how they can be more self-contained. Like Steven was saying in chat: how easy is it to load and unload extensions. Example: want an extension to hook into a part of Postgres: executor, planner, etc. How do you go about enabling them? How you unload them, introspect, list of hooks.

    Omni extension provides a list of all hooks, and when you remove an extension it removes the hooks that provide the extension, but still not part of the core. Hooks one of the greatest ways to expand the functionality of Postgres, allows us to experiment with Postgres before committing to the full cycle of getting a patch into Postgres. Lets us get it to users today to try. if it makes a lot of sense and people want it, time to commit to the process of getting a patch committed. But if we don’t hve this venue, how to get extensions in, our ability to try things is limited.

  • jubilee in chat: Hmm. It seems my audio is not working.

  • David Wheeler: The next session is two weeks from today: Ian Stanton is going to talk about “Building Trunk: A Postgres Extension Registry and CLI”. Will be interesting because a number of people have decided to build a binary packaging system for extensions, just to air out what the challenges were, what problems they wanted to solve, what problems remain, and where they want to take it in the future.

  • Jeremy S* in chat: “Bagel clearly has something to say”

  • *David Wheeler: jubileee I see your audio issues, do you just want to type your question into chat? We can also discuss things in the #extensions channel on the Postgres Slack

  • David Wheeler: Thank you all for coming!

  • jubilee in chat: “I had a question which is about: Does Postgres actually support docs for extensions? Like, a lot of people don’t really WANT to read a README. Can you get docstrings for a function in psql?”

  • Ian Stanton in chat: “Thank you David!”

  • jubilee in chat: And if not, why not?

Post Presentation Discussion

From Slack:

  • David Wheeler: I see now that “jubilee” left their question in the Zoom chat.
  • David Wheeler: The closest attempt at this I’ve seen is pg_readme, which will generate Markdown for an extension from comments in the catalog and write it out to a file.
  • David G. Johnson: The comment on command adds in database comments that psql describe commands should display.

Also on Slack, Greg Sabino Mullane started a longish thread on the things we want to do and build.

Talk: State of the Extension Ecosystem

Photo of the summit of Mount Hood

Update: 2024-03-06: Slides and video linked below.

A quick reminder that I’ll be giving a brief talk on the “State of the Extension Ecosystem” on Wednesday at noon US Eastern / 17:00 UTC. This talk is the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28.

I plan to give a brief history of Postgres extension tools and distribution, the challenges encountered, recent developments, and opportunities for the future. It should take about 30 minutes, followed by discussion. Following this pattern for all the talks in the series, I hope to set up some engaging discussions and to surface significant topics ahead of the summit.

Join us! Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

Update: 2024-03-06: Great turnout and discussion, thank you! Links:

Extension Ecosystem Summit 2024

Logo for PGConf.dev

I’m pleased to announce that some pals and I have organized and will host the (first annual?) Extension Ecosystem Summit at PGConf.dev in Vancouver (and more, see below) on May 28:

Enabling comprehensive indexing, discovery, and binary distribution.

Participants will collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

Going to PGConf? Select it as an “Additional Option” when you register, or update your registration if you’ve already registered. Hope to see you there!


Photo of the summit of Mount Hood

Extension Ecosystem Mini-Summit

But if you can’t make it, that’s okay, because in the lead up to the Summit, to we’re hosting a series of six virtual gatherings, the Postgres Extension Ecosystem Mini-Summit.

Join us for an hour or so every other Wednesday starting March 6 to hear contributors to a variety of community and commercial extension initiatives outline the problems they want to solve, their attempts to so, challenges discovered along the way, and dreams for an ideal extension ecosystem in the future. Tentative speaker lineup:

  • March 6: David Wheeler, PGXN: “State of the Extension Ecosystem”
  • March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI”
  • April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions”
  • April 17: Jonathan Katz: “TLE Vision and Specifics”
  • May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod”
  • May 15: (Placeholder) David Wheeler, PGXN: “Metadata for All: Enabling discovery, packaging, and community”

Hit the event page for details. Many thanks to my co-organizers Jeremy Schneider, David Christensen, Keith Fiske, and Devrim Gündüz, as well as the PGConf.dev organizers for making this all happen!

Update: 2024-03-06: Updated the talk schedule.

RFC: Extension Metadata Typology

Lately I’ve been thinking a lot about metadata for Postgres extensions. Traditional use cases include control file metadata, which lives in .control files used by CREATE EXTENSION and friends, and PGXN metadata, which lives in META.json files used by PGXN to index and publish extensions. But these two narrow use cases for SQL behavior and source code distribution don’t provide the information necessary to enable other use cases, including building, installing, configuration, and more.

So I have also been exploring other metadata formats, including:

These standards from neighboring communities reveal a great deal of overlap, as one might expect (everything has a name, a version, an author, license, and so on), but also types of metadata that had not occurred to me. As I took notes and gathered suggestions from colleagues and coworkers, I began to recognize natural groupings of metadata. This lead to the realization that it might be easier — and more productive — to think about these groupings rather than individual fields.

I therefore propose a typology for Postgres extension metadata.

Extension Metadata Typology

Essentials

Essential information about the extension itself, including its name (or unique package name), version, list of authors, license, etc. Pretty much every metadata format encompasses this data. Ecosystem applications use it for indexing, installation locations, naming conventions, and display information.

Artifacts

A list of links and checksums for downloading the extension in one or more formats, including source code, binaries, system packages, and more. Apps use this information to determine the best option for installing an extension on a particular system.

Resources

External information about the extension, mostly links, including source code repository, bug reporting, documentation, badges, funding, etc. Apps use this data for links, of course, but also full text indexing, documentation rendering, and displaying useful information about the extension.

Contents

A description of what’s included in the extension package. Often an “extension” consists of multiple extensions, such as PostGIS, which includes postgis, postgis_tiger_geocoder, address_standardizer, and more. Furthermore, some extensions are not CREATE EXTENSION-type extension at all, such as background workers, command-line apps, libraries, etc. Each should be listed along with documentation links where they differ from the package overall (or are simply more specific).

Prerequisites

A list of external dependencies required to configure, build, test, install, and run the extension. These include not only other extensions, but also external libraries and OS-specific lists of binary package dependencies. And let’s not forget the versions of Postgres required, as well as any OS and version dependencies (e.g, does it work on Windows? FreeBSD? What versions?) and architectures (arm64, amd64, etc.)

How to Build It

Metadata that apps use to determine how to build the extension. Does it use the PostgreSQL PGXS build pipeline? Or perhaps it needs the cargo-based pgrx toolchain. Maybe a traditional ./configure && make pattern? Perl, Ruby, Python, Go, Rust, or NPM tooling? Whatever the pattern, this metadata needs to be sufficient for an ecosystem app to programmatically determine how to build an extension.

How to Install It

Usually an extension of the build metadata, the install metadata describes how to install the extension. That could be PGXS or pgrx again, but could also use other patterns — or multiple patterns! For example, perhaps an extension can be built and installed with PGXS, but it might also be TLE-safe, and therefore provide details for handing the SQL files off to a TLE installer.

This typology might include additional data, such as documentation files to install (man pages anyone?), or directories of dependent files or libraries, and the like — whatever needs to be installed for the extension.

How to Run It

Not all Postgres extensions are CREATE EXTENSION extensions. Some provide background workers to perform various tasks; others simply provide Utility applications like pg_top and pg_repack. In fact pg_repack provides both a command-line application and a CREATE EXTENSION extension in one package!

This metadata also provides configuration information, both control file parameters like trusted, superuser, and schema, but also load configuration information, like whether an extension needs its libraries included in shared_preload_libraries to enable LOAD or requires a cluster restart. (Arguably this information should be in the “install” typology rather than “run”.)

Classification

Classification metadata lets the extension developer associate additional information to improve discovery, such as key words. It might also allow selections from a curated list of extension classifications, such as the category slugs supported for the cargo categories field. Ecosystem apps use this data to organize extensions under key words or categories, making it easier for users to find extensions often used together or for various workloads or tasks.

Metrics and Reports

This final typology differs from the others in that its metadata derives from third party sources rather than the extension developer. It includes data such as number of downloads, build and test status on various Postgres/OS/version combinations, binary packaging distributions, test coverage, security scan results, vulnerability detection, quality metrics and user ratings, and more.

In the broader ecosystem, it would be the responsibility of the root registry to ensure such data in the canonical data for each extension comes only from trusted sources, although applications downstream of the root registry might extend metrics and reports metadata with their own information.

What More?

Reading through various metadata standards, I suspect this typology is fairly comprehensive, but I’m usually mistaken about such things. What other types of metadata do you find essential for the use cases you’re familiar with? Do they fit one of the types here, or do they require some other typology I’ve failed to imagine? Hit the #extensions channel on the Postgres Slack to contribute to the discussion, or give me a holler on Mastodon.

Meanwhile, I’ll be refining this typology and assigning all the metadata fields to them in the coming weeks, with an eye to proposing a community-wide metadata standard. I hope it will benefit us all; your input will ensure it does.

The History and Future of Extension Versioning

Every software distribution system deals with versioning. Early in the design of PGXN, I decided to require semantic versions (SemVer), a clearly-defined and widely-adopted version standard, even in its pre-1.0 specification. I implemented the semver data type that would properly sort semantic versions, later ported to C by Sam Vilain and eventually updated to semver 2.0.0.

As I’ve been thinking through the jobs and tools for the Postgres extension ecosystem, I wanted to revisit this decision, the context in which it was made, and survey the field for other options. Maybe a “PGXN v2” should do something different?

But first that context, starting with Postgres itself.

PostgreSQL Extension Version Standard

From the introduction extensions in PostgreSQL 9.1, the project side-stepped the need for version standardization and enforcement by requiring extension authors to adopt a file naming convention, instead. For example, an extension named “pair” must have a file with its name, two dashes, then the version as listed in its control file, like so:

pair--1.1.sql

As long as the file name is correct and the version part byte-compatible with the control file entry, CREATE EXTENSION will find it. To upgrade an extension the author must provide a second file with the extension name, the old version, and the new version, all delimited by double dashes. For example, to upgrade our “pair” extension to version 1.2, the author supply all the SQL commands necessary to upgrade it in this file:

pair--1.1--1.2.sql

This pattern avoids the whole question of version standards, ordering for upgrades or downgrades, and all the rest: extension authors have full responsibility to name their files correctly.

PGXN Versions

SemVer simplified a number of issues for PGXN in ways that the PostgreSQL extension versioning did not (without having to re-implement the core’s file naming code). PGXN wants all metadata for an extension in its META.json file, and not to derive it from other sources that could change over time.

Following the CPAN model, PGXN also required that extension releases never decrease the version.1 The well-defined sortability of semantic versions made this validation trivial. PGXN later relaxed enforcement to allow updates to previously-released versions. SemVer’s clearly specified sorting made this change possible, as the major.minor.patch precedence intuitively compare from left to right.

In other words, if one had previously released version 1.2.2, then released 1.3.0, a follow-up 1.2.3 is allowed, increasing the 1.2.x branch version, but not, say, 1.2.1, which decreases the 1.2.x branch version.

Overall, semantic versions have been great for clarity of versioning of PGXN extensions. The one bit of conflict comes from extensions that use some other other version standard in the control file, usually a two-part x.y version not allowed by SemVer, which requires x.y.z (or, more specifically, major.minor.patch).

But such versions are usually compatible with SemVer, and because PGXN cares only about the contents of the META.json, they’re free to use their own versions in the control file, just as long as the META.json file uses SemVers.

For example, the recent nominatim_fdw v1.0.0 release, which of course lists "version": "1.0.0" in its META.json file, sticks to its preferred default_version = '1.0' in its control file. The extension author simply appends .0 to create a valid SemVer from their preferred version, and as long as they never use any other patch number, it remains compatible.

Versioning Alternatives

Surveying the versioning landscape in 2024 yields a number of approaches. Might we prefer an alternative for future extensions distribution? Let’s look at the possibilities.

Ad Hoc Versions

As described above, the Postgres file naming convention allows ad hoc versions. As far as I can tell, so does the R Project’s CRAN. This approach seems fine for systems that don’t need to follow version changes themselves, but much trickier for systems that do. If I want to install the latest version of an extension, how does the installer know what that latest version is?

The answer is that the extension author must always release them in the proper order. But if someone releases 1.3.1 of an extension, and then 1.2.1, well then 1.2.1 is the latest, isn’t it? It could get confusing pretty quickly.

Seems better to require some system, so that download and install clients can get the latest version — or the latest maintenance version of an earlier release if they need it.

User Choice

Quite a few registries allow users to choose their own versioning standards, but generally with some very specific recommendations to prevent confusion for users.

  • Python Packaging is fairly liberal in the versions it allows, but strongly recommends semantic versioning or calendar versioning (more on that below).
  • CPAN (Perl) is also fairly liberal, due to its long history of module distribution, but currently requires “Decimal versions”, which are evaluated as floating-point numbers, or dotted integer versions, which require three dot-separated positive integers and must begin with the letter v.
  • RubyGems does not enforce a versioning policy, but warns that “using an ‘irrational’ policy will only be a disservice to those in the community who use your gems.” The project therefore urges developers to follow SemVer.

These three venerable projects date from an earlier period of registration and distribution, and have made concessions to times when no policies existed. Their solutions either try to cover as many legacy examples as possible while recommending better patterns going forward (Python, Perl), or simply make recommendations and punt responsibility to developers.

SemVer

More recently-designed registries avoid this problem by requiring some level of versioning standard from their inception. Nearly all use SemVer, including:

  • Go Modules, where “Each version starts with the letter v, followed by a semantic version.”
  • Cargo (Rust), which “uses SemVer for specifying version numbers. This establishes a common convention for what is compatible between different versions of a package.”
  • npm, where the “version must be parseable by node-semver, which is bundled with npm as a dependency.”

CalVer

CalVer eschews context-free incrementing integers in favor of semantically-meaningful versions, at least for some subset of a version string. In other words: make the version date-based. CalVer-versioned projects usually include the year and sometimes the month. Some examples:

  • Ubuntu uses YY.0M.MICRO, e.g., 23.04, released in April 2023, and 23.10.1, released in October 2023
  • Twisted uses YY.MM.MICRO, e.g., 22.4.0, released in April 2022

Ultimately, adoption of a CalVer format is a more choice about embedding calendar-based meaning into a version more than standardizing a specific format. One can of course use CalVer semantics in a semantic version, as in the Twisted example, which is fully-SemVer compliant.

In other words, adoption of CalVer need not necessitate rejection of SemVer.

Package Managers

What about package managers, like RPM and Apt? Some canonical examples:

  • RPM packages use the format:

    <name>-<version>-<release>.<architecture>
    

    Here <version> is the upstream version, but RPM practices a reasonable (if baroque) version comparison of all its parts. But it does not impose a standard on upstream packages, since they of course vary tremendously between communities and projects.

  • Apt packages use a similar format:

    [epoch:]upstream_version[-debian_revision]
    

    Again, upstream_version is the version of the upstream package, and not enforced by Apt.

  • APK (Alpine Linux) packages use the format

    {digit}{.digit}...{letter}{_suf{#}}...{-r#}
    

    I believe that {digit}{.digit}...{letter} is the upstream package version.

This pattern makes perfect sense for registries that repackage software from dozens of upstream sources that may or may not have their own policies. But a system that defines the standard for a specific ecosystem, like Rust or PostgreSQL, need not maintain that flexibility.

Recommendation

Given this survey, I’m inclined to recommend that the PostgreSQL community follow the PGXN (and Go, and Rust, and npm) precedent and continue to rely on and require semantic versions for extension distribution. It’s not perfect, given the contrast with the core’s lax version requirements. CalVer partisans can still use it, though with fewer formatting options (SemVer forbids leading zeros, as in the Ubuntu 23.04 example).

But with its continuing adoption, and especially its requirement by more recent, widely-used registries, and capacity to support date-based semantics for those who desire it, I think it continues to make the most sense.

Wrong!

I’m probably wrong. I’m often mistaken in one way or another, on the details or the conclusion. Please tell me how I’ve messed up! Find me on the #extensions channel on the Postgres Slack or ping me on Mastodon.


  1. Why? Because every module on CPAN has one and only one entry in the index file. Ricardo Signes explains↩︎

Extension Ecosystem Jobs to be Done

Over on the Tembo blog I’ve published a thinking-through of what all the jobs to be done of the ideal Postgres extension ecosystem might be:

These challenges and the interest and energy put into exploring new solutions make clear that the time has come to revisit the whole idea of the PostgreSQL extension ecosystem: to work though the jobs to be done, specify the tools to do those jobs, and outline a plan for the broader Postgres community to design and build them.

Future posts will dream up the tools and make the plan; today we begin with the jobs.

🎬 Let’s get started.

Presentation: Introduction to the PGXN Architecture

As I started digging into the jobs and tools for the Postgres extension ecosystem as part of my new gig, I realized that most people have little knowledge of the PGXN architecture. I learned a lot designing PGXN and its services, and am quite pleased with where it ended up, warts and all. So I thought it worthwhile to put together a brief presentation on the fundamental design principals (static REST file API), inter-related services (root mirror, manager, API, site) and tools (CLI, CI/CD).

Yesterday, the Tembo blog published the presentation, including the video and slides, along with a high-level architecture diagram. I hope it’s a useful point of reference for the Postgres community as we look to better distribute extensions in the future.

Contemplating Decentralized Extension Publishing

TL;DR

As I think through the future of the Postgres extension ecosystem as a key part of the new job, I wanted to understand how Go decentralized publishing works. In this post I work it out, and think through how we might do something similar for Postgres extension publishing. It covers the Go architecture, namespacing challenges, and PGXS abuse; then experiments with URL-based namespacing and ponders reorganizing installed extension files; and closes with a high-level design for making it work now and in the future.

It is, admittedly, a lot, mainly written for my own edification and for the information of my fellow extension-releasing travelers.

I find it fascinating and learned a ton. Maybe you will too! But feel free to skip this post if you’re less interested in the details of the journey and want to wait for more decisive posts once I’ve reached the destination.

Introduction

Most language registries require developers to take some step to make releases. Many automate the process in CI/CD pipelines, but it requires some amount of effort on the developer’s part:

  • Register for an account
  • Learn how to format things to publish a release
  • Remember to publish again for every new version
  • Create a pipeline to automate publishing (e.g., a GitHub workflow)

Decentralized Publishing

Go decentralized publishing has revised this pattern: it does not require user registration or authentication to to publish a module to pkg.go.dev. Rather, Go developers simply tag the source repository, and the first time someone refers to the tag in Go tools, the Go module index will include it.

For example, publishing v1.2.1 of a module in the github.com/golang/example repository takes just three commands:

git tag v1.2.1 -sm 'Tag v1.2.1'
git push --tags
go list -m github.com/golang/example@v1.2.1

After a few minutes, the module will show up in the index and then on pkg.go.dev. Anyone can run go get -u github.com/golang/example to get the latest version. Go developers rest easy in the knowledge that they’re getting the exact module they need thanks to the global checksum database, which Go uses “in many situations to detect misbehavior by proxies or origin servers”.

This design requires go get to understand multiple source code management systems: it supports Git, Subversion, Mercurial, Bazaar, and Fossil.1 It also needs the go.mod metadata file to live in the project defining the package.

But that’s really it. From the developer’s perspective it could not be easier to publish a module, because it’s a natural extension of the module development tooling and workflow of committing, tagging, and fetching code.

Decentralized Extension Publishing

Could we publish Postgres extensions in such a decentralized pattern? It might look something like this:

  • The developer places a metadata file in the proper location (control file, META.json, Cargo.toml, whatever — standard TBD)
  • To publish a release, the developer tags the repository and calls some sort of indexing service hook (perhaps from a tag-triggered release workflow)
  • The indexing service validates the extension and adds it to the index

Note that there is no registration required. It simply trusts the source code repository. It also avoids name collision: github.com/bob/hash is distinct from github.com/carol/hash.

This design does raise challenges for clients, whether they’re compiling extensions on a production system or building binary packages for distribution: they have to support various version control systems to pull the code (though starting with Git is a decent 90% solution).

Namespacing

Then there’s name conflicts. Perhaps github.com/bob/hash and github.com/carol/hash both create an extension named hash. By the current control file format, the script directory and module path can use any name, but in all likelihood the use these defaults:

directory = 'extension'
module_pathname = '$libdir/hash'

Meaning .sql files will be installed in the Postgres share/extension subdirectory — along with all the other installed extensions — and library files will be installed in the library directory along with all other libraries. Something like this:

pgsql
├── lib
│   └── hash.so
└── share
    └── extension
    │   └── hash.control
    │   ├── hash--1.0.0.sql
    └── doc
        └── hash.md

If both projects include, say, hash.control, hash--1.0.0.sql, and hash.so, the files from one will stomp all over the files of the other.

Installer Abuse

Go avoids this issue by using the domain and path from each package’s repository in its directory structure. For example, here’s a list of modules from google.golang.org repositories:

$ ls -1 ~/go/pkg/mod/google.golang.org
api@v0.134.0
api@v0.152.0
appengine@v1.6.7
genproto
genproto@v0.0.0-20230731193218-e0aa005b6bdf
grpc@v1.57.0
grpc@v1.59.0
protobuf@v1.30.0
protobuf@v1.31.0
protobuf@v1.32.0

The ~/go/pkg/mod directory has subdirectories for each VCS host name, and each then subdirectories for package paths. For the github.com/bob/hash example, the files would all live in ~/go/pkg/mod/github.com/bob/hash.

Could a Postgres extension build tool follow a similar distributed pattern by renaming the control file and installation files and directories to something specific for each, say github.com+bob+hash and github.com+carol+hash? That is, using the repository host name and path, but replacing the slashes in the path with some other character that wouldn’t create subdirectories — because PostgreSQL won’t find control files in subdirectories. The control file entries for github.com/carol/hash would look like this:

directory = 'github.com+carol+hash'
module_pathname = '$libdir/github.com+carol+hash'

Since PostgreSQL expects the control file to have the same name as the extension, and for SQL scripts to start with that name, the files would have to be named like so:

hash
├── Makefile
├── github.com+carol+hash.control
└── sql
    └── github.com+carol+hash--1.0.0.sql

And the Makefile contents:

EXTENSION  = github.com+carol+hash
MODULEDIR  = $(EXTENSION)
DATA       = sql/$(EXTENSION)--1.0.0.sql
PG_CONFIG ?= pg_config

PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

In other words, the extension name is the full repository host name and path and the Makefile MODULEDIR variable tells pg_config to put all the SQL and documentation files into a directories named github.com+carol+hash — preventing them from conflicting with any other extension.

Finally, the github.com+carol+hash.control file — so named becaus it must have the same name as the extension — contains:

default_version = '1.0.0'
relocatable = true
directory = 'github.com+carol+hash'
module_pathname = '$libdir/github.com+carol+hash'

Note the directory parameter, which must match MODULEDIR from the Makefile, so that CREATE EXTENSION can find the SQL files. Meanwhile, module_pathname ensures that the library file has a unique name — the same as the long extension name — again to avoid conflicts with other projects.

That unsightly naming extends to SQL: using the URL format could get to be a mouthful:

CREATE EXTENSION "github.com+carol+hash";

Which is do-able, but some new SQL syntax might be useful, perhaps something like:

CREATE EXTENSION hash FROM "github.com+carol+hash";

Or, if we’re gonna really go for it, use slashes after all!

CREATE EXTENSION hash FROM "github.com/carol/hash";

Want to use both extensions but they have conflicting objects (e.g., both create a “hash” data type)? Put them into separatre schemas (assuming relocatable = true in the control file):

CREATE EXTENSION hash FROM "github.com/carol/hash" WITH SCHEMA carol;
CREATE EXTENSION hash FROM "github.com/bob/hash" WITH SCHEMA bob;
CREATE TABLE try (
    h1 carol.hash,
    h2 bob.hash
);

Of course it would be nice if PostgreSQL added support for something like Oracle packages, but using schemas in the meantime may be sufficient.

Clearly we’re getting into changes to the PostgreSQL core, so put that aside and we can just use long names for creating, modifying, and dropping extensions, but not necessarily otherwise:

CREATE EXTENSION "github.com+carol+hash" WITH SCHEMA carol;
CREATE EXTENSION "github.com+bob+hash" WITH SCHEMA bob;
CREATE EXTENSION "gitlab.com+barack+kicker_type";
CREATE TABLE try (
    h1 carol.hash,
    h2 bob.hash
    kt kicker
);

Namespacing Experiment

To confirm that this approach might work, I committed 24134fd and pushed it in the namespace-experiment branch of the semver extension. This commit changes the extension name from semver to github.com+theory+pg-semver, and follows the above steps to ensure that its files are installed with that name.

Abusing the Postgres extension installation infrastructure like this does work, but suffers from a number of drawbacks, including:

  • The extension name is super long, as before, but now so too are the files in the repository (as opposed to the installer renaming them on install). The shared library file has to have the long name, so therefore does the .c source file. The SQL files must all start with github.com+theory+pg-semver, although I skipped that bit in this commit; instead the Makefile generates just one from sql/semver.sql.
  • Any previous installation of the semver type would remain unchanged, with no upgrade path. Changing an extension’s name isn’t a great idea.

I could probably script renaming and modifying file contents like this and make it part of the build process, but it starts to get complicated. We could also modify installers to make the changes, but there are a bunch of moving parts they would have to compensate for, and given how dynamic this can be (e.g., the semver Makefile reads the extension name from META.json), we would rapidly enter the territory of edge case whac-a-mole. I suspect it’s simply too error-prone.

Proposal: Update Postgres Extension Packaging

Perhaps the Go directory pattern could inspire a similar model in Postgres, eliminating the namespace issue by teaching the Postgres extension infrastructure to include all but one of the files for an extension in a single directory. In other words, rather than files distributed like so for semver:

pgsql
├── lib
│   └── semver.so
└── share
    └── extension
    │   └── semver.control
    │   ├── semver--0.32.1.sql
    │   ├── semver--0.32.0--0.32.1.sql
    └── doc
        └── semver.md

Make it more like this:

pgsql
└── share
    └── extension
        └── github.com
            └── theory
                └── pg-semver
                    └── extension.control
                    └── lib
                    │   └── semver.so
                    └── sql
                    │   └── semver--0.32.1.sql
                    │   └── semver--0.32.0--0.32.1.sql
                    └── doc
                        └── semver.md

Or perhaps:

pgsql
└── share
    └── extension
        └── github.com
            └── theory
                └── pg-semver
                    └── extension.control
                    └── semver.so
                    └── semver--0.32.1.sql
                    └── semver--0.32.0--0.32.1.sql
                    └── semver.md

The idea is to copy the files exactly as they’re stored in or compiled in the repository. Meanwhile, the new semver.name file — the only relevant file stored outside the extension module directory — simply points to that path:

github.com/theory/pg-semver

Then for CREATE EXTENSION semver, Postgres reads semver.name and knows where to find all the files to load the extension.

This configuration would require updates to the control file, now named extension.control, to record the full package name and appropriate locations. Add:

name = 'semver'
package = 'github.com/theory/pg-semver'

This pattern could also allow aliasing. Say we try to install a different semver extension from github.com/example/semver. This is in its extension.control file:

name = 'semver'
package = 'github.com/example/pg-semver'

The installer detects that semver.name already exists for a different package and raises an error. The user could then give it a different name by running something like:

make install ALIAS_EXTENSION_NAME=semver2

This would add semver2.name right next to semver.name, and its contents would contain github.com/example/semver, where all of its files are installed. This would allow CREATE EXTENSION semver2 to load the it without issue (assuming no object conflicts, hopefully resolved by relocate-ability).

I realize a lot of extensions with libraries could wreak some havoc on the library resolver having to search so many library directories, but perhaps there’s some way around that as well? Curious what techniques experienced C developers might have adopted.

Back to Decentralized Publishing

An updated installed extension file structure would be nice, and is surely worth a discussion, but even if it shipped in Postgres 20, we need an updated extension ecosystem today, to work well with all supported versions of Postgres. So let’s return to the idea of decentralized publishing without such changes.

I can think of two pieces that’d be required to get Go-style decentralized extension publishing to work with the current infrastructure.

Module Uniqueness

The first is to specify a new metadata field to be unique for the entire index, and which would contain the repository path. Call it module, after Go (a single Git repository can have multiple modules). In PGXN Meta Spec-style JSON it’d look something like this:

{
    "module": "github.com/theory/pg-semver",
    "version": "0.32.1",
    "provides": {
      "semver": {
         "abstract": "A semantic version data type",
      }
    }
}

Switch from the PGXN-style uniqueness on the distribution name (usually the name of the extension) and let the module be globally unique. This would allow another party to release an extension with the same name. Even a fork where only the module is changed:

{
    "module": "github.com/example/pg-semver",
    "version": "0.32.1",
    "provides": {
      "semver": {
         "abstract": "A semantic version data type",
      }
    }
}

Both would be indexed and appear under the module name, and both would be find-able by the provided extension name, semver.

Where that name must still be unique is in a given install. In other words, while github.com/theory/pg-semver and github.com/example/pg-semver both exist in the index, the semver extension can be installed from only one of them in a given Postgres system, where the extension name semver defines its uniqueness.

This pattern would allow for much more duplication of ideas while preserving the existing per-cluster namespacing. It also allows for a future Postgres release that supports something like the flexible per-cluster packaging as described above.2

Extension Toolchain App

The second piece is an extension management application that understands all this stuff and makes it possible. It would empower both extension development workflows — including testing, metadata management, and releasing — and extension user workflows — finding, downloading, building, and installing.

Stealing from Go, imagine a developer making a release with something like this:

git tag v1.2.1 -sm 'Tag v1.2.1'
git push --tags
pgmod list -m github.com/theory/pg-semver@v1.2.1

The creatively named pgmod tells the registry to index the new version directly from its Git repository. Thereafter anyone can find it and install it with:

  • pgmod get github.com/theory/pg-semver@v1.2.1 — installs the specified version
  • pgmod get github.com/theory/pg-semver — installs the latest version
  • pgmod get semver — installs the latest version or shows a list of matching modules to select from

Any of these would fail if the cluster already has an extension named semver with a different module name. But with something like the updated extension installation locations in a future version of Postgres, that limitation could be loosened.

Challenges

Every new idea comes with challenges, and this little thought experiment is no exception. Some that immediately occur to me:

  • Not every extension can be installed directly from its repository. Perhaps the metadata could include a download link for a tarball with the results of any pre-release execution?
  • Adoption of a new CLI could be tricky. It would be useful to include the functionality in existing tools people already use, like pgrx.
  • Updating the uniqueness constraint in existing systems like PGXN might be a challenge. Most record the repository info in the resources META.json object, so it would be do-able to adapt into a new META format, either on PGXN itself or in a new registry, should we choose to build one.
  • Getting everyone to standardize on standardized versioning tags might take some effort. Go had the benefit of controlling its entire toolchain, while Postgres extension versioning and release management has been all over the place. However PGXN long ago standardized on semantic versioning and those who have released extensions on PGXN have had few issues (one can still use other version formats in the control file, for better or worse).
  • Some PGXN distributions have shipped different versions of extensions in a single release, or the same version as in other releases. The release version of the overall package (repository, really) would have to become canonical.

I’m sure there are more, I just thought of these offhand. What have you thought of? Post ’em if you got ’em in the #extensions channel on the Postgres Slack, or give me a holler on Mastodon or via email.


  1. Or does it? Yes, it does. Although the Go CLI downloads most public modules from a module proxy server like proxy.golang.org, it still must know how to download modules from a version control system when a proxy is not available. ↩︎

  2. Assuming, of course, that if and when the Postgres core adopts more bundled packaging that they’d use the same naming convention as we have in the broader ecosystem. Not a perfectly safe assumption, but given the Go precedent and wide adoption of host/path-based projects, it seems sound. ↩︎

PGXN Tools v1.4

Over on the PGXN Blog I’ve posted a brief update on recent bug fixes and improvements to the pgxn-tools Docker image, which is used fairly widely these days to test, bundle, and release Postgres extensions to PGXN. This fix is especially important for Git repositories:

v1.4.1 fixes an issue where git archive was never actually used to build a release zip archive. This changed at some point without noticing due to the introduction of the safe.directory configuration in recent versions of Git. Inside the container the directory was never trusted, and the pgxn-bundle command caught the error, decided it wasn’t working with a Git repository, and used the zip command, instead.

I also posted a gist listing PGXN distributions with a .git directory.

PGXN Challenges

PGXN Gear

Last week, I informally shared Extension Ecosystem: Jobs and Tools with colleagues in the #extensions channel on the Postgres Slack. The document surveys the jobs to be done by the ideal Postgres extension ecosystem and the suggests the tools and services required to do those jobs — without reference to existing extension registries and packaging systems.

The last section enumerates some questions we need to ponder and answer. The first one on the list is:

What will PGXN’s role be in this ideal extension ecosystem?

The PostgreSQL Extension Network, or PGXN, is the original extension distribution system, created 2010–11. It has been a moderate success, but as we in the Postgres community imagine the ideal extension distribution future, it’s worthwhile to also critically examine existing tools like PGXN, both to inform the project and to realistically determine their roles in that future.

With that in mind, I here jot down some thoughts on the challenges with PGXN.

PGXN Challenges

PGXN sets a lot of precedents, particularly in its decoupling of the registry from the APIs and services that depend on it. It’s not an all-in-one thing, and designed for maximum distributed dissemination via rsync and static JSON files.

But there are a number of challenges with PGXN as it currently stands; a sampling:

  • PGXN has not comprehensively indexed all public PostgreSQL extensions. While it indexes more extensions than any other registry, it falls far short of all known extensions. To be a truly canonical registry, we need to make it as simple as possible for developers to register their extensions. (More thoughts on that topic in a forthcoming post.)

  • In that vein, releasing extensions is largely a manual process. The pgxn-tools Docker image has improved the situation, allowing developers to create relatively simple GitHub workflows to automatically test and release extensions. Still, it requires intention and work by extension developers. The more seamless we can make publishing extensions the better. (More thoughts on that topic in a forthcoming post.)

  • It’s written in Perl, and therefore doesn’t feel modern or easily accessible to other developers. It’s also a challenge to build and distribute the Perl services, though Docker images could mitigate this issue. Adopting a modern compiled language like Go or Rust might increase community credibility and attract more contributions.

  • Similarly, pgxnclient is written in Python and the pgxn-utils developer tools in Ruby, increasing the universe of knowledge and skill required for developers to maintain all the tools. They’re also more difficult to distribute than compiled tools would be. Modern cross-compilable languages like Go and Rust once again simplify distribution and are well-suited to building both web services and CLIs (but not, perhaps native UX applications — but then neither are dynamic languages like Ruby and Python).

  • The PGXN Search API uses the Apache Lucy search engine library, a project that retired in 2018. Moreover, the feature never worked very well, thanks to the decision to expose separate search indexes for different objects — and requiring the user to select which to search. People often can’t find what they need because the selected index doesn’t contain it. Worse, the default index on the site is “Documentation”, on the surface a good choice. But most extensions include no documentation other than the README, which appears in the “Distribution” index, not “Documentation”. Fundamentally the search API and UX needs to be completely re-architected and -implemented.

  • PGXN uses its own very simple identity management and basic authentication. It would be better to have tighter community identity, perhaps through the PostgreSQL community account.

Given these issues, should we continue building on PGXN, rewrite some or all of its components, or abandon it for new services. The answer may come as a natural result of designing the overall extension ecosystem architecture or from the motivations of community consensus. But perhaps not. In the end, we’ll need a clear answer to the question.

What are your thoughts? Hit us up in the #extensions channel on the Postgres Slack, or give me a holler on Mastodon or via email. We expect to start building in earnest in February, so now’s the time!

I’m a Postgres Extensions Tembonaut

Tembo Logo

New year, new job.

I’m pleased to announce that I started a new job on January 2 at Tembo, a fully-managed PostgreSQL developer platform. Tembo blogged the news, too.

I first heard from Tembo CTO Samay Sharma last summer, when he inquired about the status of PGXN, the PostgreSQL Extension Network, which I built in 2010–11. Tembo bundles extensions into Postgres stacks, which let developers quickly spin up Postgres clusters with tools and features optimized for specific use cases and workloads. The company therefore needs to provide a wide variety of easy-to-install and well-documented extensions to power those use cases. Could PGXN play a role?

I’ve tended to PGXN’s maintenance for the last fourteen years, and thanks in no small part to hosting provided by depesz. As of today’s stats it distributes 376 extensions on behalf of 419 developers. PGXN has been a moderate success, but Samay asked how we could collaborate to build on its precedent to improve the extensions ecosystem overall.

It quickly became apparent that we share a vision for what that ecosystem could become, including:

  • Establishing the canonical Postgres community index of extensions, something PGXN has yet to achieve
  • Improving metadata standards to enable new patterns, such as automated binary packaging
  • Working with the Postgres community to establish documentation standards that encourage developers to provide comprehensive extension docs
  • Designing and building developer tools that empower more developers to build, test, distribute, and maintain extensions

Over the the past decade I’ve have many ideas and discussion on these topics, but seldom had the bandwidth to work on them. In the last couple years I’ve enabled TLS and improved the site display, increased password security, and added a notification queue with hooks that post to both Twitter (RIP @pgxn) and Mastodon (@pgxn@botsin.space). Otherwise, aside from keeping the site going, periodically improving new accounts, and eyeing the latest releases, I’ve had little bandwidth for PGXN or the broader extension ecosystem.

Now, thanks to the vision and strategy of Samay and Tembo CEO Ry Walker, I will focus on these projects full time. The Tembo team have already helped me enumerate the extension ecosystem jobs to be done and the tools required to do them. This week I’ll submit it to collaborators from across the Postgres community1 to fill in the missing parts, make adjustments and improvements, and work up a project plan.

The work also entails determining the degree to which PGXN and other extension registries (e.g., dbdev, trunk, pgxman, pgpm (WIP), etc.) will play a role or provide inspiration, what bits should be adopted, rewritten, or discarded.2 Our goal is to build the foundations for a community-owned extensions ecosystem that people care about and will happily adopt and contribute to.

I’m thrilled to return to this problem space, re-up my participation in the PostgreSQL community, and work with great people to build out the extensions ecosystem for future.

Want to help out or just follow along? Join the #extensions channel on the Postgres Slack. See you there.


  1. Tembo was not the only company whose representatives have reached out in the past year to talk about PGXN and improving extensions. I’ve also had conversations with Supabase, Omnigres, Hydra, and others. ↩︎

  2. Never be afraid to kill your darlings↩︎

Times Up

December 22, 2023 was my last day at The New York Times. My tenure was just under two and a half years.

My first 19 months at the company were pretty much everything I had hoped, as I collaborated with Design and Product to design a distributed platform and conceived, designed, and implemented CipherDoc, a service for encrypted data management. I’m incredibly proud of that work!

But alas, plans change. After the company mothballed the project, I refocused my time on glue work: re-platforming services, upgrading runtimes and dependencies, improving logging and observability, and documenting code, architectures, and playbooks. It felt good to reduce the onboarding, maintenance, and on-call overhead for my teams; I hope it helps them to be more productive and fulfilled in their work.

I treasure the terrific people I met at The Times, just super thoughtful, empathetic, and creative co-workers, partners, and colleagues. It’s no wonder they had the energy to form a union, The Times Tech Guild, for which I’m gratified to have helped organize and steward members. The Guild connected me with a far broader range of talented and committed people than I would have otherwise. I will miss them all, and continue to cheer for and support the movement from the outside.

And now, as 2023 winds down, I’ve decided to try something new. More news in the new year!