Update 2024-03-22: To compliment high-level discussions on the #extensions
channel on the Postgres Slack, I also created a pull request to make it
easer directly comment to this post point by point. Keep the feedback coming,
and thank you!
Update 2024-03-25: Clarified the definition of “Release” and made
“Maintainers” plural. Thanks to Matthias van de Meent for the suggestions!
This post proposes a new metadata standard for extension packaging,
distribution, and delivery, building on the PGXN Meta Spec to address its
shortcomings and emerging use cases 12 years on. The goals include:
- Don’t break the existing standard, but add to it
- Add explicit support for different types of Postgres extensions,
such as background workers and loadable modules
- Add additional metadata for curation and automated binary compilation
- Add hooks for registry and third-party-contributed metadata, including
binary distribution options and stats & reports.
There are two high level categories of metadata, split into two separate
files:
- Package Metadata provided by extension package
authors
- Registry Metadata aggregated by the root registry
from various sources, including data derived from the extension source
code or package metadata, but also trusted third parties such as
packagers, smoke testers, security scanners, and more.
Following community discussion of this proposal, the Package
Metadata will lead to a draft for PGXN Meta Spec version
2.0.0, while the Registry Metadata will guide the design
and implementation of the Root Registry APIs required to provide it.
Terms
Definition of terms as used in this document, because every language and
distribution system uses terms differently. These are ours.
-
Extension: a software component that extends the capabilities of a
PostgreSQL database or cluster. Extensions may be CREATE EXTENSION
extensions, background workers, command-line apps, loadable modules,
shared libraries, and more.
-
Package: A collection of extensions that are released, versioned, and
distributed together. Packages may be downloaded directly from version
control repositories or in archive files generated by a release tag.
-
Package Path: Identifies a package, declared in the package
metadata file. A package path should describe both
what the package does and where to find it. Typically, a package path
consists of a repository root path — the directory that contains the
metadata file — and a directory within the repository.
-
Repository: The version control repository where the package is
developed, referenced as package path or URL.
-
Repository root path: The portion of the package path that corresponds
to the root directory of the version control repository where the package
is developed, and which contains the metadata file. Most package are
defined in their repository’s root directory, so this is usually the
entire path. For example, github.com/example/pgtap
is the repository
root path for the package of the same name.
-
Source Distribution: The contents of a single package bundled together
with package metadata into distributable archive
file, usually named with the last part of the package path or the main
extension, a dash, and the version, e.g., pgtap-1.14.3.zip
.
-
Binary Distribution: A binary software distribution of a package,
compiled and packaged for a particular OS version, platform, and hardware
architecture. Examples include deb, RPM, and wheel.
-
Release: A single version of the package made available to the public on
PGXN, expressed as the package path, an at sign, and the semver. Example:
github.com/theory/pgtap@v1.14.3
.
The extension author provides this data as part of a release submitted to the
root registry. The registry itself will not change this data in any way, but
provide it as-is.
Essentials
Essential information about the package itself, including its path, name,
version, list of authors, license, etc. Ecosystem applications use this data
for indexing, naming conventions, and display information.
Fields (all required):
- Package: A package path uniquely identifying the package (usually the
source repo host name and path; see this RFC)
- Name: The name of the extension package, usually the same as the main
extension, not unique but conflicts are resolved by escalation to
package
- Version: The version of the package, expressed as a SemVer
- Abstract: A short description of the purpose of the package
- Description: A longer description of the package, answering the
question “what is this thing and what value is it?”
- Maintainers: List of maintainers, each an object with
name
and either
email
or url
(or both)
- License: An SPDX 2.3 license expression comprised of one or more
licenses from the SPDX License List
- Spec: Information about the metadata spec itself, a version and URL
Example:
{
"package": "github.com/example/pg-pair",
"name": "pair",
"version": "1.1.0",
"abstract": "A key/value pair data type",
"description": "Adds a key/value pair data type to Postgres, useful for specifying arbitrary key/value function parameters.",
"maintainers": [
{
"name": "Naomi Nagata",
"email": "naomi@example.com",
"url": "https://example.com/roci/naomi"
}
],
"license": "PostgreSQL OR MIT",
"spec": {
"version": "2.0.0",
"url": "https://pgxn.org/meta/spec.txt"
}
}
Other possibilities:
- License File: An alternative to
license
for nonstandard licenses
(from cargo)
- Contributors: List of people who have contributed to the package
- Emeriti: List of former maintainers and contributors
PGXN Compatibility:
- Overloads the string representation of the spec v1 field, but it should be
able to parse either one, especially with the spec version to
differentiate.
- Uniqueness of
package
and relationship to name
- PGXN has
maintainer
; is it really worth pluralizing it? Maybe we don’t
need it at all, given repository and issue links.
Artifacts
A list of links and checksums for downloading the extension package in one or
more formats, including source code, binaries, system packages, and more. Apps
use this information (along with registry metadata) to
determine the best option for installing an extension on a particular system.
This information is optional, and provided in addition to the source code
available from the root registry itself. The idea is that, on release, the
release manager creates artifacts in one or more locations (such as GitHub
releases), and it’s useful to link to them from the registry. Binary builds
may be unlikely today, but leave the option open for now by allowing arbitrary
text in the type
field.
Each item must have:
- Type: The type of artifact: Source, binary, RPM, etc.
- URL: Link the the artifact
- SHA: A SHA checksum
Each URL must properly resolve and the checksum must match.
[
{
"type": "source",
"url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0.zip",
"sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004"
},
{
"type": "binary",
"url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-amd64.tar.gz",
"sha1": "12d9bc5cfb6bc3c453627eac69511f48be63cfc0"
},
{
"type": "binary",
"url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-arm64.tar.gz",
"sha1": "787dc39137f7d1510a33ab0a1b8905cd5f3f72d1"
}
]
PGXN Compatibility:
New object, no conflicts.
Resources
External information about the package, mostly links, including source code
repository, bug reporting, documentation, badges, funding, etc. Apps use this
data for links, of course, but also full text indexing, documentation
rendering, and displaying useful information about the extension.
Fields:
- Homepage: URL for the package’s home page
- Issues: URL to issues
- Documentation: URL to documentation
- Support: URL to support
- Repository: Source code repository URL
- Badges: Array of badge link objects
- Funding: Array of funding objects
{
"homepage": "https://pair.example.com",
"issues": "https://github.com/example/pair/issues",
"documentation": "https://pair.example.com/docs",
"support": "https://github.com/example/pair/discussions",
"repository": "https://github.com/example/pair",
"badges": [
{
"alt": "Test Status",
"src": "https://test.packages.postgresql.org/github.com/example/pair.svg"
}
],
"funding": [
{
"type" : "individual",
"url" : "http://example.com/donate"
},
{
"type" : "patreon",
"url" : "https://www.patreon.com/example"
}
]
}
PGXN Compatibility:
issues
replaces bugtracker
and is just a URL. Can support either
repository
is just a URL instead of an object. Can support either
but with the package
key referencing the repository path perhaps
we don’t need to change it.
Contents
A description of what’s included in the package. Often a package consists of
multiple extensions, such as PostGIS, which includes postgis
,
postgis_tiger_geocoder
, address_standardizer
, and more.
Furthermore, some extensions are not CREATE EXTENSION
-type extensions at
all, such as background workers, command-line apps, loadable modules,
shared libraries, and more. Each should be listed along with documentation
links where they differ from the package overall (or are simply more
specific).
Fields:
- Extensions: Object describing
CREATE EXTENSION
extensions. Keys
are extension names, fields are objects with the fields sql
, doc
,
abstract
, control
(same as control file, but can use the file itself),
tle
(meaning it can be used as a trusted language extension), and
preload
(to indicate whether its libraries need to be loaded in advance
via shared_preload_libraries
, session_preload_libraries
, or
local_preload_libraries
).
- Workers: Object describing background workers. Keys are worker names,
fields are objects with the fields
bin
, doc
, and abstract
.
- Apps: Object describing applications, command-line or otherwise. Keys
are app names, fields are objects with the fields
bin
, doc
, and
abstract
.
- Modules: Object describing loadable modules that can be loaded into
Postgres (not necessary for extensions that include libraries). Keys are
module names, fields are objects with the fields
lib
(without file
extension), doc
, abstract
, and preload
.
- Libraries: Other libraries that may ship in the package and need to be
installed but are not loadable modules, such as a dynamic library used
by an app. Keys are library names, fields are objects with the fields
lib
(without file extension) or dir
(for a directory of files), doc
,
and abstract
.
{
"extensions": {
"pair": {
"sql": "sql/pair.sql",
"doc": "doc/pair.md",
"abstract": "A key/value pair data type",
"preload": "session",
"tle": true,
"control": {
"directory": "pair",
"module_pathname": "$libdir/pair",
"relocatable": true
}
}
},
"workers": {
"pair_pruner": {
"bin": "bin/pair_pruner",
"doc": "doc/pair_pruner.md",
"abstract": "A worker to periodically prune pairs"
}
},
"apps": {
"pair_rand": {
"bin": "bin/pair_rand",
"doc": "doc/pair_rand.md",
"abstract": "Command to generate random pairs of strings"
}
},
"modules": {
"lib_pair": {
"lib": "lib/lib_pair",
"doc": "doc/lib_pair.md",
"abstract": "A library hooking function calls to convert pairs to named parameters",
"load": "shared_preload_libraries"
}
},
"libraries": {
"ruby_pair": {
"dir": "lib/gems",
"abstract": "Ruby libraries required to run the extension"
}
}
}
Other options:
- Allow or require an array of docs?
- Specify support for specific doc formats (markdown, manpage, plain text,
MkDocs, etc.)?
- Support glob patterns?
- Add libraries as sub-key for workers and apps, e.g. for a Ruby app that
requires a directory of gems?
PGXN Compatibility:
Significantly different from the existing provides
object, so key this one
under contents
or manifest
and continue to support both.
Dependencies
A list of external dependencies required to configure, build, test, install,
and run the extensions in the package. These include not only other extension
packages, but also external libraries and system dependencies. And let’s not
forget the versions of Postgres required, as well as any OS and version
dependencies (e.g, does it work on Windows? FreeBSD? What versions?) and
architectures (arm64, amd64, etc.)
Fields:
-
platforms: Platforms supported by the package, defined as OS and
architecture. Assume all platforms if none are listed.
-
postgres: Versions of PostgreSQL required by the package, specified as
a version range, as well as compiled features (--with-*
configure
options)
-
pipeline: The build pipeline required to build the extension. Options:
pgxs
, meson
(?) pgrx
, gnu
(./configure
), gem
, cpan
, pip
,
go
, etc.
-
prereqs: Object defining external dependencies required for different
phases:
- configure: Dependencies to configure the package (e.g., items
required for
./configure
or make
to work)
- build: Dependencies to build the package (in addition to run
dependencies)
- test: Dependencies to test the package (in addition to build and
run dependencies)
- run: Dependencies to run the package
- develop: Dependencies to develop the package (in addition to all
other phase dependencies)
Each points to an object with at least one of these keys:
- requires: Required to use the package
- recommends: Not required, but recommended as a best practice
- suggests: Not required, but nice to have
- conflicts: Package will not work with these items
These, in turn, contain at least one of these keys:
- package: Other packages, expressed by their package paths,
pointing to a version expression
- external: External dependencies not included with the package
- contrib: List of Postgres contrib or development packages
-
variations: A list of dependency variations, each with two fields:
- where: Specifies a configuration that, when matched, includes the
dependencies. For example, to specify alternate dependencies for Linux
arm64, it would be
"platforms": { "linux": "arm64" }
. For a specific
version of Postgres, it would be something like
"postgres": { "version": ">= 16, < 17" }
. [This likely needs some
deep thinking through.]
- dependencies: An object containing of the other key structures in
this object, defining the additional dependencies for environments
that match the
where
field.
The packages
and external
keys point to objects in which the keys are
dependency names and the values are version range expressions (empty string
means any version). The postgres.version
field is also a version range
expression.
The contrib
key points to an array of Postgres-supplied apps and extensions.
Versions are not necessary because they effectively have the same version as a
supported Postgres version.
Example:
{
"postgres": {
"version": ">= 12, < 17",
"with": [ "xml", "uuid", "perl" ]
},
"pipeline": "pgrx",
"platforms": {
"linux": [ "amd64", "arm64" ],
"darwin": [ "amd64", "arm64" ],
"windows": [ "amd64" ],
"freebsd": [ "amd64" ]
},
"dependencies": {
"configure": {
"requires": {
"external": { "cargo-pgrx": "" }
}
},
"build": {
"requires": {
"external": {
"curl": "",
"awk": "",
"sed": "",
"perl": "5.20"
}
},
"recommends": {
"external": {
"jq": "",
"perl": "5.38"
}
}
},
"test": {
"requires": {
"contrib": [ "pg_regress", "plpgsql", "plperl" ],
"packages": {
"github.com/theory/pgtap": "1.1.0"
}
}
},
"run": {
"requires": {
"contrib": [ "plperl" ],
"packages": {
"github.com/theory/hostname": ""
}
}
}
}
}
Notes:
The external
field is the tricky one. How does one provide a canonical
dependency for a third-party library or app that may have loads of different
names on different platforms, different packaging systems (apt vs rpm), and
even different versions of them, or split apart for development and runtime
use? Some possibilities:
- Use purl and let downstream services handle the conversion
- Create a rule set that allows one to predict dependency name changes
across ecosystems, similar to repology-rules
- Just use Repology?
Other notes:
- Decide on a version range standards to adopt. Start with the PGXN version
range, pep-0508, npm ranges, and Gem::Version.
- Is
pipeline
really necessary, given configure requirements? I think so,
because it tells the client the preferred build system to use, in case it
can’t detect it for some reason.
PGXN Compatibility:
Significantly differs from the existing prereqs
object, so this one is keyed
under dependencies
and we can support either.
Classification
Classification metadata lets the extension developer associate additional
information to improve discovery, such as key words. It might also allow
selections from a curated list of extension classifications, such as the
category slugs supported for the cargo categories field.
Ecosystem apps use this data to organize extensions under key words or
categories, making it easier for users to find extensions that are frequently
used together or for specific workloads or tasks.
Example:
{
"tags": [
"testing",
"pair",
"parameter"
],
"categories": [
"Machine Learning"
]
}
PGXN Compatibility:
Conflicts with the existing root-level tags
array, but we can support either
one.
Example: PGML Extension
A hand-coded example that attempts to capture all the complexity of the pgml
extension:
{
"package": "github.com/postgresml/postgresml/pgml-extension",
"name": "pgml",
"version": "2.8.2",
"abstract": "pgml: Created by the PostgresML team",
"maintainers": [
{
"name": "the PostgresML team",
"url": "https://github.com/postgresml/postgresml/"
}
],
"license": "MIT",
"spec": {
"version": "2.0.0",
"url": "https://pgxn.org/meta/spec.txt"
},
"artifacts": [
{
"type": "source zip",
"url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.zip",
"sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004"
},
{
"type": "source tgz",
"url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.tar.gz",
"sha256": "845f28339c6159ac32daccea1cd17b386ea083c3e60bb8d58fb737725afe7eb5"
}
],
"resources": {
"homepage": "https://postgresml.org/",
"issues": "https://github.com/postgresml/postgresml/issues",
"documentation": "https://postgresml.org/docs/",
"support": "https://discord.com/invite/DmyJP3qJ7U",
"repository": "https://github.com/postgresml/postgresml",
"badges": [
{
"alt": "Tests Passing",
"src": "https://github.com/postgresml/postgresml/actions/workflows/ci.yml/badge.svg"
},
{
"alt": "Chat Activity",
"src": "https://camo.githubusercontent.com/1988e7aaaa6a0c8ddcf880faacf1ec3263e23fac4aeff4710922082fde84442a/68747470733a2f2f696d672e736869656c64732e696f2f646973636f72642f31303133383638323433303336393330303939",
"link": "https://discord.gg/DmyJP3qJ7U"
}
]
},
"manifest": {
"extensions": {
"pgml": {
"sql": "sql/pgml--2.8.2.sql",
"preload": "shared",
"tle": false,
"control": {
"comment": "pgml: Created by the PostgresML team",
"default_version": "2.8.2",
"module_pathname": "$libdir/pgml",
"relocatable": false,
"superuser": true,
"schema": "pgml",
"trusted": true
}
}
}
},
"dependencies": {
"postgres": {
"version": ">= 15, < 16"
},
"pipeline": "pgxs",
"platforms": {
"linux": [
"amd64",
"arm64"
],
"darwin": [
"amd64",
"arm64"
]
},
"dependencies": {
"configure": {
"requires": {
"external": {
"cargo-pgrx": "",
"bison": "",
"cmake": "",
"flex": "",
"libclang-dev": "",
"libopenblas-dev": "",
"libpython3-dev": "",
"libreadline-dev": "",
"libssl-dev": "",
"pkg-config": ""
}
}
},
"run": {
"requires": {
"external": {
"libopenblas": "",
"libpython3": "",
"libreadline": "",
"libssl": "",
"python3": ""
}
},
"recommends": {
"external": {
"python(pyarrow)": "=11.0.0",
"python(catboost)": "",
"python(lightgbm)": "",
"python(torch)": "",
"python(torchaudio)": "",
"python(torchvision)": "",
"python(xgboost)": "",
"python(accelerate)": "",
"python(bitsandbytes)": "",
"python(ctransformers)": "",
"python(huggingface-hub)": "",
"python(deepspeed)": "",
"python(einops)": "",
"python(optimum)": "",
"python(peft)": "",
"python(tokenizers)": "",
"python(transformers)": "",
"python(transformers-stream-generator)": "",
"python(InstructorEmbedding)": "",
"python(sentence-transformers)": "",
"python(rouge)": "",
"python(sacrebleu)": "",
"python(sacremoses)": "",
"python(datasets)": "",
"python(orjson)": "",
"python(langchain)": ""
}
}
}
},
"variations": [
{
"where": {
"platforms": {
"linux": []
}
},
"dependencies": {
"prereqs": {
"run": {
"recommends": {
"external": {
"python(auto-gptq)": "",
"python(xformers)": ""
}
}
}
}
}
}
]
},
"classification": {
"tags": [
"machine learning",
"ml",
"transformers"
],
"categories": [
"Machine Learning"
]
}
}
Note that PostgresML also provides a dashboard app, but it does not ship with
the extension, so it’s not listed here. It could have its own package metadata
and be separately distributed.
For the Python dependencies, in the absence of a neutral standard for
specifying package, here they’re specified using a syntax borrowed from RPM to
make clear that they’re Python dependencies. A couple of those dependencies
are Linux-only, so listed in variations
where the OS is Linux.
The second metadata format is registry metadata. It starts by with a copy the
package metadata, but builds from there, adding
information (such as the artifact link, SHA, and cryptographic signature from
the registry itself) and sections (defined below).
Providing a separate metadata format enables broader community collaboration
to augment the metadata for extension packages.
Ideally an ecosystem of trusted services will be developed by various parties,
who will report back to the root registry as the source of record for all
metadata about extension packages. For example, a binary distribution service
like apt.postgresql.org or trunk will submit links and checksums for
binary packages and perhaps periodic download stats.
Broadly speaking, there are three typologies for registry metadata:
Distributions, Reports, and Classifications.
Distributions
A list of links and checksums for downloading binary distributions for the
extension package. The root registry page for the package can therefore list
all the ways to install an extension and expand the list as new binary
registries add it.
Along with the Artifacts package metadata, the distribution
metadata allows installer tools to select the best option to install based on
the OS, Postgres version, and cluster configuration (e.g., install the
community Apt package if the cluster was installed from the community Apt
registry).
Each item must have:
- Registry: The name of the distribution registry, unique
- Type: The type of distribution: Apt, RPM, TLE, etc.
- URL: Link to the the registry’s page for the distribution
- Command: In lieu of a URL and SHA it can provide instructions to
install the distribution, e.g.,
apt-get install postgresql16-pair
.
Each distribution registry must be trusted by the root registry to submit this
information to add to the metadata, and have its own page with additional
contextual information: its home page, docs, basic instructions, whatever. A
README of sorts.
Example:
[
{
"registry": "trunk.pgxn.org",
"type": "trunk",
"command": "pgxn install github.com/example/pair"
},
{
"registry": "apt.postgresql.org",
"type": "apt",
"command": "apt-get install postgresql16-pair"
},
{
"registry": "yum.postgresql.org",
"type": "rpm",
"command": "yum install postgresql16-pair"
},
{
"registry": "pgt.dev",
"type": "apt",
"url": "https://pgt.dev/extensions/pair",
"command": "trunk install pair"
},
{
"registry": "pgxman.com",
"type": "apt",
"url": "https://pgxman.com/x/pair",
"command": "pgxman install pair"
},
{
"registry": "database.dev",
"type": "sql",
"url": "https://database.dev/example/pair",
"command": "SELECT dbdev.install('example-pair');"
},
]
Other Options:
- Information about installing from a registry that doesn’t offer URLs, like
Homebrew, where one just needs to know what the formula is called. Maybe
the same should apply to Apt?
Metrics and Reports
This object includes data such as number of downloads, build and test status
on various Postgres/OS/version combinations, binary packaging distributions,
test coverage, security certification, vulnerability detection, quality
metrics and user ratings, and more. Some of these bits of data might include
badges, e.g., for ratings or security assessments.
In the broader ecosystem, it would be the responsibility of the root registry
to ensure that data comes only from trusted sources. However this data will be
publicly readable, allowing any downstream applications to extend and
publish metrics and reports with their own information.
Example:
{
"aggregates": {
"downloads": 32
},
"sources": {
"smoke-tester.example.com": {
"stats": {
"passes": 32,
"fails": 8,
"unknown": 0
},
"link": "https://smoke-tester.example.com/extension/pair",
"badge": "https://smoke-tester.example.com/extension/pair.svg"
},
"pgt.dev": {
"downloads": 12,
"tags": ["two", "kv"],
"categories": ["Analytics"]
},
"pgxman.com": {
"downloads": 20
},
"ratings.example.com": {
"stats": {
"ratings": 324,
"average": 3.2,
"comments": 13
},
"link": "https://ratings.example.com/extension/pair",
"badge": "https://ratings.example.com/extension/pair.svg"
},
"stats.example.com": {
"contributors": 2,
"stars": 14,
"last_active": "2024-01-12",
},
"security-scan.example.com": {
"advisories": {
"high": 2,
"moderate": 11,
"low": 1,
"cves": 0
},
"link": "https://security-scan.example.com/extension/pair",
"badge": "https://security-scan.example.com/extension/pair.svg"
}
}
}
Notes:
-
The aggregates
section aggregates results from multiple sources, for
example summing all downloads or averaging ratings. The list of items to
aggregate could evolve regularly.
-
Each key in sources
identifies a trusted downstream source of
information. Each would have its own schema describing its objects and
their meaning, along with URI templates to link to. For example,
stats.example.com
might have these templates:
{
"contributors": {
"alt": "List of people who have contributed to the {name} project",
"badge": "https://stats.example.com/{ package }/contributors.svg",
"uri": "https://stats.example.com/{ package }/contributors"
},
"stars": {
"alt": "Star ratings for the {name} project",
"badge": "https://stats.example.com/{ package }/stars.svg",
"uri": "https://stats.example.com/{ package }/stars"
}
}
-
Might be worth adopting more of an API of multiple files, rather than one big file.
Example: Homebrew analytics. Note also its support for date ranges for analytics
(30/90/365 days, stats when directly installed vs. installed as a dependency).