PostgreSQL Table Access Method for delta-compressed versioned data
A PostgreSQL Table Access Method (TAM) extension for storing versioned data with automatic delta compression. Built on the xpatch delta encoding library.
Store versioned rows (document revisions, config history, audit logs, etc.) with massive space savings. Compression is automatic and transparent - you just INSERT and SELECT normally.
-- Create a table using the xpatch access method
CREATE TABLE documents (
    doc_id INT,
    version INT,
    content TEXT
) USING xpatch;
-- Configure grouping and ordering (optional - auto-detection works for most cases)
SELECT xpatch.configure('documents',
    group_by => 'doc_id',
    order_by => 'version'
);
-- Insert versions normally
INSERT INTO documents VALUES (1, 1, 'Hello World');
INSERT INTO documents VALUES (1, 2, 'Hello PostgreSQL World!');
INSERT INTO documents VALUES (1, 3, 'Hello PostgreSQL World! Updated.');
-- Query normally - reconstruction is automatic
SELECT * FROM documents WHERE doc_id = 1 ORDER BY version;
-- Check compression stats
SELECT * FROM xpatch_stats('documents');
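Everything downstream is ordinary SQL. For example, fetching the latest version of each document uses plain DISTINCT ON, nothing xpatch-specific:

-- Latest version per document (standard PostgreSQL)
SELECT DISTINCT ON (doc_id) *
FROM documents
ORDER BY doc_id, version DESC;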
Note: the _xp_seq column is visible in SELECT * (a PostgreSQL limitation).

-- Describe a table: shows config, schema, and storage stats
SELECT * FROM xpatch.describe('documents');
-- Warm the cache for faster subsequent queries
SELECT * FROM xpatch.warm_cache('documents');
SELECT * FROM xpatch.warm_cache('documents', max_groups => 100);
-- Get compression statistics
SELECT * FROM xpatch_stats('documents');
-- Inspect internal storage for a specific group (debugging/analysis)
SELECT * FROM xpatch_inspect('documents', 1); -- group_value = 1
-- Get cache statistics (requires shared_preload_libraries)
SELECT * FROM xpatch_cache_stats();
-- Get xpatch library version
SELECT xpatch_version();
-- Dump all table configs as SQL (for backup/migration)
SELECT * FROM xpatch.dump_configs();
-- Fix config OIDs after pg_restore
SELECT xpatch.fix_restored_configs();
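A minimal backup/migration workflow might look like this (a sketch using psql meta-commands; the file name is illustrative):

-- Before migrating: save all configs as SQL to a file
\t on
\o xpatch_configs.sql
SELECT * FROM xpatch.dump_configs();
\o
\t off
-- After pg_restore in the new database: remap the stored table OIDs
SELECT xpatch.fix_restored_configs();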
Space savings depend heavily on your data patterns. Here are real benchmarks with 5000 rows (100 documents x 50 versions each):
| Data Pattern | xpatch Size | heap Size | Space Saved |
|---|---|---|---|
| Incremental changes (base content + small additions) | 416 KB | 9.2 MB | 95% smaller |
| Identical content across versions | 488 KB | 648 KB | 25% smaller |
| Completely random data | 816 KB | 728 KB | 12% larger (overhead) |
Key insight: xpatch shines when versions share content. For typical document versioning (where each version is similar to the previous), expect 10-20x space savings. For random/unrelated data, xpatch adds overhead and provides no benefit.
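To see where your own data falls, a quick experiment along these lines works (a sketch; the table name is illustrative, and the row counts match the space benchmark above):

-- Generate the incremental-change pattern: 100 documents x 50 versions
CREATE TABLE bench_docs (doc_id INT, version INT, content TEXT) USING xpatch;
SELECT xpatch.configure('bench_docs', group_by => 'doc_id', order_by => 'version');

INSERT INTO bench_docs
SELECT d, v, 'base content for document ' || d || repeat(', small addition', v)
FROM generate_series(1, 100) AS d, generate_series(1, 50) AS v;

SELECT * FROM xpatch_stats('bench_docs');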
Important: These benchmarks are rough indicators, not precise measurements. Your mileage will vary based on hardware, data patterns, cache state, and query complexity.
Benchmark setup: 10,100 rows (101 documents x 100 versions), incremental text data. xpatch: 776 KB, heap: 17 MB.
| Operation | xpatch | heap | Slowdown |
|---|---|---|---|
| Full table COUNT | | | |
| - Cold cache | 44ms | 2.6ms | 17x slower |
| - Warm cache | 20ms | 1.4ms | 14x slower |
| Point lookup (single doc, 100 rows) | 0.7ms | 0.05ms | 14x slower |
| Point lookup (single row) | 0.13ms | 0.02ms | 6x slower |
| GROUP BY aggregate | 27ms | 3ms | 9x slower |
| Latest version per doc | 28ms | 5.5ms | 5x slower |
| Text search (LIKE) | 3.4ms | 1.5ms | 2x slower |
| INSERT (100 rows) | 33ms | 0.3ms | 100x slower |
| Parallel scan (2 workers) | 31ms | 3.5ms | 9x slower |
Key observation: xpatch trades speed for space. Reads run roughly 2-17x slower than heap, and bulk inserts carry the largest penalty.
A note on write performance:
"100x slower" sounds alarming, but look at the absolute numbers: 33ms for 100 rows is ~0.33ms per row. For versioned document storage, audit logs, or real-time collaboration (saving state every few seconds), sub-millisecond writes are more than fast enough.
Also remember that writes parallelize across groups. If you have 50 users editing 50 different documents, all 50 writes happen concurrently. Sequential writes only apply within a single group's version chain—which is inherent to delta compression, not a limitation.
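To sanity-check write latency on your own hardware, you can time the same 100-row insert pattern yourself (psql's \timing; absolute numbers will differ):

\timing on
-- Append 100 new versions to one document's chain
INSERT INTO documents
SELECT 1, v, 'Hello World, revision ' || v
FROM generate_series(4, 103) AS v;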
When to use xpatch: document revisions, config history, audit logs, and collaboration state, i.e. workloads where consecutive versions share most of their content.

When NOT to use xpatch: random or unrelated rows (nothing to delta against, so xpatch only adds overhead), or latency-critical read paths where heap's raw scan speed matters more than storage size.
The shared memory cache dramatically improves read performance for repeated access:
-- First query (cold): ~35ms for 5000 rows
SELECT COUNT(*) FROM documents;
-- Second query (warm): ~1ms for 5000 rows
SELECT COUNT(*) FROM documents;
To enable the shared memory cache, add to postgresql.conf:
shared_preload_libraries = 'pg_xpatch'
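After restarting, you can confirm the cache is active with the stats function shown earlier:

SELECT * FROM xpatch_cache_stats();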
To test pg-xpatch against a real versioning system, we stored the complete file history of the tokio repository (35,827 file versions across 2,641 files, 296 MB raw content) and compared storage size and random access speed against git's packfile format.
Setup:
group_by=path, compress_depth=1000, keyframe_every=100, enable_zstd=true
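In SQL terms, that setup corresponds to a configure call along these lines (a sketch; the table name and ordering column are hypothetical):

SELECT xpatch.configure('file_history',
    group_by => 'path',
    order_by => 'commit_seq',   -- hypothetical ordering column
    compress_depth => 1000,
    keyframe_every => 100,
    enable_zstd => true
);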
Storage comparison (total on-disk size):

| Storage Method | Total Size | What's Included |
|---|---|---|
| Raw file content | 296 MB | Just the file blobs |
| Git (normal bare clone) | 18 MB | Packfile + commit/tree objects + refs |
| Git (gc --aggressive) | 11 MB | Same, with aggressive packing |
| pg-xpatch | 12 MB | Table + TOAST + indexes + per-row metadata |
Git with aggressive packing is slightly smaller (11 MB vs 12 MB), but pg-xpatch provides full SQL queryability over the data. Both systems store more than just file content — git stores commit/tree objects, pg-xpatch stores commit hashes, timestamps, messages, and indexes.
Internal delta compression: xpatch's delta engine achieves 42x compression on the content columns alone (296 MB → 7 MB), but the total relation size (12 MB) includes non-delta columns and index overhead.
Random access performance (100 random point lookups, warmed cache):
| Method | Median | p95 | Max |
|---|---|---|---|
| pg-xpatch | 0.20 ms | < 1 ms | 22 ms |
| git (git show) | 1.32 ms | 1.47 ms | 1.81 ms |
pg-xpatch is ~6.5x faster on median, with 95% of queries under 1 ms. Git has more consistent latency (tight 1-2 ms range), while pg-xpatch has occasional tail latency spikes from cold chain reconstructions.
Insert and engine performance:
The delta engine performs up to 22,000 base comparisons per second (~45μs per comparison). Each insert compares against up to compress_depth previous versions to find the best delta base, so practical insert speed depends on how many comparisons are needed. With group_by, chains are short (avg 13.57 versions here), meaning fewer comparisons per insert and higher row throughput — up to 336 rows/s for this dataset.
Parallel writes: Since groups are independent, multiple groups can be written concurrently from separate connections with no performance degradation.
# Run PostgreSQL with pg-xpatch pre-installed
docker run -d --name pg-xpatch \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=secret \
  ghcr.io/imgajeed76/pg-xpatch:latest
# Connect and enable the extension
psql -h localhost -U postgres -c "CREATE EXTENSION pg_xpatch;"
Download from GitHub Releases:
# Download and extract (replace VERSION with actual version, e.g., v0.1.1)
tar -xzf pg_xpatch-VERSION-pg16-linux-amd64.tar.gz
cd pg_xpatch-VERSION-pg16-linux-amd64
# Install
sudo cp pg_xpatch.so $(pg_config --pkglibdir)/
sudo cp pg_xpatch.control *.sql $(pg_config --sharedir)/extension/
# (Optional) Enable shared memory cache - add to postgresql.conf:
# shared_preload_libraries = 'pg_xpatch'
# Restart PostgreSQL, then:
psql -c "CREATE EXTENSION pg_xpatch;"
Requirements:
- cbindgen (cargo install cbindgen)

git clone https://github.com/ImGajeed76/pg-xpatch
cd pg-xpatch
make clean && make && make install
psql -c "CREATE EXTENSION pg_xpatch;"
A pre-configured Docker environment is available in .devcontainer/:
# Build and run
docker build -t pg-xpatch-dev .devcontainer/
docker run -d --name pg-xpatch-dev -v $(pwd):/workspace pg-xpatch-dev
# Build and test inside container
docker exec pg-xpatch-dev bash -c "cd /workspace && make && make install"
docker exec -u postgres pg-xpatch-dev psql -c "CREATE EXTENSION pg_xpatch;"
For most tables, xpatch auto-detects the configuration, falling back to the internal _xp_seq column for ordering when no suitable column is found. To configure explicitly:

SELECT xpatch.configure('my_table',
    group_by => 'doc_id',          -- Column that groups versions (optional)
    order_by => 'version',         -- Column that orders versions
    delta_columns => ARRAY['content', 'metadata']::text[],  -- Columns to compress
    keyframe_every => 100,         -- Full snapshot every N versions (default: 100)
    compress_depth => 1,           -- How many previous versions to consider (default: 1)
    enable_zstd => true            -- Enable zstd compression (default: true)
);
-- Full table description
SELECT * FROM xpatch.describe('my_table');
-- Just the config
SELECT * FROM xpatch.get_config('my_table');
xpatch delta-encodes rows along a version chain:

- The group_by column (e.g., document ID) determines which rows belong to the same chain
- The order_by column (e.g., version number) determines their order within the chain

When you SELECT a delta-compressed row, it is reconstructed automatically from the nearest keyframe plus the deltas that follow it.
xpatch automatically adds an _xp_seq column to track sequence numbers. This column:

- Is created automatically by CREATE TABLE ... USING xpatch
- Shows up in SELECT * (PostgreSQL doesn't support truly hidden columns)
- Can be excluded by listing columns explicitly: SELECT doc_id, version, content FROM ...
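If the extra column gets in the way, one option is a thin view over the table (plain PostgreSQL, not an xpatch feature; the view name is illustrative):

-- Hide _xp_seq by listing only the user columns
CREATE VIEW documents_clean AS
SELECT doc_id, version, content FROM documents;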
xpatch automatically creates indexes for efficient lookups:

- An _xp_seq index on table creation
- A (group_by, _xp_seq) index when group_by is configured

# Run all tests (20 test files)
# First create the test database and extension
createdb xpatch_test
psql -d xpatch_test -c "CREATE EXTENSION pg_xpatch;"
# Then run all test files
for f in test/sql/*.sql; do
  psql -d xpatch_test -f "$f"
done
# Or use the test runner
./test/run_tests.sh run
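As an illustration, a minimal test in the style of test/sql/*.sql might look like this (a sketch; names and contents are illustrative):

-- Round-trip: insert a short version chain and read it back
CREATE TABLE t_basic (id INT, v INT, body TEXT) USING xpatch;
SELECT xpatch.configure('t_basic', group_by => 'id', order_by => 'v');
INSERT INTO t_basic VALUES (1, 1, 'a'), (1, 2, 'ab'), (1, 3, 'abc');
SELECT id, v, body FROM t_basic ORDER BY v;  -- expect 3 rows, reconstructed
DROP TABLE t_basic;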
The basic test suite covers core functionality; see test/sql/*.sql for the individual test files.
In addition to the basic test suite, pg-xpatch has been validated with 240 comprehensive tests covering production scenarios. These tests are not yet included in the repository but document the testing that was performed.
| Category | Tests | Description |
|---|---|---|
| Data Types | 41 | TEXT, BYTEA, JSON, JSONB, Unicode, binary data, empty strings, very large content (100MB+) |
| Transactions | 12 | Commit, rollback, savepoints, nested transactions, isolation levels (READ COMMITTED, REPEATABLE READ, SERIALIZABLE) |
| SQL Compatibility | 25 | JOINs, CTEs, window functions, aggregates, subqueries, UNION/INTERSECT, EXPLAIN, prepared statements |
| Concurrency | 10 | Parallel inserts (different/same groups), concurrent read/write, connection storms, long vs short transactions |
| Crash Recovery | 4 | Kill -9 survival, data integrity after crash, repeated crash cycles |
| Adversarial Inputs | 54 | SQL injection (8 patterns), integer overflow/underflow, invalid UTF-8, malformed data, resource exhaustion |
| Edge Cases | 57 | Views, triggers, foreign keys, cursors, indexes, RETURNING clause, COPY operations, locking |
| Backup/Restore | 18 | pg_dump/pg_restore cycle, data integrity verification, extension upgrade simulation |
Known limitations:

- _xp_seq visible: PostgreSQL doesn't support hidden columns

These issues exist in the current implementation and may be addressed in future versions:
- The xpatch_tuple_satisfies_snapshot() function uses proper MVCC checks for buffer-backed tuples, but trusts that virtual tuples (created during delta reconstruction) were built from visible source tuples. This is correct behavior, but it means visibility is checked at reconstruction time rather than query time.

These issues are documented for transparency. For typical workloads (versioned document storage, audit logs), they don't cause problems.
Thoroughly tested on PostgreSQL 16 with 240+ test cases. Other versions may work but are not officially supported.
pg-xpatch is dual-licensed: AGPL-3.0-or-later for open source, with a commercial option for proprietary use.
I'm a huge fan of open source. I also don't want massive corporations extracting value from community work without giving anything back. AGPL solves this - if you modify pg-xpatch and distribute it (including running it as a service), those improvements stay open.
That said, I'm not trying to build a licensing business here. This is about fairness, not revenue.
Probably not if you're using pg-xpatch internally or in an open-source project; the AGPL covers you.

Maybe if you're shipping pg-xpatch inside a proprietary product or service and can't meet the AGPL's source-sharing terms.
Email me at xpatch-commercial@alias.oseifert.ch and let's talk.
Small businesses? Probably free - I just want to know who's using it and how.
Larger companies? Yeah, I'll ask for something, but it'll be reasonable. You have the resources to support open source work, so let's make it fair.
Would rather contribute code than pay? Even better. Help make pg-xpatch better and we'll figure out the licensing stuff.
I'm not interested in complex contracts or pricing games. Just don't be a massive corp that takes community work and gives nothing back. That's literally the only thing I'm trying to prevent.
If you contribute code, you're granting us rights to use it under both AGPL and commercial terms. This sounds scarier than it is - it just means we can handle licensing requests without tracking down every contributor for permission.
The AGPL version stays open forever. This just gives us flexibility to be reasonable with companies that need commercial licenses.
See LICENSE-AGPL.txt for the full text, or LICENSE-COMMERCIAL.txt for commercial terms.
Contributions are welcome! Please open an issue or pull request on GitHub.
Before submitting:
- Run the existing test suite (test/sql/*.sql)
- Add tests for new functionality in test/sql/*.sql