# PLAN_MULTI_DATABASE.md — Multi-Database Support

Status: Draft
Target version: Post-1.0 (Scale S3)
Author: pg_trickle project
## 1. Current Constraint
pg_trickle is locked to operating inside a single PostgreSQL database (the
one where `CREATE EXTENSION pg_trickle` was run). This is a fundamental
architectural constraint, not a missing feature.
Root causes:

| Constraint | Why it matters |
|---|---|
| `Spi::connect()` executes in the extension's current database | Cross-database SPI is impossible in PostgreSQL |
| Background worker registers at `_PG_init()` time | One BGW per `_PG_init` call — each database that loads pg_trickle gets its own BGW slot |
| `pg_shmem_init!()` allocates fixed shared memory | The slot count is fixed at server start; it must accommodate N databases |
| Trigger catalog (`pg_trigger`) | Triggers target tables in the current database only |
| Change buffer schema (`pgtrickle_changes`) | Lives in the extension's database |
The multi-database constraint is therefore not a single code change but a set of orthogonal problems that interact.
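The per-database nature of this state is visible directly in the catalogs. A minimal sketch (schema names from §3; run in the database where the extension is installed):

```sql
-- Lists pg_trickle's catalog and change-buffer relations.
-- Only objects in the *current* database are visible — there is no
-- cross-database view of this state.
SELECT n.nspname AS schema, c.relname AS relation
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname IN ('pgtrickle', 'pgtrickle_changes')
ORDER BY 1, 2;
```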
## 2. Option Analysis

### Option A: Per-database background worker (current approach, scaled)
**How it works today:** Each database that installs pg_trickle gets one BGW slot
when the server loads `shared_preload_libraries = 'pg_trickle'`. The BGW
connects to its own database.
**Extension:** This already gives multi-database isolation for free — each
database is independent. The constraint is that `max_worker_processes` limits
total BGW count across the whole server.

**Cost:** Low — no code change needed for basic multi-database use.

**Limitation:** Each database consumes one BGW slot. 10 databases = 10 slots.

**Verdict:** This is the correct first approach. Document it and test it.
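One way to check the per-database worker layout is `pg_stat_activity`. A sketch, assuming the worker's registered name contains "pg_trickle" (the exact `backend_type` string depends on how the worker registers itself):

```sql
-- Expect one row per database with an active pg_trickle worker.
SELECT datname, pid, backend_type
FROM pg_stat_activity
WHERE backend_type ILIKE '%pg_trickle%'
ORDER BY datname;
```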
### Option B: Cross-database dblink polling

A single “coordinator” BGW polls other databases via `dblink`. Streaming tables
in “secondary” databases register with the coordinator.

**Cost:** Very high — requires dblink connectivity, cross-DB transaction
coordination, and a coordinator catalog.

**Limitation:** dblink adds latency and complexity; foreign transactions are
not atomic with local triggers.

**Verdict:** Not recommended.
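For illustration, the kind of round trip a coordinator would need under this option — a `dblink` poll of a secondary database's change buffer (the database name and buffer table OID suffix are hypothetical):

```sql
-- Coordinator-side poll: count pending changes in a secondary database.
-- This runs in a separate remote transaction, so the result is not atomic
-- with the secondary database's local triggers.
SELECT pending
FROM dblink('dbname=secondary_db',
            'SELECT count(*) FROM pgtrickle_changes.changes_12345')
     AS t(pending bigint);
```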
### Option C: External orchestrator (see REPORT_PARALLELIZATION.md §D)

A process outside PostgreSQL manages refresh scheduling across databases,
calling the refresh functions via libpq.

**Cost:** High — requires the external process, service discovery, and
connection management.

**Verdict:** Viable for 100+ databases at scale; overkill for typical use.
### Option D: PostgreSQL logical replication + subscriber hub

Stream changes from source databases to a central “hub” database using logical
replication. pg_trickle operates only in the hub.

**Cost:** High — additional replication slots, latency, schema synchronization.

**Verdict:** Viable as a separate architectural pattern, not an extension feature.
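A sketch of the hub pattern using stock logical replication (publication, subscription, and connection-string values are illustrative):

```sql
-- On each source database: publish the tables of interest.
CREATE PUBLICATION trickle_src FOR TABLE public.orders;

-- On the hub database (the only one where pg_trickle is installed):
CREATE SUBSCRIPTION trickle_hub_src
  CONNECTION 'host=localhost dbname=source_db user=replicator'
  PUBLICATION trickle_src;
```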
## 3. Schema Isolation Requirements

Each database that installs pg_trickle has:

```
<database>/
  pgtrickle.pgt_stream_tables      -- catalog
  pgtrickle_changes.changes_<oid>  -- change buffers
  pgtrickle.<function>             -- SQL API
```
No cross-database state sharing is needed or desired. The per-database BGW model (Option A) already satisfies this.
## 4. Shared Memory Considerations

`pg_shmem_init!()` allocates a fixed `PgLwLock` and `PgAtomic` block at
server start. If multiple databases load pg_trickle, each gets its own shmem
block because each `_PG_init()` call is isolated per-database. Confirm this
with a two-database smoke test.
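A sketch of that smoke test as a psql script (`\c` switches databases; the worker's `backend_type` string is an assumption):

```sql
CREATE DATABASE trickle_a;
CREATE DATABASE trickle_b;
\c trickle_a
CREATE EXTENSION pg_trickle;
\c trickle_b
CREATE EXTENSION pg_trickle;
-- After both installs (and a restart if worker registration requires one),
-- expect one worker row per database:
SELECT datname FROM pg_stat_activity
WHERE backend_type ILIKE '%pg_trickle%';
```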
## 5. Recommended Implementation Phases

### Phase 1 (v0.3.0): Test and document multi-database

- Add a multi-database E2E test: two databases on the same PG server, each
  with `pg_trickle` installed; verify independent BGWs.
- Document memory and BGW slot consumption per database.
- Expose `pg_trickle.enabled` per-database to allow disabling in low-priority DBs.
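Per-database disabling would use PostgreSQL's standard per-database GUC mechanism; a sketch assuming the proposed `pg_trickle.enabled` GUC (the database name is illustrative):

```sql
-- Disable pg_trickle activity in a low-priority database.
ALTER DATABASE low_priority_db SET pg_trickle.enabled = off;

-- Re-enable by clearing the per-database override.
ALTER DATABASE low_priority_db RESET pg_trickle.enabled;
```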
### Phase 2 (Post-1.0): BGW pool sizing guidance

- Add a GUC `pg_trickle.max_databases` (informational; the actual limit is
  `max_worker_processes`).
- Add a monitoring view `pgtrickle.server_worker_inventory` showing all
  registered BGWs and their databases.
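How the proposed view might be queried (the view and its columns do not exist yet; the column names are illustrative):

```sql
SELECT database_name, worker_pid, worker_state
FROM pgtrickle.server_worker_inventory
ORDER BY database_name;
```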
### Phase 3 (Post-1.0): External orchestrator integration

- Document the `pgtrickle.manual_refresh(stream_table_name)` pattern for
  external callers.
- See REPORT_PARALLELIZATION.md §D.
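The per-database call an external orchestrator would issue over libpq (the stream table name is illustrative):

```sql
-- Issued once per stream table, against that database's own connection.
SELECT pgtrickle.manual_refresh('orders_st');
```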
## 6. Interaction with CNPG

CNPG clusters can host multiple PostgreSQL databases. The per-database BGW
model works transparently — each DB that has pg_trickle installed will have
an active BGW. The CNPG `cluster-example.yaml` should document the
`max_worker_processes` guidance.
See PLAN_CITUS.md for Citus (distributed PostgreSQL), which has similar but more complex multi-node constraints.