Blue-Green Deployment for pg_trickle

Type: REPORT · Status: Exploration · Date: 2026-03-03

Problem Statement

Propagating changes through a large DAG of stream tables can take significant time — especially initial FULL refreshes on wide or high-cardinality tables. When users need to modify a pipeline (new query logic, schema changes, operator upgrades, schedule tuning), the existing pipeline must keep serving stale-but-consistent data while the replacement pipeline catches up in the background. Once the new pipeline is current, a hot swap redirects consumers transparently.

This is analogous to blue-green deployment in application servers, applied to incremental view maintenance pipelines.

Use Cases

  1. Query logic change — rewrite a defining query (e.g., add a new join or aggregation) without downtime.
  2. Schema evolution — upstream ALTER TABLE requires rebuilding downstream STs; keep the old pipeline serving while the new one reinitializes.
  3. Refresh mode migration — switch from FULL to DIFFERENTIAL (or vice versa) with a safety net.
  4. DAG restructuring — split, merge, or reorder stream tables in the dependency graph.
  5. Extension upgrade — new pg_trickle version with different DVM operators needs pipeline rebuild.

Design Goals

| Goal | Priority |
|------|----------|
| Zero (or near-zero) read downtime during swap | Must-have |
| Works for individual STs and whole pipelines | Must-have |
| Automatic convergence detection + manual override | Must-have |
| Rollback capability | Should-have |
| No breaking changes to existing catalog schema | Should-have |
| Minimal scheduler overhead during dual-pipeline phase | Nice-to-have |

Current Architecture — What Already Supports Blue-Green

Analysis of the existing internals shows that pg_trickle is already largely blue-green friendly at the infrastructure level.

CDC Sharing (Ready)

Triggers and change buffer tables are keyed by source OID, not by stream table ID. The pgtrickle.pgt_change_tracking.tracked_by_pgt_ids array supports multiple consumers. Both the blue (existing) and green (new) STs can consume from the same change buffers simultaneously with independent frontiers.

  • Trigger naming: pg_trickle_cdc_{source_oid} — one trigger per source regardless of how many STs depend on it.
  • Buffer table: pgtrickle_changes.changes_{source_oid} — shared.
  • Deferred cleanup uses MIN(frontier) across all consumers, so green’s presence prevents premature deletion of rows it hasn’t consumed yet.
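The MIN(frontier) cleanup rule can be sketched as a single query. This is an illustrative reconstruction, not the actual cleanup code: the `lsn` column on the buffer table, the `frontier` JSONB shape (`sources -> {oid} -> lsn`), and the `source_relid` column on `pgt_change_tracking` are all assumptions here.

```sql
-- Hypothetical sketch: drop buffer rows that every consumer has passed.
-- Column names and the frontier JSONB shape are assumed for illustration.
WITH horizon AS (
    SELECT MIN((st.frontier->'sources'->'16384'->>'lsn')::pg_lsn) AS safe_lsn
    FROM pgtrickle.pgt_stream_tables st
    JOIN pgtrickle.pgt_change_tracking ct
      ON st.pgt_id = ANY (ct.tracked_by_pgt_ids)
    WHERE ct.source_relid = 16384
)
DELETE FROM pgtrickle_changes.changes_16384 c
USING horizon h
WHERE c.lsn <= h.safe_lsn;
```

With a green ST present in tracked_by_pgt_ids, the MIN clamps the horizon to whichever pipeline is furthest behind, which is exactly why green's presence prevents premature deletion.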

Independent Frontiers (Ready)

Each ST has its own frontier JSONB column in pgt_stream_tables tracking per-source LSN watermarks. A green ST with a different pgt_id maintains completely independent frontier state.
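Independent frontiers also make green's lag directly observable. A sketch of a per-source lag query, assuming a green naming convention of `{name}__pgt_green` and the same assumed frontier JSONB shape (`sources -> {oid} -> lsn`):

```sql
-- Hypothetical sketch: bytes of WAL by which green trails blue, per source.
SELECT src.key AS source_oid,
       pg_wal_lsn_diff(
           (b.frontier->'sources'->src.key->>'lsn')::pg_lsn,
           (g.frontier->'sources'->src.key->>'lsn')::pg_lsn
       ) AS lag_bytes
FROM pgtrickle.pgt_stream_tables b
JOIN pgtrickle.pgt_stream_tables g
  ON g.pgt_schema = b.pgt_schema
 AND g.pgt_name   = b.pgt_name || '__pgt_green'
CROSS JOIN LATERAL jsonb_each(b.frontier->'sources') AS src
WHERE b.pgt_name = 'order_totals';
```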

Internal Key Isolation (Ready)

  • MERGE template cache — keyed by pgt_id (thread-local HashMap).
  • Prepared statements — per-session, keyed by pgt_id.
  • Advisory locks — keyed by pgt_id.
  • Retry/backoff state — keyed by pgt_id.

Two versions of the same logical ST with different pgt_id values will not collide in any internal data structure.

DAG Rebuild (Ready)

signal_dag_rebuild() is a cheap atomic increment. The scheduler rebuilds the full DAG from catalog tables on the next cycle, so adding/removing STs is a live operation with no restart required.

Status Lifecycle (Ready)

The INITIALIZING → ACTIVE → SUSPENDED → ERROR state machine exists. A green ST can be created in INITIALIZING state, manually refreshed to catch up, then flipped to ACTIVE for the scheduler to pick up.


Core Challenge: Identity & Naming

The central tension is the UNIQUE constraint on (pgt_schema, pgt_name) in pgt_stream_tables and the UNIQUE on pgt_relid (storage table OID). Two STs cannot share the same qualified name or point to the same physical table. The swap mechanism must navigate this.
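In catalog terms, the tension looks roughly like this. The DDL below is an abridged reconstruction for illustration only; the real pgt_stream_tables carries many more columns:

```sql
-- Illustrative excerpt (not the actual catalog definition).
CREATE TABLE pgtrickle.pgt_stream_tables (
    pgt_id     BIGSERIAL PRIMARY KEY,
    pgt_schema TEXT  NOT NULL,
    pgt_name   TEXT  NOT NULL,
    pgt_relid  OID   NOT NULL UNIQUE,   -- one ST per physical storage table
    status     TEXT  NOT NULL,
    frontier   JSONB,
    UNIQUE (pgt_schema, pgt_name)       -- one ST per qualified name
);
```

Any swap mechanism must change either the name, the relid, or both, without ever letting two live rows collide on these constraints mid-transaction.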


Approaches

Approach A: Shadow Tables with Atomic Rename

Concept: Create the green ST with a shadow name (e.g., order_totals__pgt_green), let it catch up, then do an atomic three-way rename in a single transaction.

Lifecycle:

1. User calls:  pgtrickle.create_green('order_totals', query => '...', ...)
2. System creates ST:  order_totals__pgt_green  (status: INITIALIZING)
3. System runs FULL refresh on green ST
4. Green ST flips to ACTIVE — scheduler runs differentials alongside blue
5. Convergence detected (or user calls promote)
6. In one transaction:
     ALTER TABLE order_totals         RENAME TO order_totals__pgt_retiring;
     ALTER TABLE order_totals__pgt_green RENAME TO order_totals;
     UPDATE pgtrickle.pgt_stream_tables ...  -- fix names, relids
     DROP the retired ST (or keep for rollback)
7. Signal DAG rebuild

Pros:

  • Simple mental model — two STs, one rename.
  • Consumers querying SELECT * FROM order_totals see no change.
  • Minimal catalog schema changes (only a green_of BIGINT FK column is needed).
  • Works for both per-ST and whole-pipeline swaps.

Cons:

  • ALTER TABLE ... RENAME takes AccessExclusiveLock — brief (a sub-ms catalog-only operation) but blocks concurrent queries momentarily.
  • For a whole-pipeline swap of N STs, N renames in one transaction — lock ordering must be deterministic to avoid deadlocks.
  • Downstream STs referencing the blue ST by OID in their defining_query need query text rewriting at promote time.
  • Not truly zero-downtime: there is a lock window (typically < 1 ms per table).

Complexity: Low–Medium.


Approach B: Views as Public Interface

Concept: The public-facing name is always a view. The physical storage table has a versioned name. Swap = CREATE OR REPLACE VIEW pointing to the new storage table.

Lifecycle:

1. On initial ST creation:
     Storage table:  order_totals__pgt_v1
     Public view:    CREATE VIEW order_totals AS SELECT * FROM order_totals__pgt_v1;
2. Green version:    order_totals__pgt_v2  (new ST, separate pgt_id)
3. Once caught up:   CREATE OR REPLACE VIEW order_totals AS SELECT * FROM order_totals__pgt_v2;
4. Drop v1 ST (or retain for rollback)

Pros:

  • CREATE OR REPLACE VIEW is lightweight, faster than a rename.
  • Clean separation: logical name (view) vs physical storage (versioned table).
  • Rollback = point the view back.
  • View replacement doesn’t invalidate open cursors on the old version.

Cons:

  • Breaking change for existing deployments — all STs would now live behind views. This changes behavior for INSERT (views need INSTEAD OF triggers or are read-only), \d output, pg_class catalog lookups, index visibility from pg_indexes, etc.
  • Adds an indirection layer to every read query (minor but measurable in micro-benchmarks).
  • A downstream ST defining_query that references the view is rewritten by the existing view-inlining code in api.rs — the inliner resolves views to their base query, so it would see the underlying versioned table. This couples green STs that depend on another green ST: the upstream view pointer must be updated first (an ordering constraint).
  • More complex catalog model: both the view OID and the storage table OID must be tracked per entry.

Complexity: Medium–High. The breaking-change aspect makes this unsuitable as a retrofit unless pg_trickle adopts views-as-public-interface from the start (e.g., as a major version change).


Approach C: Pipeline Generations (First-Class Concept)

Concept: Introduce a “pipeline” entity with a generation counter. STs belong to a pipeline + generation. Only one generation is “active” at a time.

New catalog table:

CREATE TABLE pgtrickle.pgt_pipelines (
    pipeline_id    BIGSERIAL PRIMARY KEY,
    pipeline_name  TEXT UNIQUE NOT NULL,
    active_gen     INT NOT NULL DEFAULT 1,
    green_gen      INT,        -- NULL when no green is being prepared
    created_at     TIMESTAMPTZ DEFAULT now()
);

Extended pgt_stream_tables:

ALTER TABLE pgtrickle.pgt_stream_tables
    ADD COLUMN pipeline_id  BIGINT REFERENCES pgtrickle.pgt_pipelines,
    ADD COLUMN generation   INT;

The UNIQUE constraint on (pgt_schema, pgt_name) would be relaxed to UNIQUE (pgt_schema, pgt_name, generation), so that blue and green rows can coexist under the same logical name.
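The cloning step of begin_green could then be a single catalog statement. A sketch under heavy assumptions: the column names (defining_query in particular) are hypothetical, and storage-table creation plus pgt_relid assignment — which must happen per clone — are omitted:

```sql
-- Hypothetical sketch of begin_green's catalog cloning (columns abridged;
-- storage-table creation and pgt_relid assignment omitted).
WITH p AS (
    UPDATE pgtrickle.pgt_pipelines
       SET green_gen = active_gen + 1
     WHERE pipeline_name = 'my_pipeline'
     RETURNING pipeline_id, green_gen
)
INSERT INTO pgtrickle.pgt_stream_tables
    (pgt_schema, pgt_name, defining_query, status, pipeline_id, generation)
SELECT st.pgt_schema, st.pgt_name, st.defining_query, 'INITIALIZING',
       p.pipeline_id, p.green_gen
FROM pgtrickle.pgt_stream_tables st
JOIN p ON st.pipeline_id = p.pipeline_id
WHERE st.generation = p.green_gen - 1;
```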

Lifecycle:

1. pgtrickle.begin_green('my_pipeline')
   → clones all STs in the pipeline with generation + 1
   → new storage tables, new pgt_ids, inherits queries/config
   → green STs start catching up (FULL → DIFFERENTIAL)

2. pgtrickle.promote_green('my_pipeline')
   → atomically swap active_gen, rename tables, update references
   → mark old generation as retired

3. pgtrickle.rollback_green('my_pipeline')
   → drop green-generation STs, reset green_gen to NULL

4. pgtrickle.cleanup_generation('my_pipeline', gen)
   → drop retired storage tables and catalog entries

Pros:

  • First-class concept with clear semantics and good observability.
  • pgt_status() can show generation info: “green is 95% caught up.”
  • Supports both whole-pipeline and per-ST swaps (a single-ST “pipeline”).
  • Enables rollback without keeping shadow tables.
  • Generation info is queryable — useful for monitoring and alerting.

Cons:

  • Largest schema change: a new table, an altered pgt_stream_tables, a migration path for existing deployments, and extension upgrade SQL.
  • “Pipeline” grouping may not map cleanly to arbitrary DAG subsets — what if a user wants to green-deploy 3 of 10 STs that share sources?
  • The rename/swap mechanics under the hood are the same as Approach A — this is essentially Approach A with richer metadata.
  • More concepts for users to learn.

Complexity: High, but most comprehensive.


Approach D: Schema-Based Isolation

Concept: Blue pipeline lives in the user’s schema. Green pipeline lives in a staging schema (e.g., pgtrickle_staging). Swap = rename schemas or adjust search_path.

Lifecycle:

1. pgtrickle.prepare_green(staging_schema => 'pgtrickle_staging')
   → recreate all STs in the staging schema
2. Green STs catch up in isolation
3. ALTER SCHEMA public RENAME TO public_old;
   ALTER SCHEMA pgtrickle_staging RENAME TO public;
4. Drop old schema

Pros:

  • Complete isolation during build-up.
  • Schema rename is a single catalog operation (fast).

Cons:

  • Very coarse-grained — swaps the entire schema, not just STs.
  • Source tables in the original schema force cross-schema query references.
  • ALTER SCHEMA ... RENAME takes AccessExclusiveLock on all contained objects.
  • Doesn’t support per-ST granularity at all.
  • Application search_path changes are required.

Complexity: Medium, but too inflexible for most scenarios.


Comparison Matrix

| Criterion | A: Shadow Rename | B: Views | C: Generations | D: Schema |
|-----------|------------------|----------|----------------|-----------|
| Per-ST granularity | ✅ | ✅ | ✅ | ❌ |
| Whole-pipeline swap | ✅ | ✅ | ✅ | ✅ |
| No breaking changes | ✅ | ❌ Breaking | ⚠️ Schema migration | ✅ |
| Rollback support | ⚠️ Manual | ✅ Easy | ✅ Built-in | ⚠️ Manual |
| Lock duration | ~ms | ~0 | ~ms | ~ms × N |
| Implementation effort | Low | Medium–High | High | Medium |
| Observability | Basic | Basic | Rich | Basic |
| Downstream query rewrite | Required | Automatic (via view) | Required | Not needed |
| Catalog schema changes | Minimal | Moderate | Significant | None |

Convergence Detection

For auto-swap, the system needs to determine when the green pipeline has “caught up” to the blue pipeline. Options:

1. Frontier LSN Comparison (Recommended)

Compare green.frontier.sources[oid].lsn ≥ blue.frontier.sources[oid].lsn for all shared sources. When all sources pass, green is at least as fresh as blue.

  • Pro: Already tracked, no extra queries, precise.
  • Con: Doesn’t guarantee data equivalence (different query logic produces different results even at the same LSN).

2. Data Timestamp Comparison

green.data_timestamp ≥ blue.data_timestamp — simpler but coarser.

  • Pro: Single scalar comparison.
  • Con: Timestamp resolution may not match LSN precision.

3. Lag Threshold

User-defined maximum acceptable lag: “swap when green is within 5 seconds of blue.” Useful for time-sensitive workloads where exact convergence isn’t required.

4. Row Count / Content Hash

Compare COUNT(*) or pg_trickle.content_hash(table) between blue and green storage tables.

  • Pro: Deterministic proof of equivalence.
  • Con: Expensive on large tables; only meaningful when queries are identical.
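For the verify step, a generic fingerprint can be computed without any extension support. This is a sketch, not the pg_trickle.content_hash implementation; it hashes each row's text form and aggregates in sorted order so the result is independent of physical row order:

```sql
-- Hypothetical verify sketch: order-independent content fingerprint of a
-- storage table. Run against both blue and green and compare the results.
-- O(n log n); only meaningful when both defining queries are identical.
SELECT md5(string_agg(h, '' ORDER BY h)) AS content_hash
FROM (
    SELECT md5(t::text) AS h
    FROM public.order_totals t
) rows;
```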

Recommendation: Use frontier LSN comparison (#1) as the primary mechanism, with an optional lag threshold (#3) for user control. Content hash (#4) can be offered as a verify flag for cautious deployments.
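A minimal sketch of the recommended frontier check, assuming the frontier JSONB shape used throughout this report (`sources -> {oid} -> lsn`) and the green_of linkage column proposed below:

```sql
-- Hypothetical convergence check: green is converged when, for every
-- source tracked by blue, green's LSN watermark is >= blue's.
SELECT g.pgt_id,
       bool_and(
           (g.frontier->'sources'->src.key->>'lsn')::pg_lsn >=
           (src.value->>'lsn')::pg_lsn
       ) AS converged
FROM pgtrickle.pgt_stream_tables b
JOIN pgtrickle.pgt_stream_tables g ON g.green_of = b.pgt_id
CROSS JOIN LATERAL jsonb_each(b.frontier->'sources') AS src
WHERE b.pgt_schema = 'public' AND b.pgt_name = 'order_totals'
GROUP BY g.pgt_id;
```

Note that this only proves freshness, not equivalence: a green ST with different query logic can be converged by LSN while producing different rows, which is exactly where the optional verify flag comes in.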


Recommendation

Approach A (Shadow Tables with Atomic Rename) as the implementation strategy, augmented with lightweight metadata from Approach C for observability.

Rationale

  • Approach A is the lowest-risk, lowest-effort path that solves all must-have requirements.
  • The internal machinery (CDC sharing, independent frontiers, pgt_id isolation) already supports dual pipelines — what’s missing is just the orchestration layer.
  • Adding a green_of BIGINT FK and pipeline_group TEXT column to pgt_stream_tables gives the observability benefits of Approach C without the full pipeline entity overhead.
  • Approach B (views) would be a better long-term architecture but is a breaking change that warrants its own migration plan.
  • Approach D is too coarse for real-world use.

Proposed API Surface

-- Create a green version of an existing stream table.
-- Inherits query/schedule/mode from blue unless overridden.
SELECT pgtrickle.create_green(
    'public.order_totals',
    query    => 'SELECT ... FROM orders JOIN ...',  -- optional override
    schedule => '10s',                               -- optional override
    mode     => 'DIFFERENTIAL'                       -- optional override
);

-- Check green progress: frontier lag, row count, status.
SELECT * FROM pgtrickle.green_status('public.order_totals');
-- Returns: green_name, status, frontier_lag_bytes, data_timestamp_lag,
--          blue_row_count, green_row_count, converged (bool)

-- Promote: atomic swap green → active, retire blue.
-- Fails if green hasn't converged (override with force => true).
SELECT pgtrickle.promote_green('public.order_totals');
SELECT pgtrickle.promote_green('public.order_totals', force => true);

-- Promote all green STs that have converged.
SELECT * FROM pgtrickle.promote_all_green();

-- Abort: drop green ST, clean up.
SELECT pgtrickle.cancel_green('public.order_totals');

-- Rollback: swap back to the retired blue (if not yet cleaned up).
SELECT pgtrickle.rollback_green('public.order_totals');

Catalog Changes (Minimal)

ALTER TABLE pgtrickle.pgt_stream_tables
    ADD COLUMN green_of       BIGINT REFERENCES pgtrickle.pgt_stream_tables(pgt_id),
    ADD COLUMN pipeline_group TEXT;

  • green_of — points to the blue ST’s pgt_id. NULL for normal STs.
  • pipeline_group — optional label for grouping multi-ST swaps.
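These two columns are enough for basic observability. A sketch of a monitoring query over the proposed columns (names as proposed above; nothing here exists yet):

```sql
-- Hypothetical sketch: list all in-flight green deployments.
SELECT b.pgt_schema || '.' || b.pgt_name AS blue,
       g.pgt_name                        AS green,
       g.status,
       g.pipeline_group
FROM pgtrickle.pgt_stream_tables g
JOIN pgtrickle.pgt_stream_tables b ON b.pgt_id = g.green_of
WHERE g.green_of IS NOT NULL;
```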

Promote Transaction (Pseudocode)

BEGIN;

-- 1. Lock both tables deterministically (by OID, ascending) to avoid deadlocks
LOCK TABLE blue_table IN ACCESS EXCLUSIVE MODE;
LOCK TABLE green_table IN ACCESS EXCLUSIVE MODE;

-- 2. Rename
ALTER TABLE blue_table  RENAME TO order_totals__pgt_retired;
ALTER TABLE green_table RENAME TO order_totals;

-- 3. Update catalog
UPDATE pgtrickle.pgt_stream_tables
   SET pgt_name = 'order_totals__pgt_retired',
       status   = 'SUSPENDED'
 WHERE pgt_id   = :blue_id;

UPDATE pgtrickle.pgt_stream_tables
   SET pgt_name  = 'order_totals',
       green_of  = NULL
 WHERE pgt_id    = :green_id;

-- 4. Repoint dependency edges of downstream STs from the old blue OID
--    to the new green OID (defining_query text references names, not
--    OIDs; see Open Questions #1)
UPDATE pgtrickle.pgt_dependencies
   SET source_relid = :green_oid
 WHERE source_relid = :blue_oid;

-- 5. Signal DAG rebuild
SELECT pgtrickle.signal_dag_rebuild();

COMMIT;

The blue ST remains in the catalog as SUSPENDED until explicitly cleaned up, enabling rollback.
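While the retired blue ST is still present, rollback is roughly the mirror image of the promote transaction. A sketch in the same pseudocode style, with the same caveats (deterministic locking and advisory-lock acquisition omitted):

```sql
BEGIN;

-- Reverse the renames: demote green, restore the retired blue.
ALTER TABLE order_totals              RENAME TO order_totals__pgt_green;
ALTER TABLE order_totals__pgt_retired RENAME TO order_totals;

UPDATE pgtrickle.pgt_stream_tables
   SET pgt_name = 'order_totals', status = 'ACTIVE'
 WHERE pgt_id   = :blue_id;

UPDATE pgtrickle.pgt_stream_tables
   SET pgt_name = 'order_totals__pgt_green', green_of = :blue_id
 WHERE pgt_id   = :green_id;

-- Repoint dependency edges back to blue.
UPDATE pgtrickle.pgt_dependencies
   SET source_relid = :blue_oid
 WHERE source_relid = :green_oid;

SELECT pgtrickle.signal_dag_rebuild();

COMMIT;
```

The blue ST resumes from its own frontier, so any lag it accumulated while SUSPENDED is caught up through the shared change buffers, provided cleanup has not yet advanced past its frontier.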


Open Questions

  1. Downstream OID rewrites. When a green ST replaces a blue ST that other STs depend on, the downstream defining_query text contains the old OID (or table name). How deep does the rewrite need to go? Just the dependency edges, or the SQL text too? (The SQL text uses names, not OIDs, so renaming the table should suffice — but needs verification.)

  2. Consistency groups during transition. If blue and green STs participate in the same diamond consistency group, could the scheduler produce inconsistent results? Likely no — they have independent pgt_ids and different names, so the consistency group logic treats them as separate entities. But edge cases need analysis.

  3. Whole-pipeline ordering during promote. For a pipeline with 5 STs in a DAG, the promote must happen in topological order (upstream first) so that downstream STs immediately see the promoted upstream. Or should the promote be all-at-once in a single transaction?

  4. CDC trigger sharing with modified queries. If the green ST has a different defining_query that references additional source tables not in the blue ST’s dependency set, new CDC triggers must be created for those sources. The existing setup_cdc_for_source() handles this, but the green ST creation flow needs to call it.

  5. Storage table column mismatch. If the green ST has a different column set (due to query changes), the rename alone isn’t sufficient — downstream queries expecting the blue schema will break. Should promote_green verify schema compatibility, or is this the user’s responsibility?

  6. Advisory lock contention. Both blue and green STs use independent advisory locks (keyed by pgt_id), so no contention. But during the promote transaction, we need to ensure neither ST is mid-refresh. The promote should acquire both advisory locks before proceeding.

  7. Auto-promote timing. Should the scheduler perform auto-promote, or should it be a separate background check? Embedding it in the scheduler loop adds complexity; a separate polling function called by cron may be simpler.
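For question 3, a topological promote order can be derived from the dependency edges already in the catalog. A sketch, assuming pgt_dependencies stores (pgt_id, source_relid) edges — only source_relid is confirmed by this report — and that a ST's storage OID is its pgt_relid:

```sql
-- Hypothetical sketch: promote rank per ST, upstream first.
-- Edge shape (pgt_id, source_relid) is an assumption.
WITH RECURSIVE ordered AS (
    -- Depth 0: STs whose sources are all plain tables (no upstream ST).
    SELECT st.pgt_id, 0 AS depth
    FROM pgtrickle.pgt_stream_tables st
    WHERE NOT EXISTS (
        SELECT 1
        FROM pgtrickle.pgt_dependencies d
        JOIN pgtrickle.pgt_stream_tables up ON up.pgt_relid = d.source_relid
        WHERE d.pgt_id = st.pgt_id
    )
    UNION ALL
    -- Each downstream ST sits one level below its deepest upstream.
    SELECT d.pgt_id, o.depth + 1
    FROM ordered o
    JOIN pgtrickle.pgt_stream_tables up ON up.pgt_id = o.pgt_id
    JOIN pgtrickle.pgt_dependencies d  ON d.source_relid = up.pgt_relid
)
SELECT pgt_id, MAX(depth) AS promote_rank
FROM ordered
GROUP BY pgt_id
ORDER BY promote_rank;
```

Promoting in ascending promote_rank (or all ranks in one transaction, per the open question) guarantees a downstream ST never sees a half-promoted upstream.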


Related Work

  • PLAN_UPGRADE_MIGRATIONS.md — extension upgrade migrations (overlapping concern for version-to-version transitions).
  • PLAN_HYBRID_CDC.md — CDC mode transitions already implement a form of “hot swap” from trigger to WAL.
  • PLAN_FUSE.md — anomalous change volume detection could trigger auto-rollback of a green pipeline.
  • REPORT_EXTERNAL_PROCESS.md — external sidecar could orchestrate blue-green at a higher level.
  • REPORT_DOWNSTREAM_CONSUMERS.md — downstream consumer patterns (change feeds, NOTIFY, logical replication) and how they interact with blue-green swap.

Next Steps

  1. Validate Approach A with a manual proof-of-concept: create two STs on the same source, let both catch up, then rename in a transaction. Measure lock duration and verify downstream queries.
  2. Prototype create_green / promote_green as SQL functions — minimal viable implementation.
  3. Design the convergence check — implement green_status() using frontier comparison.
  4. Evaluate Approach B (views) as a longer-term migration path, potentially gated behind a GUC (pg_trickle.use_view_aliases = on).
  5. Write an ADR once we commit to an approach.