Contents

pg_trickle — Overall Project Status & Performance Report

Date: 2026-03-24
Version: 0.10.0 (updated with deep gap analysis)
Author: Deep analysis of architecture, configuration, code paths, benchmarks, and plans


Executive Summary

pg_trickle is a PostgreSQL 18 extension implementing streaming tables with incremental view maintenance (IVM) based on differential dataflow (DBSP). At v0.10.0, the project has reached a high level of maturity with 21 DVM operators, hybrid CDC, parallel refresh, and broad SQL coverage including recursive CTEs, window functions, LATERAL joins, and set operations.

Key findings:

Dimension Status Rating
Correctness 36 edge cases catalogued; all P0 resolved; EC-01 ≥3-scan boundary remains ★★★★☆
Performance 7–42× speedup at 1% change rate; MERGE is 70–97% of time ★★★★☆
Defaults Conservative and safe; advanced features opt-in ★★★★★
Scalability Parallel refresh implemented but off by default ★★★☆☆
Safety 51% unsafe reduction; 1 production panic site; 65 unwrap calls in core modules ★★★☆☆
Ergonomics Good basics; silent AUTO mode downgrades; no explain functions ★★★☆☆
Observability NOTIFY alerts, refresh stats, health checks, dependency tree ★★★★☆
SQL Coverage 21 operators, 60+ aggregate functions; VALUES/TABLESAMPLE unsupported ★★★★☆
DVM Engine 19.7K-line parser; 25 group-rescan aggregates; no offline SQL validation ★★★☆☆
Documentation Good reference; missing support matrix, interaction guides, patterns ★★★☆☆
Deployment PgBouncer, CNPG, backup/restore, dbt integration ★★★★☆

Overall assessment: The project is production-ready for typical analytical workloads. The primary performance bottleneck is PostgreSQL’s own MERGE execution (70–97% of refresh time), meaning the Rust-side pipeline is highly optimized. The main opportunities for improvement are in scaling to larger deployments (100+ stream tables), reducing CDC write-side overhead, unlocking the parallel refresh feature by default, and addressing the ergonomics/observability gaps identified in this deep analysis (§12–§17).

Priority note (2026-06): The project’s primary goals now explicitly target maximum performance, low latency, and high throughput with differential refresh as the default mode and full refresh as a last resort. Scalability (★★★☆☆) is the most critical gap relative to this ambition. Items A-2 (Columnar Change Tracking) and D-4 (Shared Change Buffers) have been promoted to v0.12.0 to address this, and WAKE-1 (adaptive scheduler wake) has been pulled forward to v0.11.0. See ROADMAP.md for details.


Table of Contents

  1. Architecture & Design Quality
  2. Default Configuration Analysis
  3. Performance Status
  4. CDC Pipeline Analysis
  5. DVM Engine Analysis
  6. Scheduler & Coordination
  7. Safety & Error Handling
  8. Scalability Assessment
  9. End-to-End Latency Analysis
  10. Recommendations for Future Releases
  11. Summary & Prioritized Roadmap
  12. Deep Gap Analysis — DVM Correctness & SQL Coverage
  13. Deep Gap Analysis — Safety & Code Quality
  14. Deep Gap Analysis — Performance Opportunities
  15. Deep Gap Analysis — Ergonomics & API Design
  16. Deep Gap Analysis — Documentation & Onboarding
  17. Deep Gap Analysis — Testing & Verification

1. Architecture & Design Quality

1.1 Component Overview

The system is structured into well-separated layers:

User SQL API (api.rs)
    ↓
Catalog (catalog.rs) ←→ DDL Hooks (hooks.rs)
    ↓
CDC Layer (cdc.rs / wal_decoder.rs)
    ↓ Change buffers (pgtrickle_changes schema)
DVM Engine (dvm/) — Parser → OpTree → Delta SQL
    ↓
Refresh Engine (refresh.rs) — FULL / DIFFERENTIAL / IMMEDIATE
    ↓
Scheduler (scheduler.rs) ←→ DAG (dag.rs) ←→ Shared Memory (shmem.rs)
    ↓
Monitoring (monitor.rs) — Stats, alerts, health checks

1.2 Design Strengths

  • Single-binary extension: No external dependencies beyond PostgreSQL itself. No message queues, no separate services.
  • Theoretical foundation: Grounded in DBSP (Budiu et al. 2023), giving formal correctness guarantees for differential computation.
  • Layered optimization: Each layer has independent optimization passes (CDC column pruning → DVM predicate pushdown → MERGE planner hints → prepared statements).
  • Adaptive behavior: Automatic fallback from DIFFERENTIAL to FULL when change ratio exceeds threshold; auto-backoff when refresh outpaces schedule.
  • Crash safety: Frontier-based versioning ensures no data loss; orphaned RUNNING records auto-recovered on scheduler restart.

1.3 Design Concerns

  • Thread-local caching: Delta SQL templates and MERGE plans cached per backend session via thread_local!. Cross-session invalidation uses a shared atomic counter (CACHE_GENERATION), but cache warm-up cost hits every new connection.
  • Shared memory ring buffer: 32-slot invalidation ring for DAG changes. Overflow triggers full O(V+E) DAG rebuild. Under heavy DDL (>32 CREATE/ALTER in one tick), this is a cliff.
  • SPI coupling: All catalog access and MERGE execution happen through SPI, which holds lightweight locks and may conflict with concurrent DDL. SPI blocks are kept short per coding guidelines, but the total SPI time per refresh cycle is significant.

2. Default Configuration Analysis

2.1 Default Values — Safety Assessment

The defaults prioritize safety and predictability over maximum throughput:

GUC Default Assessment
enabled true ✅ Expected
cdc_mode 'auto' ✅ Safe — starts with triggers, upgrades to WAL if available
cdc_trigger_mode 'statement' ✅ Optimal — 50–80% less write overhead than row-level
scheduler_interval_ms 1000 ✅ Balanced — 1s wake cycle; responsive without busy-looping
min_schedule_seconds 1 ✅ After ergonomics fix (was 60)
default_schedule_seconds 1 ✅ Fast default for isolated CALCULATED STs
differential_max_change_ratio 0.15 ✅ Conservative — falls back to FULL at 15% change rate
cleanup_use_truncate true ✅ O(1) cleanup vs O(n) DELETE
use_prepared_statements true ✅ 1–2ms savings per refresh
merge_planner_hints true ✅ Dynamic optimizer hints improve join strategies
merge_work_mem_mb 64 ✅ Reasonable for MERGE sorting
tick_watermark_enabled true ✅ Critical for cross-source consistency
compact_threshold 100,000 ✅ Triggers compaction before buffers grow large

2.2 Defaults That Limit Performance (by design)

GUC Default Why Off Impact if Enabled
parallel_refresh_mode 'off' Safety — new feature, needs explicit opt-in 2.5–5× throughput for independent ST graphs
auto_backoff false Predictability — users expect fixed schedules Prevents scheduler overload when refresh > schedule
tiered_scheduling false Simplicity — equal freshness for all STs Reduces refresh load for rarely-queried STs
buffer_partitioning 'off' Overhead — partition management adds CDC cost Faster cleanup for high-throughput sources
allow_circular false Correctness — cyclic STs have convergence risks Enables feedback-loop patterns
block_source_ddl false Ergonomics — blocking DDL is surprising Prevents schema drift breaking STs silently

2.3 Default Configuration Verdict

The defaults are well-chosen. They provide: - ✅ Predictable behavior — no surprises for new users - ✅ Safe operations — conservative fallback thresholds - ✅ Good baseline performance — statement-level triggers, prepared statements, planner hints all enabled by default - ⚠️ Suboptimal for large deployments — parallel refresh off, tiered scheduling off, no auto-backoff

The design follows the principle of “safe by default, fast by configuration.”


3. Performance Status

3.1 Current Benchmark Results

Headline numbers at 100K rows, 1% change rate (typical production):

Scenario FULL (ms) INCREMENTAL (ms) Speedup
scan 300 4.6 65.9×
filter 134 3.4 39.5×
aggregate 17 2.5 6.8×
join 294 18.0 16.3×
join_agg 28 14.6 1.9×

Zero-change latency: < 10ms (target achieved via EXISTS(<buffer>) short-circuit)

3.2 Performance Evolution

The project has gone through 9 optimization phases:

Phase Key Optimization Impact
Baseline INCREMENTAL slower than FULL for some cases
P1 JSONB fix, drop unused columns, LSN bounds 2.7× improvement
P2 Delta SQL caching, MERGE template caching +10–20ms/refresh
P3 LATERAL VALUES for aggregates, PK resolution Correctness + speed
P4 No-op short-circuit, prepared statements 41.7× peak
P5 Deferred cleanup, warm-cycle tuning Stability
P6 Statement-level triggers 50–80% write reduction
P7 Parallel refresh infrastructure 2.5–5× throughput (opt-in)
P8–P9 Algebraic aggregates, covering indexes, compaction 20–50% latency reduction

3.3 Where Time Is Spent

Critical path breakdown (100K rows, 1% change rate):

Component Time % of Total
Decision phase < 1ms < 1%
Delta SQL generation (Rust) < 1ms < 1%
MERGE plan + execution (PG) 3–15ms 70–97%
Cleanup (deferred) < 7.7ms 2–20%
SPI overhead < 1ms < 5%

The Rust-side pipeline is essentially negligible. Further Rust optimization yields diminishing returns. The bottleneck is PostgreSQL’s MERGE executor — plan compilation, index lookups, heap writes, and WAL generation.

3.4 Pure Rust Micro-benchmarks

From Criterion.rs (benches/):

Operation Time
quote_ident (simple) 93.9 ns
col_list (10 cols) 1.37 µs
prefixed_col_list (20 cols) 4.00 µs
expr_to_sql (column ref) 26.3 ns
lsn_gt comparison 78.9 ns
frontier_json serialize (20) 3.76 µs
dag_build_linear_100 62.5 µs
xxh64 (4096 bytes) 384.1 ns
diff_scan (20 cols) 47.7 µs
diff_inner_join 30.7 µs
diff_full_pipeline 54.2 µs

All operations are sub-millisecond. The delta SQL generation pipeline (diff_full_pipeline at 54.2 µs) is 3–4 orders of magnitude faster than MERGE execution.

3.5 Scaling Characteristics

Change Rate 10K rows 100K rows Trend
1% 1.5–5× 5–66× Better at scale
10% 1.4–2.2× 1.8–3.3× Moderate benefit
50% 0.6–1.3× 0.6–1.1× FULL is better; adaptive fallback kicks in

INCREMENTAL scales super-linearly with table size (because delta stays small while FULL grows linearly). The adaptive fallback threshold at 15% prevents INCREMENTAL from being used when it would be slower.


4. CDC Pipeline Analysis

4.1 Trigger-Based CDC (Default)

Architecture: Row-level or statement-level AFTER triggers capture DML into typed buffer tables (pgtrickle_changes.changes_<oid>).

Statement-level triggers (default since v0.4.0): - Single trigger invocation per DML statement (not per row) - Uses REFERENCING NEW TABLE AS __pgt_new transition tables - 50–80% less write-side overhead than row-level for bulk DML - Slightly higher per-invocation planning cost (amortized over batch size)

Buffer table schema: - lsn — WAL position for ordering - action — I/U/D/T (insert/update/delete/truncate) - pk_hash — xxHash64 of primary key columns - changed_cols — Bitmask for UPDATE (which columns changed) - new_*/old_* — Per-column typed values (no JSONB serialization)

4.2 CDC Optimizations Implemented

Optimization Version Impact
Statement-level triggers v0.4.0 50–80% write overhead reduction
Typed buffer columns (no JSONB) v0.1.0 Eliminates serialization cost
Single covering index v0.10.0 20% trigger overhead reduction
Changed columns bitmask v0.4.0 Skip unchanged columns in UPDATE
Selective CDC capture v0.9.0 50–80% reduction for wide tables (only capture referenced columns)
Change buffer compaction v0.10.0 50–90% delta reduction (cancel INSERT→DELETE pairs)
Partitioned buffers v0.6.0 Faster cleanup via DETACH+DROP
RLS bypass on buffers v0.5.0 No policy overhead on internal tables

4.3 CDC Write-Side Overhead (Estimated)

Based on trigger overhead analysis planning:

Table Width DML Type Est. Overhead/Row Write Slowdown
Narrow (3 cols) INSERT 1–3 µs 1.5–2.5×
Narrow (3 cols) UPDATE 2–5 µs 2.0–3.0×
Medium (8 cols) UPDATE 3–8 µs 2.5–4.0×
Wide (20 cols) UPDATE 5–15 µs 3.0–5.0×

The write-side overhead is moderate and acceptable for OLTP workloads. Statement-level triggers reduce this significantly for batch operations.

4.4 WAL-Based CDC (Optional)

Available when wal_level = 'logical'. Uses pgoutput logical replication:

  • Transition lifecycle: TRIGGER → TRANSITIONING → WAL (with automatic fallback)
  • Text-based parsing: pgoutput messages parsed positionally (G2.3)
  • Progressive timeout: 1×/2×/3× timeout with LOG warnings before abort

Current state: Functional but secondary. Trigger-based CDC is preferred for single-transaction atomicity and simpler operations.

4.5 CDC Bottlenecks & Opportunities

Bottleneck Severity Mitigation
PL/pgSQL trigger overhead Medium Statement-level triggers (implemented)
WAL writes for buffer tables Medium UNLOGGED buffers (planned, behind GUC)
pgoutput text parsing (WAL mode) Low Positional parsing (implemented)
Column-level capture for SELECT * Low Falls back to full capture (correct)
Buffer index maintenance on INSERT Low Single covering index (implemented)

5. DVM Engine Analysis

5.1 Operator Coverage

21 operators covering virtually all analytical SQL patterns:

Category Operators Status
Basic Scan, Filter, Project ✅ Optimized
Joins Inner, Left, Right, Full, Semi, Anti ✅ Correct (B3-2 weight aggregation)
Aggregates GROUP BY with 60+ aggregate functions ✅ Algebraic maintenance for AVG/STDDEV/VAR
Set Operations UNION ALL, INTERSECT, EXCEPT
Advanced Window, Distinct, Subquery
CTE CteScan, RecursiveCte, RecursiveSelfRef ✅ Semi-naive evaluation
Lateral LateralFunction, LateralSubquery
Scalar Subquery ScalarSubquery ✅ Decorrelation rewrite

5.2 Query Rewriting Pipeline

Six normalization passes before DVM differentiation:

  1. View inlining — Fixpoint closure (inline views, re-inline if result contains more views)
  2. DISTINCT ON → ROW_NUMBER OVER — Canonical form
  3. GROUPING SETS/CUBE/ROLLUP → UNION ALL — Explosion guard (max_grouping_set_branches GUC, default 64)
  4. Scalar subquery decorrelation — Rewrites correlated subqueries
  5. De Morgan normalization — Multi-pass sublink rewriting
  6. Nested window expression lifting — Canonical window form

5.3 DVM Performance Characteristics

  • Delta SQL generation: 47–54 µs for typical queries (negligible)
  • Template caching: Eliminates re-parsing after first execution (~45ms saved per refresh cycle)
  • CTE delta memoization: Each CTE reference reuses the same delta subquery (no redundant computation)
  • Predicate pushdown: Filter conditions pushed into change buffer scans (reduces data volume early)

5.4 DVM Correctness Measures

  • TPC-H validation: All 22 queries verified against known-correct results
  • Property-based testing: proptest regressions tracked
  • Weight aggregation: Z-set algebra for diamond flows (replaces incorrect DISTINCT ON)
  • Semi-naive evaluation: Recursive CTEs use incremental fixpoint
  • Delete-and-Rederive: Correct deletion for recursive CTEs

6. Scheduler & Coordination

6.1 Architecture

Two-tier background worker model:

  1. Launcher (one per cluster) — discovers databases, spawns schedulers
  2. Per-database scheduler — reads DAG, coordinates refreshes, manages retries

6.2 Refresh Coordination

The scheduler executes a topological refresh cycle:

  1. Build/update DAG from catalog
  2. Detect consistency groups (diamond dependencies → atomic refresh)
  3. For each group in topological order:
    • Check schedule staleness
    • Determine refresh action (FULL/DIFFERENTIAL/NO_DATA)
    • Execute with retry backoff (max 3 consecutive errors → auto-suspend)
  4. Record history, emit NOTIFY alerts

6.3 Parallel Refresh (opt-in)

Implemented in v0.4.0 via dynamic background workers:

  • Coordinator dispatches independent execution units to workers
  • Worker tokens: CAS-based allocation with cluster-wide budget (max_dynamic_refresh_workers default 4)
  • Wave reset: After all in-flight workers complete and any succeeded, re-evaluate upstream counts for cascading refreshes
  • Diamond protection: Atomic groups always execute serially within a consistency boundary

6.4 Advanced Scheduling Features

Feature Default Description
Tiered scheduling Off HOT (1×), WARM (2×), COLD (4×), FROZEN (skip)
Auto-backoff Off Doubles schedule when refresh > schedule interval
Adaptive threshold On Auto-tunes FULL/DIFF cutoff from historical data
Cyclic SCC iteration Off Fixed-point convergence for circular dependencies
Crash recovery On Auto-marks orphaned RUNNING records as FAILED
Tick watermark On Cross-source snapshot consistency

6.5 Scheduler Bottlenecks

Bottleneck Impact Current Mitigation
Serialized refresh (default) Only 1 ST refreshed at a time Parallel mode available (opt-in)
DAG rebuild on ring overflow O(V+E) with >32 concurrent DDL Incremental rebuild fallback
Per-ST catalog lock probes FOR UPDATE SKIP LOCKED per ST Capped by concurrent_refreshes GUC
History queries for cost model Last 10+5 records per ST per cycle SPI queries; infrequent

7. Safety & Error Handling

7.1 Unsafe Code

  • Before reduction: 1,309 unsafe blocks
  • After reduction (v0.7.0): 641 unsafe blocks (51% reduction)
  • Approach: Six well-documented safe abstractions (pg_cstr_to_str, pg_list, cast_node!, parse_query, SubTransaction, function conversions)
  • All remaining unsafe blocks have // SAFETY: comments per project guidelines

7.2 Error Handling

  • PgTrickleError enum with classified variants (user, schema, system, internal)
  • No unwrap() or panic!() in code reachable from SQL
  • SQLSTATE-based retry classification (F29) — identifies retriable errors (lock timeouts, serialization failures) vs permanent errors
  • Exponential backoff with jitter for retries
  • Max consecutive errors (default 3) before auto-suspend with NOTIFY alert

7.3 Concurrency Safety

  • PgLwLock for shared state (short hold times)
  • PgAtomic (CAS) for counters (worker tokens, cache generation, DAG version)
  • Advisory locks for non-blocking compaction attempts
  • SAVEPOINT for atomic consistency group refresh (rollback on any failure)
  • FOR UPDATE SKIP LOCKED for concurrent ST refresh coordination

7.4 Data Safety

  • Frontier-based versioning: Stream tables represent query result at a consistent past timestamp (Delayed View Semantics)
  • Cross-source tick watermark: All refreshes in a tick capped to a single WAL LSN
  • Crash recovery: Orphaned RUNNING records auto-detected and marked FAILED
  • DML guard trigger: Prevents direct INSERT/UPDATE/DELETE on stream tables
  • Schema change detection: DDL event triggers detect upstream changes and mark STs for reinitialization

8. Scalability Assessment

8.1 Current Scalability Profile

Dimension Status Limit
Stream table count Good Tested with ~100 STs; DAG operations O(V+E)
Source table size Good 100K–1M rows benchmarked; MERGE scales with delta size
Change rate Good Adaptive fallback prevents degradation at high rates
Column count Good Selective CDC + bitmask up to 63 columns
Query complexity Good 21 operators; TPC-H 22 queries validated
Concurrent databases Good Per-database scheduler isolation
Concurrent writers Moderate Statement-level triggers reduce contention
Parallel refresh Opt-in 4 workers default; proven 2.5–5× throughput

8.2 Scalability Limits

Constraint Current Concern
Invalidation ring 32 slots >32 concurrent DDL → full DAG rebuild
Changed columns bitmask 63 columns (BIGINT) >63 columns loses per-column tracking
TopK LIMIT 1,000 default Large LIMIT values degrade IMMEDIATE mode
Recursive CTE depth 100 default Deep recursion may not converge
Fixpoint iterations 100 default Cyclic STs may not converge
MERGE executor PG-internal Cannot optimize beyond planner hints

8.3 Deployment Compatibility

Platform Status Notes
PostgreSQL 18 Primary target
PgBouncer ✅ v0.10.0 Pooler compatibility mode
CloudNativePG ✅ v0.1.1 Extension-only Docker images
Backup/restore ✅ v0.8.0 Auto-reconnect infrastructure
dbt ✅ v0.4.0 Materialization + health macros
Logical replication ⚠️ WAL CDC mode; not for read replicas

9. End-to-End Latency Analysis

9.1 Latency Components

For a single stream table refresh cycle:

Source DML committed
    ↓ [CDC trigger overhead: 1–15 µs/row]
Change buffer populated
    ↓ [Scheduler wake: 0–1000ms (scheduler_interval_ms)]
Scheduler tick
    ↓ [Schedule check: < 1ms]
    ↓ [Decision phase: < 1ms]
    ↓ [Delta SQL generation: < 0.1ms (cached)]
Refresh execution
    ↓ [MERGE plan + execute: 3–15ms at 1% change rate]
    ↓ [History logging: < 1ms]
Stream table updated
    ↓ [Deferred cleanup: runs next cycle]

9.2 End-to-End Latency Budget

Component Best Case Typical Worst Case
CDC trigger < 1ms (bulk) 1–5ms 15ms (wide, row-level)
Scheduler wake delay 0ms 500ms 1000ms
Schedule check < 1ms < 1ms < 1ms
Delta SQL (cached) < 0.1ms < 0.1ms 45ms (cold cache)
MERGE execution 3ms 10ms 300ms+ (large delta)
Total ~4ms ~515ms >1300ms

Dominant latency factor: Scheduler wake interval contributes the most to typical end-to-end latency. For IMMEDIATE mode (IVM), the latency drops to the MERGE execution time alone (~3–15ms), since changes are applied synchronously within the same transaction.

9.3 Throughput Analysis

Metric Value Condition
Refreshes/second (serial) ~50–200 1 ST, 1% change rate
Refreshes/second (parallel) ~200–1000 4 workers, independent STs
Zero-change throughput ~500+ EXISTS short-circuit, < 2ms
Source write throughput impact 1.5–5× slower Depends on table width and trigger mode
Max STs per database ~100+ tested DAG rebuild < 100ms

10. Recommendations for Future Releases

10.1 High Priority — Performance & Throughput

R1: Enable Parallel Refresh by Default (v0.11)

Current: parallel_refresh_mode = 'off'
Recommendation: Change default to 'on'
Impact: 2.5–5× throughput improvement for deployments with independent STs
Risk: Low — the feature is implemented and tested since v0.4.0; six releases of stabilization
Prerequisite: Validate with TPC-H and dbt integration tests under parallel mode

R2: UNLOGGED Change Buffers (v0.11–v0.12)

Current: Change buffers are standard WAL-logged tables
Recommendation: Add GUC unlogged_change_buffers (default 'auto')
Impact: ~30% reduction in CDC write-side WAL volume
Risk: Medium — UNLOGGED tables are lost on crash. Requires frontier reset logic to detect lost buffers and trigger full refresh. This is acceptable because change buffers are transient by nature.
Mitigation: On recovery, detect missing UNLOGGED tables and force FULL refresh for affected STs

R3: Reduce Scheduler Wake Interval for Latency-Sensitive Workloads (v0.11)

Current: scheduler_interval_ms = 1000 (1 second)
Recommendation: Provide a low_latency profile or reduce default to 200ms
Impact: Reduces median end-to-end latency from ~515ms to ~115ms
Risk: Higher CPU usage on scheduler process; mitigate with idle/active detection (skip tick if no pending changes)
Trade-off: Consider adaptive wake interval — 100ms when changes pending, 5000ms when idle

R4: SemiJoin Delta-Key Pre-Filtering (v0.11)

Current: SemiJoin Part 2 rescans full left table
Recommendation: Implement O-1 from TPC-H benchmarking plan
Impact: 15–26× improvement for Q18, Q20, Q21 type queries
Risk: Low — restricted to SemiJoin operator diff, no architectural changes

R5: Cost-Based Refresh Strategy Selection — Done in v0.10.0

Status: ✅ Implemented as item B-4 in v0.10.0. The estimate_cost_based_threshold + compute_adaptive_threshold functions apply a 60/40 blend of ratio-based and history-driven cost model from pgt_refresh_history; cold-start fallback to the fixed GUC threshold.

Recommendation: Integrate historical timing data more aggressively

10.2 Medium Priority — Scalability

R6: Partitioned Stream Tables (v0.11–v0.12)

Current: Stream tables are single unpartitioned tables
Recommendation: Support RANGE/LIST/HASH partitioning for large stream tables (10M+ rows)
Impact: Partition-scoped MERGE reduces lock contention and enables partition pruning at query time
Risk: Medium — partition management adds complexity; need to handle partition creation/detach in refresh cycle

R7: UNLOGGED-to-Logged Buffer Promotion (v0.12)

Current: Buffer partitioning available but off by default
Recommendation: Default buffer_partitioning = 'auto' for sources with high write throughput (detected by change buffer growth rate)
Impact: Faster cleanup (DETACH+DROP vs TRUNCATE) and potential for parallel scans of old partitions
Risk: Low — partition lifecycle management already implemented

R8: Shared Change Buffers (v0.13)

Current: Each source table has its own change buffer
Recommendation: Allow multiple STs watching the same source to share a single change buffer (with per-ST LSN frontier tracking)
Impact: Reduces write amplification for fan-out patterns (one source → many STs)
Risk: Medium — requires careful frontier management and concurrent consumer coordination

R9: Increase Invalidation Ring Capacity (v0.11)

Current: 32-slot ring buffer
Recommendation: Increase to 128 or 256 slots
Impact: Reduces frequency of full DAG rebuilds under heavy DDL
Risk: Minimal — slightly more shared memory; O(ring_size) dedup check is still fast

10.3 Medium Priority — Safety & Ergonomics

R10: Enable Auto-Backoff by Default (v0.11)

Current: auto_backoff = false
Recommendation: Change default to true
Impact: Prevents scheduler overload when refresh consistently exceeds schedule interval; reduces log noise and wasted CPU
Risk: Low — well-tested since v0.4.0; users can still override
Rationale: Auto-backoff is the safe behavior. The current default silently wastes resources when refresh can’t keep up.

R11: Enable Tiered Scheduling by Default (v0.12)

Current: tiered_scheduling = false
Recommendation: Change default to true with sensible tier detection (e.g., based on last-queried timestamp from pg_stat_user_tables)
Impact: Automatically reduces refresh frequency for cold STs; improves overall throughput for hot STs
Risk: Low — users can override per-ST; adds modest complexity to scheduler logic

R12: Block Source DDL by Default (v0.12)

Current: block_source_ddl = false
Recommendation: Change default to true with clear error messages
Impact: Prevents silent schema drift that breaks STs; forces explicit ALTER STREAM TABLE after schema changes
Risk: Medium — may surprise users who expect transparent DDL propagation. Good error messages are essential.

R13: Wider Changed-Column Bitmask (v0.12)

Current: BIGINT bitmask limited to 63 columns
Recommendation: Use array of BIGINTs or bytea for tables with >63 columns
Impact: Column-level tracking for very wide tables
Risk: Low — implementation is straightforward; most tables are <63 columns

10.4 Low Priority — Advanced Optimizations

R14: Async CDC via Custom Output Plugin (v0.14+)

Current: Text-based pgoutput parsing
Recommendation: Custom output plugin with binary-efficient format
Impact: Would eliminate text parsing overhead for WAL-based CDC
Risk: High — requires C-level output plugin development and maintenance; pgoutput is the standard

R15: Multi-Table Delta Batching — Done in v0.10.0

Status: ✅ Implemented as items B3-2 + B3-3 in v0.10.0. Weight aggregation (GROUP BY __pgt_row_id, SUM(weight) HAVING SUM(weight) != 0) correctly replaces the incorrect DISTINCT ON approach. Six diamond-flow property-based tests verify correctness.

Recommendation: Merge delta computation for co-scheduled STs with shared sources

R16: Adaptive Scheduler Wake Interval (v0.12)

Current: Fixed 1000ms wake interval
Recommendation: Event-driven wake using pg_notify from CDC triggers
Impact: Near-zero latency when changes appear; near-zero CPU when idle
Risk: Medium — adds trigger→scheduler signaling path; needs NOTIFY coalescing to avoid thundering herd

R17: MERGE Bypass for Append-Only Patterns — Done in v0.5.0

Status: ✅ Implemented as item A-3a in v0.5.0. The APPEND ONLY declaration on CREATE STREAM TABLE enables a direct INSERT path (no MERGE). A heuristic automatically reverts to full MERGE on first observed DELETE or UPDATE, emitting a WARNING + pgtrickle_alert NOTIFY.

Recommendation: Use direct INSERT for pure append patterns

R18: Index-Aware MERGE Planning — Done in v0.10.0

Status: ✅ Implemented as item A-4 in v0.10.0. SET LOCAL enable_seqscan = off is injected when delta row count is below merge_seqscan_threshold (0.001 × ST size). Covering index auto-created on __pgt_row_id with INCLUDE clause for ≤8-column schemas. Reverts at transaction end via SET LOCAL (not SET).

Recommendation: Hint index usage based on change buffer PK distribution


11. Summary & Prioritized Roadmap

Immediate Wins (v0.11)

ID Recommendation Effort Impact
R1 Parallel refresh on by default Low 2.5–5× throughput
R4 SemiJoin delta-key pre-filter Medium 15–26× for affected queries
R9 Larger invalidation ring Trivial Prevents full DAG rebuilds
R10 Auto-backoff on by default Trivial Prevents scheduler overload

Near-Term (v0.12)

ID Recommendation Effort Impact
R2 UNLOGGED change buffers Medium ~30% CDC write reduction
R3/R16 Adaptive scheduler wake interval Medium 4× latency reduction
R6 Partitioned stream tables High Unlocks 10M+ row STs
R7 Auto buffer partitioning for high-throughput sources Low Faster cleanup
R11 Tiered scheduling on by default Low Resource efficiency
R12 Block source DDL by default Low Safety
R13 Wider column bitmask (>63 columns) Low Wide table support

Medium-Term (v0.13–v0.14)

ID Recommendation Effort Impact
R8 Shared change buffers Medium Fan-out efficiency

Long-Term (v1.0+)

ID Recommendation Effort Impact
R14 Custom WAL output plugin Very High Optimal CDC throughput
Distributed stream tables Very High Multi-node scalability
External orchestrator High 100+ ST deployments

Appendix A: Version History (Performance Milestones)

Version Date Performance Milestone
v0.1.0 2026-02-26 Baseline DVM engine, trigger CDC
v0.1.3 2026-03-02 JSONB fix, delta SQL caching (2.7× improvement)
v0.2.0 2026-03-04 TopK, diamond consistency, IMMEDIATE mode
v0.3.0 2026-03-11 Correctness fixes, HAVING, FULL OUTER JOIN
v0.4.0 2026-03-12 Parallel refresh, statement-level triggers, prepared statements
v0.5.0 2026-03-13 Append-only fast path, source gating
v0.6.0 2026-03-14 Partitioned tables, idempotent DDL
v0.7.0 2026-03-15 Watermarks, circular deps, xxHash for wide tables
v0.8.0 2026-03-17 Backup/restore, reliability hardening
v0.9.0 2026-03-20 Algebraic aggregates (O(1) AVG/STDDEV/VAR), predicate pushdown
v0.10.0 2026-03-23 Covering indexes (20–50%), compaction (50–90%), PgBouncer support

Appendix B: Benchmark Matrix

5 scenarios × 3 change rates × 2 table sizes = 30 combinations

Scenarios: scan, filter, aggregate, join, join_agg
Change rates: 1%, 10%, 50%
Table sizes: 10K, 100K rows

Additional benchmarks: - TPC-H (22 queries at SF-0.01 and SF-0.1) - Trigger overhead (narrow/medium/wide × INSERT/UPDATE/DELETE) - Criterion.rs micro-benchmarks (DVM operators, utility functions) - Continuous regression detection via Bencher.dev

Appendix C: GUC Quick Reference (Performance-Relevant)

GUC Default Tuning Guidance
scheduler_interval_ms 1000 Lower for fresher data; higher for less CPU
differential_max_change_ratio 0.15 Lower = more FULL refreshes; higher = more DIFF
merge_work_mem_mb 64 Increase for large deltas (sorting)
merge_seqscan_threshold 0.001 Disables seqscan when delta is tiny fraction of ST
parallel_refresh_mode off Set to ‘on’ for independent ST graphs
max_dynamic_refresh_workers 4 Match to available CPU cores
auto_backoff false Enable if refresh often exceeds schedule
tiered_scheduling false Enable for large ST counts with mixed access patterns
compact_threshold 100000 Lower for faster buffers; 0 to disable
buffer_partitioning off Enable for high-throughput sources
cleanup_use_truncate true Keep true unless using change history

12. Deep Gap Analysis — DVM Correctness & SQL Coverage

Added 2026-03-24. Findings from deep source-level audit of src/dvm/, src/refresh.rs, and src/api.rs.

12.1 Known Correctness Boundaries

G12-1: EC-01 Phantom-Row-After-DELETE for ≥3-Scan Right Subtrees

Severity: Medium (silent incorrect result) File: src/dvm/operators/join_common.rs:1052–1088

The EC-01 fix (R₀ via EXCEPT ALL) is gated at join_scan_count(child) <= 2. Join subtrees with ≥3 scan nodes on the right side — TPC-H Q7, Q8, Q9 all qualify — retain the original phantom-row-after-DELETE bug: when both the left and right rows of a join are deleted in the same batch, the left-side DELETE can be silently dropped because R₁ (post-change right) no longer contains the partner row.

The threshold exists because EXCEPT ALL materializes the full right-side snapshot for wide join trees, causing PostgreSQL to spill multi-GB temp files. The NOT MATERIALIZED CTE hint partially mitigates this but doesn’t fully solve the cascading materialization problem for deep join trees.

Impact: Incorrect results for multi-way joins under simultaneous cross-table DELETE. Affects any join chain where the right subtree has ≥3 leaf scans.

Recommendation: Design a per-subtree CTE-based snapshot strategy to replace EXCEPT ALL for deep join trees. Track as EC01B (v0.12.0).

G12-2: TopK Refresh Assumption Violation

File: src/refresh.rs:1144–1200

TopK refresh assumes LIMIT without OFFSET or correlated WHERE conditions. If violated, the TopK micro-refresh may silently produce wrong results. No explicit runtime check validates this assumption.

Impact: Silent correctness bug if TopK path is reached with unsupported LIMIT patterns.

Recommendation: Add an assertion guard that validates TopK eligibility at refresh time, not just at creation time. Fall back to FULL refresh if assumptions are violated. Low effort (~2h).

G12-3: MERGE Duplicate-Row Error on Concurrent Changes

File: src/refresh.rs:738–750

When multiple delta rows target the same ST row (e.g., rapid INSERT→UPDATE→DELETE within one scheduler tick), PostgreSQL rejects with “MERGE command cannot affect row a second time.” The current fix materializes a deduplicated CTE before MERGE.

Impact: Forces materialization of the entire delta, defeating streaming optimization. For large deltas (100K+ rows), this adds measurable latency.

Recommendation: Investigate a two-pass MERGE approach: first-pass applies non-conflicting rows directly, second-pass handles the deduplicated residual. Alternatively, explore PostgreSQL 18’s MERGE ... ON CONFLICT path if available.

12.2 SQL Support Gaps

SQL Feature Status Severity Notes
VALUES in FROM Rejected Low Unsupported FROM item (VALUES clause...) — rare in analytical queries
TABLESAMPLE Rejected Low Non-deterministic by nature; documented rejection
Nested aggregates Rejected Medium AVG(SUM(...)) — requires subquery rewrite; users can work around
Non-equijoin conditions Partial Medium >, <, LIKE conditions in JOIN ON are not differentiated; may silently produce wrong results or fall back to FULL
NATURAL JOIN Warned Low F38 warning about column drift on ALTER; DIFFERENTIAL still works
VALUES() constructor in expressions Rejected Low VALUES (1),(2),(3) as standalone FROM rejected
JSON_TABLE advanced Partial Low Basic structure parsed; complex column definitions may be silently dropped
JSON_ARRAYAGG / JSON constructor ops Partial Low PG 18 SQL/JSON operators may not be fully recognized
Multi-column IN subquery Unclear Medium EXPR IN (subquery) where subquery returns multiple columns — status unclear; may produce wrong results

12.3 Aggregate Performance Classification

25 aggregate functions use group-rescan (O(group_size)) rather than algebraic O(1) maintenance:

Category Functions Strategy
Algebraic (O(1)) SUM, COUNT, AVG, STDDEV*, VAR, COVAR_, REGR_*, CORR Auxiliary columns
Min/Max (rescan) MIN, MAX Group-rescan (removing min/max requires full scan)
Group-rescan STRING_AGG, ARRAY_AGG, JSON[B]AGG, JSON[B]OBJECT_AGG, JSON_OBJECTAGG_STD, JSON_ARRAYAGG_STD, BIT_AND, BIT_OR, BIT_XOR, BOOL_AND, BOOL_OR, XMLAGG, ANY_VALUE, MODE, PERCENTILE_CONT, PERCENTILE_DISC, HYP_RANK, HYP_DENSE_RANK, HYP_PERCENT_RANK, HYP_CUME_DIST, ComplexExpression, UserDefined Full group rescan per affected group

The group-rescan strategy is correct but expensive for queries with many groups and frequent changes. Users are not warned at creation time that their aggregate choice has O(group_size) refresh cost.

Recommendation: - Emit a WARNING at create_stream_table time when group-rescan aggregates are used in DIFFERENTIAL mode, noting the performance implication. - Expose aggregate strategy classification in explain_st(). - Long-term: implement Welford-style algebraic maintenance for BIT_AND/BIT_OR and BOOL_AND/BOOL_OR (these support algebraic inversion).

12.4 Silent AUTO Mode Downgrades

When refresh_mode = 'AUTO' (default), the system silently downgrades from DIFFERENTIAL to FULL in 6+ scenarios:

  1. Window functions in expressions (EC-03 lift)
  2. Unsupported DVM constructs detected
  3. DVM parser failure
  4. Materialized views / foreign tables as sources
  5. Nested window rewrite fallback
  6. Non-deterministic expression detected (for some categories)

These downgrades emit pgrx::info!() messages which are at LOG level and typically invisible to interactive users.

Impact: Users experience unexplained performance regressions. The “effective” refresh mode is not stored in the catalog — only the requested mode is visible.

Recommendation: - Store effective_refresh_mode in pgt_stream_tables catalog. - Add explain_refresh_mode(name) SQL function that returns a human-readable explanation of why the current mode was selected. - Upgrade mode-downgrade messages from info!() to warning!().


13. Deep Gap Analysis — Safety & Code Quality

Added 2026-03-24. Findings from audit of src/ for panics, unwraps, code quality, and architectural concerns.

13.1 Production Panic Sites

File Line Code Risk
scheduler.rs 2358 unwrap_or_else(|_| panic!("check_skip_needed: SPI select failed")) Real — SPI failure during scheduler tick crashes the background worker
refresh.rs 3051, 3068 panic!("expected InvalidArgument, got {other:?}") Test-only (#[cfg(test)]) — no production risk

Assessment: Only 1 real production panic path confirmed. However, the broader audit found 65 unwrap()/expect() calls across core modules (scheduler.rs, hooks.rs, cdc.rs, catalog.rs, refresh.rs, api.rs). Most are in test code, but several are in production paths protected only by PostgreSQL invariant assumptions (e.g., “FROM list always has ≥1 item”).

13.2 Unwrap Calls in Parser (Production Risk)

Location Code Justification
parser.rs:7123 func_list.head().unwrap() PG guarantees non-empty FuncCall list
parser.rs:9048 from_list.head().unwrap() FROM always has ≥1 item
parser.rs:9389 target_list.head().unwrap() SELECT always has ≥1 target
parser.rs:14324 .expect("at least one predicate") BoolExpr flattening assumes ≥1 arg

These are justified by PostgreSQL parser guarantees but lack // SAFETY: comments. If PostgreSQL internals change (e.g., PG 19 parse tree changes), these could become panics without any compile-time or test-time warning.

Recommendation: - Add // INVARIANT: PostgreSQL parser guarantees <reason> comments to each. - Convert to .ok_or(PgTrickleError::InternalError(...)) for defense-in-depth, especially before PG 19 backward-compatibility work begins.

13.3 println! in Production Code

File Line Issue
refresh.rs 2269 println!("MERGE SQL TEMPLATE:\n{}", merge_template) — unguarded debug output

This prints the full MERGE SQL template to stdout on every refresh cycle. In a production PostgreSQL deployment, this goes to the server log at an uncontrolled level.

Recommendation: Replace with pgrx::log!() guarded by a pg_trickle.log_merge_sql GUC (default false). Effort: ~30 min.

13.4 Parser File Complexity

src/dvm/parser.rs is 19,674 lines — by far the largest file in the project (25% of total source). Key concerns:

  • Three separate from_item_to_sql() variants (lines ~7970, ~8914, ~12449) with overlapping logic.
  • Deep parse tree recursion (5–7 levels common) making stack overflow theoretically possible for pathological queries.
  • 664 unsafe blocks (95% of all unsafe in the project), all wrapping pg_sys FFI calls with proper SAFETY comments.
  • Limited macro extraction for visitor patterns — repetitive match arms for node type dispatch.

Recommendation: Not urgent for correctness, but refactoring into sub-modules (by SQL construct: joins, aggregates, subqueries, CTEs, window functions) would improve maintainability and make the PG 19 compatibility work (BC2) significantly easier.

13.5 Error Message Quality

Error messages are technically accurate but not actionable:

Current Message Problem
"unsupported operator for DIFFERENTIAL mode: {0}" No suggested fix or alternative
"cycle detected in dependency graph: A → B → C" No guidance on how to break the cycle
"upstream table schema changed: OID {0}" OID is meaningless to users
"query parse error: {0}" No indication of which part of the query failed

Recommendation: Add “HINT” text to error messages at the API boundary. PostgreSQL supports structured error messages with DETAIL and HINT fields. Use ereport!(ERROR, ... HINT: "...") instead of raw error!(). Example: ERROR: unsupported operator for DIFFERENTIAL mode: TABLESAMPLE HINT: TABLESAMPLE produces non-deterministic results. Use refresh_mode => 'FULL' or remove the TABLESAMPLE clause.

13.6 Memory Ordering

Shared memory atomics in src/shmem.rs use correct ordering: - Relaxed for monotonic counters (CACHE_GENERATION) — correct - AcqRel/Acquire for CAS operations (worker tokens) — correct

No issues found.


14. Deep Gap Analysis — Performance Opportunities

Added 2026-03-24.

14.1 CDC Write-Side Overhead (Unmeasured)

The write-side cost of CDC triggers has never been benchmarked. plans/performance/PLAN_TRIGGERS_OVERHEAD.md defines a 5-scenario benchmark plan but has no status and is not referenced in the roadmap.

Impact: Without data, we cannot make informed decisions about: - Whether change_buffer_unlogged GUC is worth implementing - Whether statement-level triggers are sufficient for high-throughput OLTP - Whether the covering index strategy is optimal

Recommendation: Implement the trigger overhead benchmark as BENCH-W1/W2 (tracked in v0.12.0 roadmap).

14.2 IMMEDIATE Mode Performance

IMMEDIATE mode (transactional IVM) has several deferred Phase 4 optimizations:

Item Description Impact Status
ENR-based transition tables Replace PL/pgSQL temp table copy with Ephemeral Named Relations Eliminates CREATE/DROP TEMP TABLE overhead per trigger invocation Not started (requires unsafe pg_sys ENR APIs)
C-level trigger functions Replace PL/pgSQL wrapper with C/Rust function Eliminates PL/pgSQL interpreter overhead Not started (very high complexity)
Aggregate fast-path Single UPDATE for invertible aggregates instead of full delta SQL O(1) per-DML for simple aggregates Not started
Prepared statement reuse Keep SPI prepared statements across invocations in same transaction Eliminates re-parse per trigger call Not started

These are tracked as A2 in the post-1.0 roadmap. For IMMEDIATE-heavy workloads, the PL/pgSQL wrapper is the dominant cost.

14.3 Group-Rescan Aggregate Cost

25 aggregate functions use group-rescan (see §12.3). For queries like: sql SELECT department, STRING_AGG(name, ', ' ORDER BY name) FROM employees GROUP BY department every change to employees triggers a full re-aggregation of the affected department group. With 1000 employees per department, this is 1000× more expensive than an algebraic SUM.

Quick wins: - BIT_AND/BIT_OR/BIT_XOR — these are algebraically invertible. BIT_AND requires auxiliary column tracking the bitwise AND of all values; removal of a value requires full rescan only if the removed value was the sole contributor to a 0 bit. Due to the complexity of tracking per-bit contributors, these are correctly classified as group-rescan for now. - BOOL_AND/BOOL_OR — similar to BIT_AND/BIT_OR but on single bits. Algebraic maintenance would require tracking the count of TRUE values per group. Feasible but low priority.

14.4 MERGE Template Deduplication Overhead

When multiple delta rows target the same ST __pgt_row_id, the entire delta must be materialized and deduplicated before MERGE (§12.3). This prevents PostgreSQL from streaming the delta directly into the MERGE executor.

For workloads with rapid UPDATE→UPDATE sequences (e.g., counters, status fields), this materialization can dominate refresh time.

Recommendation: Profile the frequency of delta deduplication across real-world workloads. If ≥10% of refreshes need dedup, investigate a pre-MERGE compaction pass in the change buffer itself (cancel out INSERT→DELETE pairs before delta SQL generation — partially implemented as B3-1 compaction but not integrated with MERGE dedup).

14.5 Thread-Local Cache Cold-Start

Delta SQL templates and MERGE plans are cached per backend session via thread_local!. Cross-session invalidation uses a shared atomic counter (CACHE_GENERATION), but every new connection pays a cold-cache penalty of ~45ms on first refresh.

For connection-pooler deployments (PgBouncer) where connections are reused across many sessions, this is amortized. But for serverless or short-connection workloads, the cold-start cost is significant.

Recommendation: Investigate shared-memory template caching using PostgreSQL’s shared buffer mechanism. This would eliminate per-connection cold-start entirely but requires careful invalidation management.


15. Deep Gap Analysis — Ergonomics & API Design

Added 2026-03-24.

15.1 Function Parameter Overload

create_stream_table() has 10 parameters with 7 optional:

SELECT pgtrickle.create_stream_table(
  name, query,
  schedule?, refresh_mode?, initialize?,
  diamond_consistency?, diamond_schedule_policy?,
  cdc_mode?, append_only?, pooler_compatibility_mode?
);

Related parameters (e.g., diamond_consistency and diamond_schedule_policy) are separated without logical grouping. Discovery of advanced options like pooler_compatibility_mode requires reading the full SQL reference.

Recommendation: - Provide configuration profiles: create_stream_table(name, query, profile => 'low_latency') that set multiple GUCs at once. - Document the “80% use case” prominently: most users only need (name, query) or (name, query, schedule).

15.2 Invisible Query Rewrite Pipeline

The 8-pass query rewrite pipeline transforms the user’s query before DVM parsing:

  1. View inlining (fixpoint)
  2. Nested window expression lifting
  3. DISTINCT ON → ROW_NUMBER rewrite
  4. GROUPING SETS/CUBE/ROLLUP expansion
  5. Scalar subquery decorrelation (WHERE)
  6. Correlated scalar decorrelation (SELECT)
  7. SubLink-in-OR rewriting
  8. ROWS FROM transformation

Users never see the rewritten query, making debugging impossible when the DVM engine produces unexpected results or falls back to FULL mode.

Recommendation: Add explain_query_rewrite(query TEXT) SQL function that returns the rewritten query and a log of which passes were applied. This is pure diagnostics — no production impact.

15.3 Missing Convenience Functions

Function Use Case Priority
explain_refresh_mode(name) Show why DIFFERENTIAL or FULL was chosen High
explain_query_rewrite(query) Show query after rewrite pipeline High
diagnose_errors(name) Show last N errors + root cause + suggested fixes High
list_auxiliary_columns(name) Show hidden __pgt_* columns and their purpose Medium
validate_query(query) Dry-run validation without creating the ST Medium
explain_refresh_cost(name) Estimate refresh cost based on change buffer size Low
bulk_create(definitions JSONB) Create multiple STs in one call with DAG ordering Low
export_definition(name) Export ST definition as reproducible SQL Low

15.4 Error Recovery Experience

When a stream table enters ERROR status (consecutive_errors ≥ 3): - pgt_status() shows status = 'ERROR' but no root cause - Must manually ALTER STREAM TABLE ... status => 'ACTIVE' to retry - No indication of whether the root cause is fixed

Recommendation: The diagnose_errors(name) function should return: - Last 5 error messages from pgt_refresh_history - Error classification (user/schema/system) - Whether the error is retryable - Suggested fix for each error - Whether the root cause has been resolved (e.g., dropped table restored)

15.5 Configuration Interaction Effects

23 GUCs with undocumented interaction effects:

Interaction Effect Documentation
scheduler_interval_ms + min_schedule_seconds + default_schedule_seconds Three separate timers that determine actual refresh frequency; unclear which takes precedence Not documented
differential_max_change_ratio + merge_seqscan_threshold + auto_backoff + merge_planner_hints All affect DIFFERENTIAL performance; users don’t know which to tune Not documented as a group
cdc_mode='wal' + refresh_mode='IMMEDIATE' Invalid combination; only caught at runtime Not documented
append_only=true + DELETE on source Silently reverts to false with no warning Not documented
diamond_schedule_policy + diamond_consistency='off' Policy is ignored when consistency is off Not obvious

Recommendation: Create a GUC interaction matrix in CONFIGURATION.md. Provide “tuning profiles” for common scenarios: low-latency, high-throughput, resource-constrained.

15.6 Auxiliary Column Visibility

Stream tables contain hidden __pgt_* columns that surprise users:

Column Purpose When Present
__pgt_row_id Row identity for MERGE Always
__pgt_count Multiplicity for DISTINCT When DISTINCT present
__pgt_count_l, __pgt_count_r Join multiplicity When FULL JOIN present
__pgt_union_dedup_count UNION deduplication When UNION (not UNION ALL) present
__pgt_aux_count_* AVG auxiliary counter When AVG present
__pgt_aux_sum_* STDDEV/VAR auxiliary When STDDEV/VAR present
__pgt_aux_sum2_* STDDEV/VAR sum-of-squares When STDDEV/VAR present
__pgt_aux_nonnull_* NULL transition counter When SUM over FULL JOIN
__pgt_aux_sum[xy]_* COVAR/REGR auxiliary When COVAR/REGR present

Users writing SELECT * get unexpected columns. The FAQ mentions this briefly but the SQL reference doesn’t document them.

Recommendation: - Document in SQL_REFERENCE.md under its own section. - Add list_auxiliary_columns(name) function. - Consider a view layer (pgtrickle.user_view_<name>) that excludes __pgt_* columns, or add pg_attribute marking to hide them from SELECT * (using attisdropped-adjacent mechanism).


16. Deep Gap Analysis — Documentation & Onboarding

Added 2026-03-24.

16.1 Getting Started Complexity

The GETTING_STARTED.md guide jumps directly to a 3-layer recursive department tree with CTEs, multi-table joins, and cascading refreshes. Most users' first stream table will be a simple aggregation.

Recommendation: Restructure as progressive complexity: 1. Hello World — single-table aggregation (GROUP BY + SUM) 2. Two-table join — orders + customers 3. Schedulingcalculated vs cron vs duration 4. Monitoringpgt_status(), explain_st(), dependency_tree() 5. Multi-layer — chained STs, diamond patterns 6. Advanced — recursive CTEs, IMMEDIATE mode

16.2 Missing Operator Support Matrix

No single document answers “Does my query support DIFFERENTIAL mode?” Users must cross-reference SQL_REFERENCE.md, DVM_OPERATORS.md, and FAQ.md.

Recommendation: Add to DVM_OPERATORS.md a support matrix:

SQL Feature DIFFERENTIAL FULL IMMEDIATE Notes
Inner JOIN
LEFT JOIN
FULL OUTER JOIN SUM/AVG may trigger group-rescan
Semi/Anti JOIN
LATERAL
Window functions ⚠️ In expressions → FULL fallback
Recursive CTE Non-monotone rejected
VALUES in FROM Rejected
TABLESAMPLE Non-deterministic
Nested aggregates Requires subquery rewrite

16.3 Missing Best-Practice Patterns

No documentation covers common data architecture patterns:

Pattern Description Status
Bronze → Silver → Gold Multi-layer ST pipeline for data warehouse Not documented
Event sourcing Base table as event log → ST as current state Not documented
Slowly Changing Dims Type-1/Type-2 SCD patterns with STs Not documented
Fan-out (one → many) One source feeding multiple STs Briefly mentioned
Diamond convergence A → B,C → D pattern Documented in ARCHITECTURE
Hot/warm/cold tiering Different refresh rates by access frequency GUC exists, pattern not documented

16.4 Monitoring Function Discovery

GETTING_STARTED.md ends with pgt_status() but doesn’t introduce the 15+ monitoring functions available:

Function Purpose Discoverability
pgt_status() Current status ✅ In GETTING_STARTED
explain_st(name) Deep inspect ❌ Only in SQL_REFERENCE
dependency_tree(name) DAG visualization ❌ Only in SQL_REFERENCE
check_cdc_health() CDC status ❌ Only in SQL_REFERENCE
st_refresh_stats(name) Refresh timing ❌ Only in SQL_REFERENCE
get_refresh_history(name) History ❌ Only in SQL_REFERENCE
health_check() Overall health ❌ Only in SQL_REFERENCE
change_buffer_sizes() Buffer stats ❌ Only in SQL_REFERENCE
list_sources(name) Source tables ❌ Only in SQL_REFERENCE
trigger_inventory() CDC triggers ❌ Only in SQL_REFERENCE
worker_pool_status() Parallel workers ❌ Only in SQL_REFERENCE
refresh_timeline(name) Refresh history timeline ❌ Only in SQL_REFERENCE
slot_health() WAL slot health ❌ Only in SQL_REFERENCE
watermark_status() Watermark state ❌ Only in SQL_REFERENCE
st_auto_threshold(name) Adaptive threshold ❌ Only in SQL_REFERENCE

Recommendation: Add a “Monitoring Quick Reference” section to GETTING_STARTED.md with the 5 most useful functions and links to the full reference.


17. Deep Gap Analysis — Testing & Verification

Added 2026-03-24.

17.1 Test Inventory

Tier Count Focus
Unit tests 1,194 SQL parsing, OpTree, diff operators, helpers
Integration tests 101 Bare PG catalog schema behavior
E2E tests 340+ Full extension (CDC, refresh, background workers)
Total 1,635+

Source: 78,539 lines Rust source / 59,870 lines Rust tests (test:source ratio = 0.76:1).

17.2 Coverage Ceiling

Structural code coverage is estimated at ~63%. The ceiling is caused by: - Error handling paths rarely triggered in tests - Platform-specific code (#[cfg(target_os)]) - Emergency fallback paths - Logging-only code branches

17.3 Missing Test Coverage Areas

Area Gap Priority Effort
External SQL corpora No sqllogictest, JOB, or Nexmark integration High 1–2 weeks
Differential fuzzing No SQLancer or property-based SQL generation High 1–2 weeks
Property tests (items 5+6) Topology stress + DAG helper properties not started Medium 6–10 days
Write-side benchmarks CDC trigger overhead never measured Medium 3–5 days
IMMEDIATE mode stress No high-concurrency IMMEDIATE mode test (100+ concurrent DML) Medium 2–3 days
PG version matrix Only PG 18 tested; PG 16/17 untested Medium 2–3 weeks (with CI matrix)
Multi-database No test for scheduler across multiple databases Low 1–2 days
Long-running stability No soak test (24h+ continuous refresh) Low 1–2 days

17.4 Ignored Test Suites

File Ignored Count Reason
e2e_tpch_tests.rs 11 TPC-H benchmarks (not regression); run separately
e2e_bench_tests.rs 16 Performance microbenchmarks; run separately
e2e_upgrade_tests.rs 8 Requires separate Docker upgrade image build

All #[ignore] annotations are intentional. No accidentally-ignored tests found.

17.5 Testing Infrastructure Strengths

  • Testcontainers-based E2E with custom Docker images
  • Property-based testing with proptest (4/7 planned items done)
  • TPC-H validation at SF-0.01 and SF-0.1
  • Criterion.rs micro-benchmarks with Bencher.dev regression detection
  • Light E2E tier using cargo pgrx package for fast PR validation
  • 6-tier test pyramid (unit → integration → light-E2E → full-E2E → TPC-H → dbt)

17.6 Verification Gaps

Gap Risk Recommendation
No offline MERGE SQL validation Generated SQL could be syntactically invalid for edge-case column names Add a EXPLAIN (COSTS OFF) dry-run for generated MERGE templates during E2E tests
No stack-depth limit for parser recursion Pathological queries could cause stack overflow in the 19.7K-line parser Add a depth counter to parse tree visitors; reject queries exceeding limit
No regression test for EC-01 boundary ≥3-scan right subtrees are documented as limited but no test asserts the boundary Add a negative test that demonstrates the known limitation with a comment explaining the boundary

Appendix D: Codebase Metrics

Metric Value
Source code (Rust) 78,539 lines
Test code (Rust) 59,870 lines
Test:source ratio 0.76:1
Largest file parser.rs (19,674 lines, 25% of source)
Total unsafe blocks 695 (664 in parser.rs alone)
Total pg_extern functions 43 (28 in api.rs, 15 in monitor.rs)
Total GUCs 23
DVM operators 21
Aggregate functions 60+
Edge cases catalogued 36 (EC-01 through EC-36)
Plans in plans/ 50+ files
Plans complete ~30
Plans in progress ~5
Plans proposed/deferred ~15