pg_trickle — Overall Project Status & Performance Report
Date: 2026-03-24
Version: 0.10.0 (updated with deep gap analysis)
Author: Deep analysis of architecture, configuration, code paths, benchmarks, and plans
Executive Summary
pg_trickle is a PostgreSQL 18 extension implementing streaming tables with incremental view maintenance (IVM) based on differential dataflow (DBSP). At v0.10.0, the project has reached a high level of maturity with 21 DVM operators, hybrid CDC, parallel refresh, and broad SQL coverage including recursive CTEs, window functions, LATERAL joins, and set operations.
Key findings:
| Dimension | Status | Rating |
|---|---|---|
| Correctness | 36 edge cases catalogued; all P0 resolved; EC-01 ≥3-scan boundary remains | ★★★★☆ |
| Performance | 7–42× speedup at 1% change rate; MERGE is 70–97% of time | ★★★★☆ |
| Defaults | Conservative and safe; advanced features opt-in | ★★★★★ |
| Scalability | Parallel refresh implemented but off by default | ★★★☆☆ |
| Safety | 51% unsafe reduction; 1 production panic site; 65 unwrap calls in core modules | ★★★☆☆ |
| Ergonomics | Good basics; silent AUTO mode downgrades; no explain functions | ★★★☆☆ |
| Observability | NOTIFY alerts, refresh stats, health checks, dependency tree | ★★★★☆ |
| SQL Coverage | 21 operators, 60+ aggregate functions; VALUES/TABLESAMPLE unsupported | ★★★★☆ |
| DVM Engine | 19.7K-line parser; 25 group-rescan aggregates; no offline SQL validation | ★★★☆☆ |
| Documentation | Good reference; missing support matrix, interaction guides, patterns | ★★★☆☆ |
| Deployment | PgBouncer, CNPG, backup/restore, dbt integration | ★★★★☆ |
Overall assessment: The project is production-ready for typical analytical workloads. The primary performance bottleneck is PostgreSQL’s own MERGE execution (70–97% of refresh time), meaning the Rust-side pipeline is highly optimized. The main opportunities for improvement are in scaling to larger deployments (100+ stream tables), reducing CDC write-side overhead, unlocking the parallel refresh feature by default, and addressing the ergonomics/observability gaps identified in this deep analysis (§12–§17).
Priority note (2026-06): The project’s primary goals now explicitly target maximum performance, low latency, and high throughput with differential refresh as the default mode and full refresh as a last resort. Scalability (★★★☆☆) is the most critical gap relative to this ambition. Items A-2 (Columnar Change Tracking) and D-4 (Shared Change Buffers) have been promoted to v0.12.0 to address this, and WAKE-1 (adaptive scheduler wake) has been pulled forward to v0.11.0. See ROADMAP.md for details.
Table of Contents
- Architecture & Design Quality
- Default Configuration Analysis
- Performance Status
- CDC Pipeline Analysis
- DVM Engine Analysis
- Scheduler & Coordination
- Safety & Error Handling
- Scalability Assessment
- End-to-End Latency Analysis
- Recommendations for Future Releases
- Summary & Prioritized Roadmap
- Deep Gap Analysis — DVM Correctness & SQL Coverage
- Deep Gap Analysis — Safety & Code Quality
- Deep Gap Analysis — Performance Opportunities
- Deep Gap Analysis — Ergonomics & API Design
- Deep Gap Analysis — Documentation & Onboarding
- Deep Gap Analysis — Testing & Verification
1. Architecture & Design Quality
1.1 Component Overview
The system is structured into well-separated layers:
User SQL API (api.rs)
↓
Catalog (catalog.rs) ←→ DDL Hooks (hooks.rs)
↓
CDC Layer (cdc.rs / wal_decoder.rs)
↓ Change buffers (pgtrickle_changes schema)
DVM Engine (dvm/) — Parser → OpTree → Delta SQL
↓
Refresh Engine (refresh.rs) — FULL / DIFFERENTIAL / IMMEDIATE
↓
Scheduler (scheduler.rs) ←→ DAG (dag.rs) ←→ Shared Memory (shmem.rs)
↓
Monitoring (monitor.rs) — Stats, alerts, health checks
1.2 Design Strengths
- Single-binary extension: No external dependencies beyond PostgreSQL itself. No message queues, no separate services.
- Theoretical foundation: Grounded in DBSP (Budiu et al. 2023), giving formal correctness guarantees for differential computation.
- Layered optimization: Each layer has independent optimization passes (CDC column pruning → DVM predicate pushdown → MERGE planner hints → prepared statements).
- Adaptive behavior: Automatic fallback from DIFFERENTIAL to FULL when change ratio exceeds threshold; auto-backoff when refresh outpaces schedule.
- Crash safety: Frontier-based versioning ensures no data loss; orphaned RUNNING records auto-recovered on scheduler restart.
1.3 Design Concerns
- Thread-local caching: Delta SQL templates and MERGE plans are cached per backend session via thread_local!. Cross-session invalidation uses a shared atomic counter (CACHE_GENERATION), but cache warm-up cost hits every new connection.
- Shared memory ring buffer: 32-slot invalidation ring for DAG changes. Overflow triggers a full O(V+E) DAG rebuild. Under heavy DDL (>32 CREATE/ALTER statements in one tick), this is a cliff.
- SPI coupling: All catalog access and MERGE execution happen through SPI, which holds lightweight locks and may conflict with concurrent DDL. SPI blocks are kept short per coding guidelines, but the total SPI time per refresh cycle is significant.
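The overflow cliff described above can be made concrete with a minimal sketch. This is illustrative only, not the extension's actual shared-memory code: names, layout, and the drain API are hypothetical. Producers append invalidated OIDs until the 32 slots fill; past that point the consumer can no longer apply incremental edits and must rebuild the whole DAG.

```rust
// Hypothetical sketch of a fixed-capacity invalidation ring. When more DDL
// events arrive in one tick than there are slots, the overflow flag forces a
// full O(V+E) DAG rebuild instead of incremental edits.
const RING_SLOTS: usize = 32;

struct InvalidationRing {
    slots: [u32; RING_SLOTS], // invalidated object OIDs
    len: usize,
    overflowed: bool,
}

impl InvalidationRing {
    fn new() -> Self {
        Self { slots: [0; RING_SLOTS], len: 0, overflowed: false }
    }

    fn push(&mut self, oid: u32) {
        // Dedup check is O(ring_size), cheap for 32 slots.
        if self.slots[..self.len].contains(&oid) {
            return;
        }
        if self.len == RING_SLOTS {
            self.overflowed = true; // consumer must do a full rebuild
            return;
        }
        self.slots[self.len] = oid;
        self.len += 1;
    }

    /// Consumer side: drain pending events, or learn a full rebuild is needed.
    fn drain(&mut self) -> Result<Vec<u32>, &'static str> {
        if self.overflowed {
            self.overflowed = false;
            self.len = 0;
            return Err("overflow: full DAG rebuild required");
        }
        let out = self.slots[..self.len].to_vec();
        self.len = 0;
        Ok(out)
    }
}

fn main() {
    let mut ring = InvalidationRing::new();
    for oid in 1..=33 {
        ring.push(oid); // the 33rd distinct OID overflows
    }
    assert!(ring.drain().is_err());
    ring.push(7);
    assert_eq!(ring.drain().unwrap(), vec![7]);
    println!("ok");
}
```

Growing the ring (see R9) only moves the cliff; the dedup check stays O(ring_size), so modest increases are cheap.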
2. Default Configuration Analysis
2.1 Default Values — Safety Assessment
The defaults prioritize safety and predictability over maximum throughput:
| GUC | Default | Assessment |
|---|---|---|
| enabled | true | ✅ Expected |
| cdc_mode | 'auto' | ✅ Safe — starts with triggers, upgrades to WAL if available |
| cdc_trigger_mode | 'statement' | ✅ Optimal — 50–80% less write overhead than row-level |
| scheduler_interval_ms | 1000 | ✅ Balanced — 1s wake cycle; responsive without busy-looping |
| min_schedule_seconds | 1 | ✅ After ergonomics fix (was 60) |
| default_schedule_seconds | 1 | ✅ Fast default for isolated CALCULATED STs |
| differential_max_change_ratio | 0.15 | ✅ Conservative — falls back to FULL at 15% change rate |
| cleanup_use_truncate | true | ✅ O(1) cleanup vs O(n) DELETE |
| use_prepared_statements | true | ✅ 1–2ms savings per refresh |
| merge_planner_hints | true | ✅ Dynamic optimizer hints improve join strategies |
| merge_work_mem_mb | 64 | ✅ Reasonable for MERGE sorting |
| tick_watermark_enabled | true | ✅ Critical for cross-source consistency |
| compact_threshold | 100,000 | ✅ Triggers compaction before buffers grow large |
2.2 Defaults That Limit Performance (by design)
| GUC | Default | Why Off | Impact if Enabled |
|---|---|---|---|
| parallel_refresh_mode | 'off' | Safety — new feature, needs explicit opt-in | 2.5–5× throughput for independent ST graphs |
| auto_backoff | false | Predictability — users expect fixed schedules | Prevents scheduler overload when refresh > schedule |
| tiered_scheduling | false | Simplicity — equal freshness for all STs | Reduces refresh load for rarely-queried STs |
| buffer_partitioning | 'off' | Overhead — partition management adds CDC cost | Faster cleanup for high-throughput sources |
| allow_circular | false | Correctness — cyclic STs have convergence risks | Enables feedback-loop patterns |
| block_source_ddl | false | Ergonomics — blocking DDL is surprising | Prevents schema drift breaking STs silently |
2.3 Default Configuration Verdict
The defaults are well-chosen. They provide:
- ✅ Predictable behavior — no surprises for new users
- ✅ Safe operations — conservative fallback thresholds
- ✅ Good baseline performance — statement-level triggers, prepared statements, planner hints all enabled by default
- ⚠️ Suboptimal for large deployments — parallel refresh off, tiered scheduling off, no auto-backoff
The design follows the principle of “safe by default, fast by configuration.”
3. Performance Status
3.1 Current Benchmark Results
Headline numbers at 100K rows, 1% change rate (typical production):
| Scenario | FULL (ms) | INCREMENTAL (ms) | Speedup |
|---|---|---|---|
| scan | 300 | 4.6 | 65.9× |
| filter | 134 | 3.4 | 39.5× |
| aggregate | 17 | 2.5 | 6.8× |
| join | 294 | 18.0 | 16.3× |
| join_agg | 28 | 14.6 | 1.9× |
Zero-change latency: < 10ms (target achieved via EXISTS(<buffer>) short-circuit)
3.2 Performance Evolution
The project has gone through 9 optimization phases:
| Phase | Key Optimization | Impact |
|---|---|---|
| Baseline | — | INCREMENTAL slower than FULL for some cases |
| P1 | JSONB fix, drop unused columns, LSN bounds | 2.7× improvement |
| P2 | Delta SQL caching, MERGE template caching | +10–20ms/refresh |
| P3 | LATERAL VALUES for aggregates, PK resolution | Correctness + speed |
| P4 | No-op short-circuit, prepared statements | 41.7× peak |
| P5 | Deferred cleanup, warm-cycle tuning | Stability |
| P6 | Statement-level triggers | 50–80% write reduction |
| P7 | Parallel refresh infrastructure | 2.5–5× throughput (opt-in) |
| P8–P9 | Algebraic aggregates, covering indexes, compaction | 20–50% latency reduction |
3.3 Where Time Is Spent
Critical path breakdown (100K rows, 1% change rate):
| Component | Time | % of Total |
|---|---|---|
| Decision phase | < 1ms | < 1% |
| Delta SQL generation (Rust) | < 1ms | < 1% |
| MERGE plan + execution (PG) | 3–15ms | 70–97% |
| Cleanup (deferred) | < 7.7ms | 2–20% |
| SPI overhead | < 1ms | < 5% |
The Rust-side pipeline is essentially negligible. Further Rust optimization yields diminishing returns. The bottleneck is PostgreSQL’s MERGE executor — plan compilation, index lookups, heap writes, and WAL generation.
3.4 Pure Rust Micro-benchmarks
From Criterion.rs (benches/):
| Operation | Time |
|---|---|
| quote_ident (simple) | 93.9 ns |
| col_list (10 cols) | 1.37 µs |
| prefixed_col_list (20 cols) | 4.00 µs |
| expr_to_sql (column ref) | 26.3 ns |
| lsn_gt comparison | 78.9 ns |
| frontier_json serialize (20) | 3.76 µs |
| dag_build_linear_100 | 62.5 µs |
| xxh64 (4096 bytes) | 384.1 ns |
| diff_scan (20 cols) | 47.7 µs |
| diff_inner_join | 30.7 µs |
| diff_full_pipeline | 54.2 µs |
All operations are sub-millisecond. The delta SQL generation pipeline
(diff_full_pipeline at 54.2 µs) is 3–4 orders of magnitude faster than MERGE
execution.
3.5 Scaling Characteristics
| Change Rate | 10K rows | 100K rows | Trend |
|---|---|---|---|
| 1% | 1.5–5× | 5–66× | Better at scale |
| 10% | 1.4–2.2× | 1.8–3.3× | Moderate benefit |
| 50% | 0.6–1.3× | 0.6–1.1× | FULL is better; adaptive fallback kicks in |
INCREMENTAL scales super-linearly with table size (because delta stays small while FULL grows linearly). The adaptive fallback threshold at 15% prevents INCREMENTAL from being used when it would be slower.
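The adaptive fallback rule can be sketched as a pure decision function. This is a minimal illustration under stated assumptions: the names are hypothetical, and the real decision phase also consults the adaptive/cost-based threshold described in R5 rather than a fixed GUC value alone.

```rust
// Hypothetical sketch of the FULL/DIFFERENTIAL/NO_DATA decision: DIFFERENTIAL
// pays off while the changed fraction stays small; past the threshold
// (default 0.15) a FULL refresh is cheaper.
#[derive(Debug, PartialEq)]
enum RefreshAction {
    NoData,
    Differential,
    Full,
}

fn choose_refresh(changed_rows: u64, table_rows: u64, max_change_ratio: f64) -> RefreshAction {
    if changed_rows == 0 {
        // EXISTS(<buffer>) short-circuit: the sub-10ms zero-change path.
        return RefreshAction::NoData;
    }
    let ratio = changed_rows as f64 / table_rows.max(1) as f64;
    if ratio > max_change_ratio {
        RefreshAction::Full
    } else {
        RefreshAction::Differential
    }
}

fn main() {
    assert_eq!(choose_refresh(0, 100_000, 0.15), RefreshAction::NoData);
    assert_eq!(choose_refresh(1_000, 100_000, 0.15), RefreshAction::Differential); // 1%
    assert_eq!(choose_refresh(50_000, 100_000, 0.15), RefreshAction::Full); // 50%
    println!("ok");
}
```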
4. CDC Pipeline Analysis
4.1 Trigger-Based CDC (Default)
Architecture: Row-level or statement-level AFTER triggers capture DML into
typed buffer tables (pgtrickle_changes.changes_<oid>).
Statement-level triggers (default since v0.4.0):
- Single trigger invocation per DML statement (not per row)
- Uses REFERENCING NEW TABLE AS __pgt_new transition tables
- 50–80% less write-side overhead than row-level for bulk DML
- Slightly higher per-invocation planning cost (amortized over batch size)
Buffer table schema:
- lsn — WAL position for ordering
- action — I/U/D/T (insert/update/delete/truncate)
- pk_hash — xxHash64 of primary key columns
- changed_cols — Bitmask for UPDATE (which columns changed)
- new_*/old_* — Per-column typed values (no JSONB serialization)
4.2 CDC Optimizations Implemented
| Optimization | Version | Impact |
|---|---|---|
| Statement-level triggers | v0.4.0 | 50–80% write overhead reduction |
| Typed buffer columns (no JSONB) | v0.1.0 | Eliminates serialization cost |
| Single covering index | v0.10.0 | 20% trigger overhead reduction |
| Changed columns bitmask | v0.4.0 | Skip unchanged columns in UPDATE |
| Selective CDC capture | v0.9.0 | 50–80% reduction for wide tables (only capture referenced columns) |
| Change buffer compaction | v0.10.0 | 50–90% delta reduction (cancel INSERT→DELETE pairs) |
| Partitioned buffers | v0.6.0 | Faster cleanup via DETACH+DROP |
| RLS bypass on buffers | v0.5.0 | No policy overhead on internal tables |
4.3 CDC Write-Side Overhead (Estimated)
Estimates from the trigger overhead analysis plan:
| Table Width | DML Type | Est. Overhead/Row | Write Slowdown |
|---|---|---|---|
| Narrow (3 cols) | INSERT | 1–3 µs | 1.5–2.5× |
| Narrow (3 cols) | UPDATE | 2–5 µs | 2.0–3.0× |
| Medium (8 cols) | UPDATE | 3–8 µs | 2.5–4.0× |
| Wide (20 cols) | UPDATE | 5–15 µs | 3.0–5.0× |
The write-side overhead is moderate and acceptable for OLTP workloads. Statement-level triggers reduce this significantly for batch operations.
4.4 WAL-Based CDC (Optional)
Available when wal_level = 'logical'. Uses pgoutput logical replication:
- Transition lifecycle: TRIGGER → TRANSITIONING → WAL (with automatic fallback)
- Text-based parsing: pgoutput messages parsed positionally (G2.3)
- Progressive timeout: 1×/2×/3× timeout with LOG warnings before abort
Current state: Functional but secondary. Trigger-based CDC is preferred for single-transaction atomicity and simpler operations.
4.5 CDC Bottlenecks & Opportunities
| Bottleneck | Severity | Mitigation |
|---|---|---|
| PL/pgSQL trigger overhead | Medium | Statement-level triggers (implemented) |
| WAL writes for buffer tables | Medium | UNLOGGED buffers (planned, behind GUC) |
| pgoutput text parsing (WAL mode) | Low | Positional parsing (implemented) |
| Column-level capture for SELECT * | Low | Falls back to full capture (correct) |
| Buffer index maintenance on INSERT | Low | Single covering index (implemented) |
5. DVM Engine Analysis
5.1 Operator Coverage
21 operators covering virtually all analytical SQL patterns:
| Category | Operators | Status |
|---|---|---|
| Basic | Scan, Filter, Project | ✅ Optimized |
| Joins | Inner, Left, Right, Full, Semi, Anti | ✅ Correct (B3-2 weight aggregation) |
| Aggregates | GROUP BY with 60+ aggregate functions | ✅ Algebraic maintenance for AVG/STDDEV/VAR |
| Set Operations | UNION ALL, INTERSECT, EXCEPT | ✅ |
| Advanced | Window, Distinct, Subquery | ✅ |
| CTE | CteScan, RecursiveCte, RecursiveSelfRef | ✅ Semi-naive evaluation |
| Lateral | LateralFunction, LateralSubquery | ✅ |
| Scalar Subquery | ScalarSubquery | ✅ Decorrelation rewrite |
5.2 Query Rewriting Pipeline
Six normalization passes before DVM differentiation:
- View inlining — Fixpoint closure (inline views, re-inline if result contains more views)
- DISTINCT ON → ROW_NUMBER OVER — Canonical form
- GROUPING SETS/CUBE/ROLLUP → UNION ALL — Explosion guard (max_grouping_set_branches GUC, default 64)
- Scalar subquery decorrelation — Rewrites correlated subqueries
- De Morgan normalization — Multi-pass sublink rewriting
- Nested window expression lifting — Canonical window form
5.3 DVM Performance Characteristics
- Delta SQL generation: 47–54 µs for typical queries (negligible)
- Template caching: Eliminates re-parsing after first execution (~45ms saved per refresh cycle)
- CTE delta memoization: Each CTE reference reuses the same delta subquery (no redundant computation)
- Predicate pushdown: Filter conditions pushed into change buffer scans (reduces data volume early)
5.4 DVM Correctness Measures
- TPC-H validation: All 22 queries verified against known-correct results
- Property-based testing: proptest regressions tracked
- Weight aggregation: Z-set algebra for diamond flows (replaces incorrect DISTINCT ON)
- Semi-naive evaluation: Recursive CTEs use incremental fixpoint
- Delete-and-Rederive: Correct deletion for recursive CTEs
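The Z-set weight aggregation above can be sketched as weight consolidation: each delta row carries a signed weight (+1 insert, −1 delete), weights for the same key arriving along multiple diamond paths are summed, and keys whose net weight is zero drop out. This is a minimal illustration of the algebra, not the extension's code.

```rust
// Sketch of Z-set consolidation: sum signed weights per key, keep only keys
// with non-zero net weight (what GROUP BY key HAVING SUM(weight) != 0
// expresses in SQL).
use std::collections::BTreeMap;

fn consolidate(deltas: &[(&str, i64)]) -> Vec<(String, i64)> {
    let mut acc: BTreeMap<&str, i64> = BTreeMap::new();
    for &(key, weight) in deltas {
        *acc.entry(key).or_insert(0) += weight;
    }
    acc.into_iter()
        .filter(|(_, w)| *w != 0) // zero net weight: row cancelled out
        .map(|(k, w)| (k.to_string(), w))
        .collect()
}

fn main() {
    // Row "a" arrives via two diamond branches; "b" is inserted then deleted.
    let deltas = [("a", 1), ("a", 1), ("b", 1), ("b", -1)];
    assert_eq!(consolidate(&deltas), vec![("a".to_string(), 2)]);
    println!("ok");
}
```

A naive DISTINCT ON over the same deltas would keep one arbitrary "b" row, which is exactly the bug the weight-aggregation fix (B3-2) removed.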
6. Scheduler & Coordination
6.1 Architecture
Two-tier background worker model:
- Launcher (one per cluster) — discovers databases, spawns schedulers
- Per-database scheduler — reads DAG, coordinates refreshes, manages retries
6.2 Refresh Coordination
The scheduler executes a topological refresh cycle:
- Build/update DAG from catalog
- Detect consistency groups (diamond dependencies → atomic refresh)
- For each group in topological order:
- Check schedule staleness
- Determine refresh action (FULL/DIFFERENTIAL/NO_DATA)
- Execute with retry backoff (max 3 consecutive errors → auto-suspend)
- Record history, emit NOTIFY alerts
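The topological-order step above can be sketched with Kahn's algorithm. The names are illustrative, not the dag.rs API; cycle handling in the real scheduler goes through the allow_circular / SCC-iteration path rather than returning an error.

```rust
// Sketch of topological refresh ordering (Kahn's algorithm): a stream table
// is scheduled only after every upstream node has been emitted. Returns None
// if the graph contains a cycle.
use std::collections::{BTreeMap, VecDeque};

fn topo_order(nodes: &[&str], edges: &[(&str, &str)]) -> Option<Vec<String>> {
    let mut indegree: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    for (_, dst) in edges {
        *indegree.get_mut(dst)? += 1;
    }
    let mut queue: VecDeque<&str> =
        nodes.iter().copied().filter(|n| indegree[n] == 0).collect();
    let mut order = Vec::new();
    while let Some(n) = queue.pop_front() {
        order.push(n.to_string());
        for &(src, dst) in edges {
            if src == n {
                let d = indegree.get_mut(dst)?;
                *d -= 1;
                if *d == 0 {
                    queue.push_back(dst);
                }
            }
        }
    }
    // Unvisited nodes mean a cycle (handled elsewhere via allow_circular).
    (order.len() == nodes.len()).then_some(order)
}

fn main() {
    // Diamond: src feeds st_a and st_b, both feed st_join.
    let nodes = ["src", "st_a", "st_b", "st_join"];
    let edges = [("src", "st_a"), ("src", "st_b"), ("st_a", "st_join"), ("st_b", "st_join")];
    let order = topo_order(&nodes, &edges).unwrap();
    assert_eq!(order.first().unwrap(), "src");
    assert_eq!(order.last().unwrap(), "st_join");
    println!("ok");
}
```

The diamond in the example is also why consistency groups exist: st_a and st_b must present the same source snapshot to st_join.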
6.3 Parallel Refresh (opt-in)
Implemented in v0.4.0 via dynamic background workers:
- Coordinator dispatches independent execution units to workers
- Worker tokens: CAS-based allocation with cluster-wide budget (max_dynamic_refresh_workers, default 4)
- Wave reset: After all in-flight workers complete and any succeeded, re-evaluate upstream counts for cascading refreshes
- Diamond protection: Atomic groups always execute serially within a consistency boundary
6.4 Advanced Scheduling Features
| Feature | Default | Description |
|---|---|---|
| Tiered scheduling | Off | HOT (1×), WARM (2×), COLD (4×), FROZEN (skip) |
| Auto-backoff | Off | Doubles schedule when refresh > schedule interval |
| Adaptive threshold | On | Auto-tunes FULL/DIFF cutoff from historical data |
| Cyclic SCC iteration | Off | Fixed-point convergence for circular dependencies |
| Crash recovery | On | Auto-marks orphaned RUNNING records as FAILED |
| Tick watermark | On | Cross-source snapshot consistency |
6.5 Scheduler Bottlenecks
| Bottleneck | Impact | Current Mitigation |
|---|---|---|
| Serialized refresh (default) | Only 1 ST refreshed at a time | Parallel mode available (opt-in) |
| DAG rebuild on ring overflow | O(V+E) with >32 concurrent DDL | Incremental rebuild fallback |
| Per-ST catalog lock probes | FOR UPDATE SKIP LOCKED per ST | Capped by concurrent_refreshes GUC |
| History queries for cost model | Last 10+5 records per ST per cycle | SPI queries; infrequent |
7. Safety & Error Handling
7.1 Unsafe Code
- Before reduction: 1,309 unsafe blocks
- After reduction (v0.7.0): 641 unsafe blocks (51% reduction)
- Approach: Six well-documented safe abstractions (pg_cstr_to_str, pg_list, cast_node!, parse_query, SubTransaction, function conversions)
- All remaining unsafe blocks have // SAFETY: comments per project guidelines
7.2 Error Handling
- PgTrickleError enum with classified variants (user, schema, system, internal)
- No unwrap() or panic!() in code reachable from SQL
- SQLSTATE-based retry classification (F29) — identifies retriable errors (lock timeouts, serialization failures) vs permanent errors
- Exponential backoff with jitter for retries
- Max consecutive errors (default 3) before auto-suspend with NOTIFY alert
7.3 Concurrency Safety
- PgLwLock for shared state (short hold times)
- PgAtomic (CAS) for counters (worker tokens, cache generation, DAG version)
- Advisory locks for non-blocking compaction attempts
- SAVEPOINT for atomic consistency group refresh (rollback on any failure)
- FOR UPDATE SKIP LOCKED for concurrent ST refresh coordination
7.4 Data Safety
- Frontier-based versioning: Stream tables represent query result at a consistent past timestamp (Delayed View Semantics)
- Cross-source tick watermark: All refreshes in a tick capped to a single WAL LSN
- Crash recovery: Orphaned RUNNING records auto-detected and marked FAILED
- DML guard trigger: Prevents direct INSERT/UPDATE/DELETE on stream tables
- Schema change detection: DDL event triggers detect upstream changes and mark STs for reinitialization
8. Scalability Assessment
8.1 Current Scalability Profile
| Dimension | Status | Limit |
|---|---|---|
| Stream table count | Good | Tested with ~100 STs; DAG operations O(V+E) |
| Source table size | Good | 100K–1M rows benchmarked; MERGE scales with delta size |
| Change rate | Good | Adaptive fallback prevents degradation at high rates |
| Column count | Good | Selective CDC + bitmask up to 63 columns |
| Query complexity | Good | 21 operators; TPC-H 22 queries validated |
| Concurrent databases | Good | Per-database scheduler isolation |
| Concurrent writers | Moderate | Statement-level triggers reduce contention |
| Parallel refresh | Opt-in | 4 workers default; proven 2.5–5× throughput |
8.2 Scalability Limits
| Constraint | Current | Concern |
|---|---|---|
| Invalidation ring | 32 slots | >32 concurrent DDL → full DAG rebuild |
| Changed columns bitmask | 63 columns (BIGINT) | >63 columns loses per-column tracking |
| TopK LIMIT | 1,000 default | Large LIMIT values degrade IMMEDIATE mode |
| Recursive CTE depth | 100 default | Deep recursion may not converge |
| Fixpoint iterations | 100 default | Cyclic STs may not converge |
| MERGE executor | PG-internal | Cannot optimize beyond planner hints |
8.3 Deployment Compatibility
| Platform | Status | Notes |
|---|---|---|
| PostgreSQL 18 | ✅ | Primary target |
| PgBouncer | ✅ v0.10.0 | Pooler compatibility mode |
| CloudNativePG | ✅ v0.1.1 | Extension-only Docker images |
| Backup/restore | ✅ v0.8.0 | Auto-reconnect infrastructure |
| dbt | ✅ v0.4.0 | Materialization + health macros |
| Logical replication | ⚠️ | WAL CDC mode; not for read replicas |
9. End-to-End Latency Analysis
9.1 Latency Components
For a single stream table refresh cycle:
Source DML committed
↓ [CDC trigger overhead: 1–15 µs/row]
Change buffer populated
↓ [Scheduler wake: 0–1000ms (scheduler_interval_ms)]
Scheduler tick
↓ [Schedule check: < 1ms]
↓ [Decision phase: < 1ms]
↓ [Delta SQL generation: < 0.1ms (cached)]
Refresh execution
↓ [MERGE plan + execute: 3–15ms at 1% change rate]
↓ [History logging: < 1ms]
Stream table updated
↓ [Deferred cleanup: runs next cycle]
9.2 End-to-End Latency Budget
| Component | Best Case | Typical | Worst Case |
|---|---|---|---|
| CDC trigger | < 1ms (bulk) | 1–5ms | 15ms (wide, row-level) |
| Scheduler wake delay | 0ms | 500ms | 1000ms |
| Schedule check | < 1ms | < 1ms | < 1ms |
| Delta SQL (cached) | < 0.1ms | < 0.1ms | 45ms (cold cache) |
| MERGE execution | 3ms | 10ms | 300ms+ (large delta) |
| Total | ~4ms | ~515ms | >1300ms |
Dominant latency factor: Scheduler wake interval contributes the most to typical end-to-end latency. For IMMEDIATE mode (IVM), the latency drops to the MERGE execution time alone (~3–15ms), since changes are applied synchronously within the same transaction.
9.3 Throughput Analysis
| Metric | Value | Condition |
|---|---|---|
| Refreshes/second (serial) | ~50–200 | 1 ST, 1% change rate |
| Refreshes/second (parallel) | ~200–1000 | 4 workers, independent STs |
| Zero-change throughput | ~500+ | EXISTS short-circuit, < 2ms |
| Source write throughput impact | 1.5–5× slower | Depends on table width and trigger mode |
| Max STs per database | ~100+ tested | DAG rebuild < 100ms |
10. Recommendations for Future Releases
10.1 High Priority — Performance & Throughput
R1: Enable Parallel Refresh by Default (v0.11)
Current: parallel_refresh_mode = 'off'
Recommendation: Change default to 'on'
Impact: 2.5–5× throughput improvement for deployments with independent STs
Risk: Low — the feature is implemented and tested since v0.4.0; six
releases of stabilization
Prerequisite: Validate with TPC-H and dbt integration tests under parallel
mode
R2: UNLOGGED Change Buffers (v0.11–v0.12)
Current: Change buffers are standard WAL-logged tables
Recommendation: Add GUC unlogged_change_buffers (default 'auto')
Impact: ~30% reduction in CDC write-side WAL volume
Risk: Medium — UNLOGGED tables are lost on crash. Requires frontier reset
logic to detect lost buffers and trigger full refresh. This is acceptable
because change buffers are transient by nature.
Mitigation: On recovery, detect missing UNLOGGED tables and force FULL
refresh for affected STs
R3: Reduce Scheduler Wake Interval for Latency-Sensitive Workloads (v0.11)
Current: scheduler_interval_ms = 1000 (1 second)
Recommendation: Provide a low_latency profile or reduce default to 200ms
Impact: Reduces median end-to-end latency from ~515ms to ~115ms
Risk: Higher CPU usage on scheduler process; mitigate with idle/active
detection (skip tick if no pending changes)
Trade-off: Consider adaptive wake interval — 100ms when changes pending,
5000ms when idle
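The adaptive wake interval in the trade-off note can be sketched as a small policy function. The 100ms/5000ms bounds come from the note itself; the gradual-doubling decay is an illustrative assumption, not a committed design.

```rust
// Sketch of an adaptive scheduler wake interval (decay policy assumed): wake
// fast while changes are pending, back off toward a long idle interval when
// quiet, so latency stays low without busy-looping on an idle system.
fn next_wake_ms(changes_pending: bool, current_ms: u64) -> u64 {
    const ACTIVE_MS: u64 = 100;
    const IDLE_MS: u64 = 5_000;
    if changes_pending {
        ACTIVE_MS
    } else {
        // Double gradually so a brief quiet spell doesn't jump straight to
        // worst-case latency on the next burst of changes.
        current_ms.saturating_mul(2).clamp(ACTIVE_MS, IDLE_MS)
    }
}

fn main() {
    let mut interval = next_wake_ms(true, 1_000);
    assert_eq!(interval, 100); // changes pending: snap to fast wake
    for _ in 0..10 {
        interval = next_wake_ms(false, interval);
    }
    assert_eq!(interval, 5_000); // fully decayed to idle
    println!("ok");
}
```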
R4: SemiJoin Delta-Key Pre-Filtering (v0.11)
Current: SemiJoin Part 2 rescans full left table
Recommendation: Implement O-1 from TPC-H benchmarking plan
Impact: 15–26× improvement for Q18, Q20, Q21 type queries
Risk: Low — restricted to SemiJoin operator diff, no architectural changes
R5: Cost-Based Refresh Strategy Selection — Done in v0.10.0
Status: ✅ Implemented as item B-4 in v0.10.0. The estimate_cost_based_threshold + compute_adaptive_threshold functions apply a 60/40 blend of the ratio-based and history-driven cost models from pgt_refresh_history, with cold-start fallback to the fixed GUC threshold.
Recommendation: Integrate historical timing data more aggressively
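The 60/40 blend behind R5 can be sketched as follows. The function name compute_adaptive_threshold comes from the report; its internals here are illustrative, assuming each model yields a change-ratio cutoff.

```rust
// Sketch of the adaptive threshold blend (internals assumed): 60% weight on
// the ratio-based estimate, 40% on the history-driven estimate, falling back
// to the fixed GUC value until refresh history has samples.
fn compute_adaptive_threshold(
    ratio_based: f64,
    history_based: Option<f64>, // None until pgt_refresh_history has samples
    guc_fallback: f64,
) -> f64 {
    match history_based {
        Some(h) => 0.6 * ratio_based + 0.4 * h,
        None => guc_fallback, // cold start
    }
}

fn main() {
    // Cold start: no history yet, use the GUC (default 0.15).
    assert_eq!(compute_adaptive_threshold(0.20, None, 0.15), 0.15);
    // Warm: blend 60% ratio model with 40% history model.
    let t = compute_adaptive_threshold(0.20, Some(0.10), 0.15);
    assert!((t - 0.16).abs() < 1e-9); // 0.6*0.20 + 0.4*0.10
    println!("ok");
}
```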
10.2 Medium Priority — Scalability
R6: Partitioned Stream Tables (v0.11–v0.12)
Current: Stream tables are single unpartitioned tables
Recommendation: Support RANGE/LIST/HASH partitioning for large stream
tables (10M+ rows)
Impact: Partition-scoped MERGE reduces lock contention and enables
partition pruning at query time
Risk: Medium — partition management adds complexity; need to handle
partition creation/detach in refresh cycle
R7: Automatic Buffer Partitioning for High-Throughput Sources (v0.12)
Current: Buffer partitioning available but off by default
Recommendation: Default buffer_partitioning = 'auto' for sources with
high write throughput (detected by change buffer growth rate)
Impact: Faster cleanup (DETACH+DROP vs TRUNCATE) and potential for
parallel scans of old partitions
Risk: Low — partition lifecycle management already implemented
R8: Shared Change Buffers (v0.13)
Current: Each source table has its own change buffer
Recommendation: Allow multiple STs watching the same source to share a
single change buffer (with per-ST LSN frontier tracking)
Impact: Reduces write amplification for fan-out patterns (one source →
many STs)
Risk: Medium — requires careful frontier management and concurrent
consumer coordination
R9: Increase Invalidation Ring Capacity (v0.11)
Current: 32-slot ring buffer
Recommendation: Increase to 128 or 256 slots
Impact: Reduces frequency of full DAG rebuilds under heavy DDL
Risk: Minimal — slightly more shared memory; O(ring_size) dedup check is
still fast
10.3 Medium Priority — Safety & Ergonomics
R10: Enable Auto-Backoff by Default (v0.11)
Current: auto_backoff = false
Recommendation: Change default to true
Impact: Prevents scheduler overload when refresh consistently exceeds
schedule interval; reduces log noise and wasted CPU
Risk: Low — well-tested since v0.4.0; users can still override
Rationale: Auto-backoff is the safe behavior. The current default silently
wastes resources when refresh can’t keep up.
R11: Enable Tiered Scheduling by Default (v0.12)
Current: tiered_scheduling = false
Recommendation: Change default to true with sensible tier detection
(e.g., based on last-queried timestamp from pg_stat_user_tables)
Impact: Automatically reduces refresh frequency for cold STs; improves
overall throughput for hot STs
Risk: Low — users can override per-ST; adds modest complexity to
scheduler logic
R12: Block Source DDL by Default (v0.12)
Current: block_source_ddl = false
Recommendation: Change default to true with clear error messages
Impact: Prevents silent schema drift that breaks STs; forces explicit
ALTER STREAM TABLE after schema changes
Risk: Medium — may surprise users who expect transparent DDL propagation.
Good error messages are essential.
R13: Wider Changed-Column Bitmask (v0.12)
Current: BIGINT bitmask limited to 63 columns
Recommendation: Use array of BIGINTs or bytea for tables with >63 columns
Impact: Column-level tracking for very wide tables
Risk: Low — implementation is straightforward; most tables are <63 columns
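The widened bitmask in R13 can be sketched as an array of 64-bit words in place of the single BIGINT. This is a minimal illustration of the idea; how the words map onto a bytea or BIGINT[] column is left open by the recommendation.

```rust
// Sketch of R13: a multi-word change bitmask so tables wider than 63 columns
// keep per-column tracking. Word i covers columns [64*i, 64*i + 63].
fn set_changed(mask: &mut Vec<u64>, col: usize) {
    let (word, bit) = (col / 64, col % 64);
    if word >= mask.len() {
        mask.resize(word + 1, 0); // grow lazily for very wide tables
    }
    mask[word] |= 1 << bit;
}

fn is_changed(mask: &[u64], col: usize) -> bool {
    let (word, bit) = (col / 64, col % 64);
    mask.get(word).is_some_and(|w| w & (1 << bit) != 0)
}

fn main() {
    let mut mask = Vec::new();
    set_changed(&mut mask, 2);
    set_changed(&mut mask, 100); // beyond the old 63-column limit
    assert!(is_changed(&mask, 2));
    assert!(is_changed(&mask, 100));
    assert!(!is_changed(&mask, 99));
    assert_eq!(mask.len(), 2); // two 64-bit words cover 128 columns
    println!("ok");
}
```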
10.4 Low Priority — Advanced Optimizations
R14: Async CDC via Custom Output Plugin (v0.14+)
Current: Text-based pgoutput parsing
Recommendation: Custom output plugin with binary-efficient format
Impact: Would eliminate text parsing overhead for WAL-based CDC
Risk: High — requires C-level output plugin development and maintenance;
pgoutput is the standard
R15: Multi-Table Delta Batching — Done in v0.10.0
Status: ✅ Implemented as items B3-2 + B3-3 in v0.10.0. Weight aggregation (GROUP BY __pgt_row_id, SUM(weight) HAVING SUM(weight) != 0) correctly replaces the incorrect DISTINCT ON approach. Six diamond-flow property-based tests verify correctness.
Recommendation: Merge delta computation for co-scheduled STs with shared sources
R16: Adaptive Scheduler Wake Interval (v0.12)
Current: Fixed 1000ms wake interval
Recommendation: Event-driven wake using pg_notify from CDC triggers
Impact: Near-zero latency when changes appear; near-zero CPU when idle
Risk: Medium — adds trigger→scheduler signaling path; needs NOTIFY
coalescing to avoid thundering herd
R17: MERGE Bypass for Append-Only Patterns — Done in v0.5.0
Status: ✅ Implemented as item A-3a in v0.5.0. The APPEND ONLY declaration on CREATE STREAM TABLE enables a direct INSERT path (no MERGE). A heuristic automatically reverts to full MERGE on the first observed DELETE or UPDATE, emitting a WARNING plus a pgtrickle_alert NOTIFY.
Recommendation: Use direct INSERT for pure append patterns
R18: Index-Aware MERGE Planning — Done in v0.10.0
Status: ✅ Implemented as item A-4 in v0.10.0. SET LOCAL enable_seqscan = off is injected when the delta row count is below merge_seqscan_threshold (0.001 × ST size). A covering index is auto-created on __pgt_row_id with an INCLUDE clause for ≤8-column schemas. The setting reverts at transaction end via SET LOCAL (not SET).
Recommendation: Hint index usage based on change buffer PK distribution
11. Summary & Prioritized Roadmap
Immediate Wins (v0.11)
| ID | Recommendation | Effort | Impact |
|---|---|---|---|
| R1 | Parallel refresh on by default | Low | 2.5–5× throughput |
| R4 | SemiJoin delta-key pre-filter | Medium | 15–26× for affected queries |
| R9 | Larger invalidation ring | Trivial | Prevents full DAG rebuilds |
| R10 | Auto-backoff on by default | Trivial | Prevents scheduler overload |
Near-Term (v0.12)
| ID | Recommendation | Effort | Impact |
|---|---|---|---|
| R2 | UNLOGGED change buffers | Medium | ~30% CDC write reduction |
| R3/R16 | Adaptive scheduler wake interval | Medium | 4× latency reduction |
| R6 | Partitioned stream tables | High | Unlocks 10M+ row STs |
| R7 | Auto buffer partitioning for high-throughput sources | Low | Faster cleanup |
| R11 | Tiered scheduling on by default | Low | Resource efficiency |
| R12 | Block source DDL by default | Low | Safety |
| R13 | Wider column bitmask (>63 columns) | Low | Wide table support |
Medium-Term (v0.13–v0.14)
| ID | Recommendation | Effort | Impact |
|---|---|---|---|
| R8 | Shared change buffers | Medium | Fan-out efficiency |
Long-Term (v1.0+)
| ID | Recommendation | Effort | Impact |
|---|---|---|---|
| R14 | Custom WAL output plugin | Very High | Optimal CDC throughput |
| — | Distributed stream tables | Very High | Multi-node scalability |
| — | External orchestrator | High | 100+ ST deployments |
Appendix A: Version History (Performance Milestones)
| Version | Date | Performance Milestone |
|---|---|---|
| v0.1.0 | 2026-02-26 | Baseline DVM engine, trigger CDC |
| v0.1.3 | 2026-03-02 | JSONB fix, delta SQL caching (2.7× improvement) |
| v0.2.0 | 2026-03-04 | TopK, diamond consistency, IMMEDIATE mode |
| v0.3.0 | 2026-03-11 | Correctness fixes, HAVING, FULL OUTER JOIN |
| v0.4.0 | 2026-03-12 | Parallel refresh, statement-level triggers, prepared statements |
| v0.5.0 | 2026-03-13 | Append-only fast path, source gating |
| v0.6.0 | 2026-03-14 | Partitioned tables, idempotent DDL |
| v0.7.0 | 2026-03-15 | Watermarks, circular deps, xxHash for wide tables |
| v0.8.0 | 2026-03-17 | Backup/restore, reliability hardening |
| v0.9.0 | 2026-03-20 | Algebraic aggregates (O(1) AVG/STDDEV/VAR), predicate pushdown |
| v0.10.0 | 2026-03-23 | Covering indexes (20–50%), compaction (50–90%), PgBouncer support |
Appendix B: Benchmark Matrix
5 scenarios × 3 change rates × 2 table sizes = 30 combinations
Scenarios: scan, filter, aggregate, join, join_agg
Change rates: 1%, 10%, 50%
Table sizes: 10K, 100K rows
Additional benchmarks:
- TPC-H (22 queries at SF-0.01 and SF-0.1)
- Trigger overhead (narrow/medium/wide × INSERT/UPDATE/DELETE)
- Criterion.rs micro-benchmarks (DVM operators, utility functions)
- Continuous regression detection via Bencher.dev
Appendix C: GUC Quick Reference (Performance-Relevant)
| GUC | Default | Tuning Guidance |
|---|---|---|
| `scheduler_interval_ms` | 1000 | Lower for fresher data; higher for less CPU |
| `differential_max_change_ratio` | 0.15 | Lower = more FULL refreshes; higher = more DIFF |
| `merge_work_mem_mb` | 64 | Increase for large deltas (sorting) |
| `merge_seqscan_threshold` | 0.001 | Disables seqscan when delta is a tiny fraction of the ST |
| `parallel_refresh_mode` | off | Set to 'on' for independent ST graphs |
| `max_dynamic_refresh_workers` | 4 | Match to available CPU cores |
| `auto_backoff` | false | Enable if refresh often exceeds schedule |
| `tiered_scheduling` | false | Enable for large ST counts with mixed access patterns |
| `compact_threshold` | 100000 | Lower for faster buffers; 0 to disable |
| `buffer_partitioning` | off | Enable for high-throughput sources |
| `cleanup_use_truncate` | true | Keep true unless using change history |
12. Deep Gap Analysis — DVM Correctness & SQL Coverage
Added 2026-03-24. Findings from deep source-level audit of src/dvm/,
src/refresh.rs, and src/api.rs.
12.1 Known Correctness Boundaries
G12-1: EC-01 Phantom-Row-After-DELETE for ≥3-Scan Right Subtrees
Severity: Medium (silent incorrect result)
File: src/dvm/operators/join_common.rs:1052–1088
The EC-01 fix (R₀ via EXCEPT ALL) is gated at join_scan_count(child) <= 2.
Join subtrees with ≥3 scan nodes on the right side — TPC-H Q7, Q8, Q9 all
qualify — retain the original phantom-row-after-DELETE bug: when both the left
and right rows of a join are deleted in the same batch, the left-side DELETE
can be silently dropped because R₁ (post-change right) no longer contains
the partner row.
The threshold exists because EXCEPT ALL materializes the full right-side
snapshot for wide join trees, causing PostgreSQL to spill multi-GB temp files.
The NOT MATERIALIZED CTE hint partially mitigates this but doesn’t fully
solve the cascading materialization problem for deep join trees.
Impact: Incorrect results for multi-way joins under simultaneous cross-table DELETE. Affects any join chain where the right subtree has ≥3 leaf scans.
Recommendation: Design a per-subtree CTE-based snapshot strategy to replace EXCEPT ALL for deep join trees. Track as EC01B (v0.12.0).
G12-2: TopK Refresh Assumption Violation
File: src/refresh.rs:1144–1200
The TopK refresh path assumes a LIMIT with no OFFSET and no correlated WHERE
conditions. If that assumption is violated, the TopK micro-refresh may
silently produce wrong results; no explicit runtime check validates it.
Impact: Silent correctness bug if TopK path is reached with unsupported LIMIT patterns.
Recommendation: Add an assertion guard that validates TopK eligibility at refresh time, not just at creation time. Fall back to FULL refresh if assumptions are violated. Low effort (~2h).
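The recommended guard can be sketched as follows. This is a hypothetical shape, not the extension's actual code: the flag names and the `RefreshPath` enum are illustrative stand-ins for whatever eligibility state the real TopK planner carries.

```rust
// Hypothetical sketch: re-check TopK eligibility at refresh time and fall
// back to FULL refresh when the assumption is violated, instead of
// silently producing wrong results.
#[derive(Debug, PartialEq)]
enum RefreshPath {
    TopKMicroRefresh,
    FullRefresh,
}

struct TopKShape {
    has_limit: bool,
    has_offset: bool,
    has_correlated_where: bool,
}

fn choose_refresh_path(shape: &TopKShape) -> RefreshPath {
    // The TopK micro-refresh assumes LIMIT without OFFSET and no
    // correlated WHERE conditions; anything else takes the safe path.
    if shape.has_limit && !shape.has_offset && !shape.has_correlated_where {
        RefreshPath::TopKMicroRefresh
    } else {
        RefreshPath::FullRefresh
    }
}
```

The guard is cheap (a handful of boolean checks per refresh), so running it on every refresh rather than only at creation time costs essentially nothing.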
G12-3: MERGE Duplicate-Row Error on Concurrent Changes
File: src/refresh.rs:738–750
When multiple delta rows target the same ST row (e.g., rapid INSERT→UPDATE→DELETE within one scheduler tick), PostgreSQL rejects with “MERGE command cannot affect row a second time.” The current fix materializes a deduplicated CTE before MERGE.
Impact: Forces materialization of the entire delta, defeating streaming optimization. For large deltas (100K+ rows), this adds measurable latency.
Recommendation: Investigate a two-pass MERGE approach: first-pass applies
non-conflicting rows directly, second-pass handles the deduplicated residual.
Alternatively, explore PostgreSQL 18’s MERGE ... ON CONFLICT path if available.
12.2 SQL Support Gaps
| SQL Feature | Status | Severity | Notes |
|---|---|---|---|
| VALUES in FROM | Rejected | Low | Unsupported FROM item (VALUES clause...) — rare in analytical queries |
| TABLESAMPLE | Rejected | Low | Non-deterministic by nature; documented rejection |
| Nested aggregates | Rejected | Medium | AVG(SUM(...)) — requires subquery rewrite; users can work around |
| Non-equijoin conditions | Partial | Medium | >, <, LIKE conditions in JOIN ON are not differentiated; may silently produce wrong results or fall back to FULL |
| NATURAL JOIN | Warned | Low | F38 warning about column drift on ALTER; DIFFERENTIAL still works |
| VALUES() constructor in expressions | Rejected | Low | VALUES (1),(2),(3) as standalone FROM rejected |
| JSON_TABLE advanced | Partial | Low | Basic structure parsed; complex column definitions may be silently dropped |
| JSON_ARRAYAGG / JSON constructor ops | Partial | Low | PG 18 SQL/JSON operators may not be fully recognized |
| Multi-column IN subquery | Unclear | Medium | EXPR IN (subquery) where subquery returns multiple columns — status unclear; may produce wrong results |
12.3 Aggregate Performance Classification
25 aggregate functions use group-rescan (O(group_size)) rather than algebraic O(1) maintenance:
| Category | Functions | Strategy |
|---|---|---|
| Algebraic (O(1)) | SUM, COUNT, AVG, STDDEV\*, VAR\*, COVAR\_\*, REGR\_\*, CORR | Auxiliary columns |
| Min/Max (rescan) | MIN, MAX | Group-rescan (removing min/max requires full scan) |
| Group-rescan | STRING\_AGG, ARRAY\_AGG, JSON[B]\_AGG, JSON[B]\_OBJECT\_AGG, JSON\_OBJECTAGG\_STD, JSON\_ARRAYAGG\_STD, BIT\_AND, BIT\_OR, BIT\_XOR, BOOL\_AND, BOOL\_OR, XMLAGG, ANY\_VALUE, MODE, PERCENTILE\_CONT, PERCENTILE\_DISC, HYP\_RANK, HYP\_DENSE\_RANK, HYP\_PERCENT\_RANK, HYP\_CUME\_DIST, ComplexExpression, UserDefined | Full group rescan per affected group |
The group-rescan strategy is correct but expensive for queries with many groups and frequent changes. Users are not warned at creation time that their aggregate choice has O(group_size) refresh cost.
Recommendation:
- Emit a WARNING at create_stream_table time when group-rescan aggregates
are used in DIFFERENTIAL mode, noting the performance implication.
- Expose aggregate strategy classification in explain_st().
- Long-term: implement count-based algebraic maintenance for BOOL_AND/BOOL_OR
(track the number of TRUE rows per group) and evaluate per-bit counting for
BIT_AND/BIT_OR (see §14.3).
12.4 Silent AUTO Mode Downgrades
When refresh_mode = 'AUTO' (default), the system silently downgrades from
DIFFERENTIAL to FULL in 6+ scenarios:
- Window functions in expressions (EC-03 lift)
- Unsupported DVM constructs detected
- DVM parser failure
- Materialized views / foreign tables as sources
- Nested window rewrite fallback
- Non-deterministic expression detected (for some categories)
These downgrades emit pgrx::info!() messages, which land at LOG level and
are typically invisible to interactive users.
Impact: Users experience unexplained performance regressions. The “effective” refresh mode is not stored in the catalog — only the requested mode is visible.
Recommendation:
- Store effective_refresh_mode in pgt_stream_tables catalog.
- Add explain_refresh_mode(name) SQL function that returns a human-readable
explanation of why the current mode was selected.
- Upgrade mode-downgrade messages from info!() to warning!().
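A minimal sketch of the proposed effective-mode bookkeeping, with hypothetical names (the real catalog columns and downgrade reasons would differ): record both the requested and effective mode plus the reason for any downgrade, so `explain_refresh_mode()` has something to report.

```rust
// Hypothetical sketch: capture the mode decision instead of discarding it.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RefreshMode {
    Differential,
    Full,
}

struct ModeDecision {
    requested: RefreshMode,
    effective: RefreshMode,
    // Human-readable downgrade reason, persisted to the catalog so
    // explain_refresh_mode() can surface it later.
    reason: Option<&'static str>,
}

fn decide_mode(
    requested: RefreshMode,
    unsupported_construct: Option<&'static str>,
) -> ModeDecision {
    match (requested, unsupported_construct) {
        // AUTO/DIFFERENTIAL downgrades to FULL, but the reason is kept.
        (RefreshMode::Differential, Some(construct)) => ModeDecision {
            requested,
            effective: RefreshMode::Full,
            reason: Some(construct),
        },
        _ => ModeDecision { requested, effective: requested, reason: None },
    }
}
```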
13. Deep Gap Analysis — Safety & Code Quality
Added 2026-03-24. Findings from audit of src/ for panics, unwraps,
code quality, and architectural concerns.
13.1 Production Panic Sites
| File | Line | Code | Risk |
|---|---|---|---|
| `scheduler.rs` | 2358 | `unwrap_or_else(\|_\| panic!("check_skip_needed: SPI select failed"))` | Real — SPI failure during scheduler tick crashes the background worker |
| `refresh.rs` | 3051, 3068 | `panic!("expected InvalidArgument, got {other:?}")` | Test-only (`#[cfg(test)]`) — no production risk |
Assessment: Only 1 real production panic path confirmed. However, the
broader audit found 65 unwrap()/expect() calls across core modules
(scheduler.rs, hooks.rs, cdc.rs, catalog.rs, refresh.rs, api.rs).
Most are in test code, but several are in production paths protected only by
PostgreSQL invariant assumptions (e.g., “FROM list always has ≥1 item”).
13.2 Unwrap Calls in Parser (Production Risk)
| Location | Code | Justification |
|---|---|---|
| `parser.rs:7123` | `func_list.head().unwrap()` | PG guarantees non-empty FuncCall list |
| `parser.rs:9048` | `from_list.head().unwrap()` | FROM always has ≥1 item |
| `parser.rs:9389` | `target_list.head().unwrap()` | SELECT always has ≥1 target |
| `parser.rs:14324` | `.expect("at least one predicate")` | BoolExpr flattening assumes ≥1 arg |
These are justified by PostgreSQL parser guarantees but lack // SAFETY:
comments. If PostgreSQL internals change (e.g., PG 19 parse tree changes),
these could become panics without any compile-time or test-time warning.
Recommendation:
- Add // INVARIANT: PostgreSQL parser guarantees <reason> comments to each.
- Convert to .ok_or(PgTrickleError::InternalError(...)) for defense-in-depth,
especially before PG 19 backward-compatibility work begins.
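The defense-in-depth conversion might look like the following sketch. The error type and function are illustrative, not the project's real code; the point is that a violated parser invariant surfaces as a clean error instead of a panic.

```rust
// Hypothetical sketch: replace a bare unwrap() on a parser invariant with
// an error that surfaces cleanly if PostgreSQL internals ever change.
#[derive(Debug)]
enum PgTrickleError {
    InternalError(String),
}

/// INVARIANT: PostgreSQL's parser guarantees a non-empty FROM list, but we
/// return an error instead of panicking in case that ever changes (PG 19).
fn first_from_item(from_list: &[String]) -> Result<&String, PgTrickleError> {
    from_list.first().ok_or_else(|| {
        PgTrickleError::InternalError("FROM list unexpectedly empty".into())
    })
}
```

The cost is one branch per call, negligible against the SPI and parse-tree work surrounding each of these sites.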
13.3 println! in Production Code
| File | Line | Issue |
|---|---|---|
| `refresh.rs` | 2269 | `println!("MERGE SQL TEMPLATE:\n{}", merge_template)` — unguarded debug output |
This prints the full MERGE SQL template to stdout on every refresh cycle. In a production PostgreSQL deployment, this goes to the server log at an uncontrolled level.
Recommendation: Replace with pgrx::log!() guarded by a
pg_trickle.log_merge_sql GUC (default false). Effort: ~30 min.
13.4 Parser File Complexity
src/dvm/parser.rs is 19,674 lines — by far the largest file in the project
(25% of total source). Key concerns:
- Three separate `from_item_to_sql()` variants (lines ~7970, ~8914, ~12449) with overlapping logic.
- Deep parse tree recursion (5–7 levels common), making stack overflow theoretically possible for pathological queries.
- 664 `unsafe` blocks (95% of all unsafe in the project), all wrapping `pg_sys` FFI calls with proper SAFETY comments.
- Limited macro extraction for visitor patterns — repetitive match arms for node type dispatch.
Recommendation: Not urgent for correctness, but refactoring into sub-modules (by SQL construct: joins, aggregates, subqueries, CTEs, window functions) would improve maintainability and make the PG 19 compatibility work (BC2) significantly easier.
13.5 Error Message Quality
Error messages are technically accurate but not actionable:
| Current Message | Problem |
|---|---|
| `"unsupported operator for DIFFERENTIAL mode: {0}"` | No suggested fix or alternative |
| `"cycle detected in dependency graph: A → B → C"` | No guidance on how to break the cycle |
| `"upstream table schema changed: OID {0}"` | OID is meaningless to users |
| `"query parse error: {0}"` | No indication of which part of the query failed |
Recommendation: Add “HINT” text to error messages at the API boundary.
PostgreSQL supports structured error messages with DETAIL and HINT fields.
Use ereport!(ERROR, ... HINT: "...") instead of raw error!(). Example:
```
ERROR: unsupported operator for DIFFERENTIAL mode: TABLESAMPLE
HINT: TABLESAMPLE produces non-deterministic results. Use refresh_mode =>
'FULL' or remove the TABLESAMPLE clause.
```
13.6 Memory Ordering
Shared memory atomics in src/shmem.rs use correct ordering:
- Relaxed for monotonic counters (CACHE_GENERATION) — correct
- AcqRel/Acquire for CAS operations (worker tokens) — correct
No issues found.
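The two patterns can be illustrated with a minimal sketch (illustrative, not the extension's actual shmem layout): a Relaxed monotonic counter for cache invalidation, and an AcqRel/Acquire compare-and-swap for claiming a worker token.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Monotonic cache-generation counter: Relaxed suffices because readers
// only need to eventually observe a newer value to detect staleness; no
// other memory is published through this counter.
static CACHE_GENERATION: AtomicU64 = AtomicU64::new(0);

fn bump_generation() -> u64 {
    CACHE_GENERATION.fetch_add(1, Ordering::Relaxed) + 1
}

// Worker-token CAS: AcqRel on success so the claiming worker synchronizes
// with whoever last released the token; Acquire on failure so the loaded
// current value reflects the owner's prior writes.
static WORKER_TOKEN: AtomicU64 = AtomicU64::new(0); // 0 = free

fn try_claim_token(worker_id: u64) -> bool {
    WORKER_TOKEN
        .compare_exchange(0, worker_id, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}
```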
14. Deep Gap Analysis — Performance Opportunities
Added 2026-03-24.
14.1 CDC Write-Side Overhead (Unmeasured)
The write-side cost of CDC triggers has never been benchmarked.
plans/performance/PLAN_TRIGGERS_OVERHEAD.md defines a 5-scenario benchmark
plan but has no status and is not referenced in the roadmap.
Impact: Without data, we cannot make informed decisions about:
- Whether change_buffer_unlogged GUC is worth implementing
- Whether statement-level triggers are sufficient for high-throughput OLTP
- Whether the covering index strategy is optimal
Recommendation: Implement the trigger overhead benchmark as BENCH-W1/W2 (tracked in v0.12.0 roadmap).
14.2 IMMEDIATE Mode Performance
IMMEDIATE mode (transactional IVM) has several deferred Phase 4 optimizations:
| Item | Description | Impact | Status |
|---|---|---|---|
| ENR-based transition tables | Replace PL/pgSQL temp table copy with Ephemeral Named Relations | Eliminates CREATE/DROP TEMP TABLE overhead per trigger invocation | Not started (requires unsafe pg_sys ENR APIs) |
| C-level trigger functions | Replace PL/pgSQL wrapper with C/Rust function | Eliminates PL/pgSQL interpreter overhead | Not started (very high complexity) |
| Aggregate fast-path | Single UPDATE for invertible aggregates instead of full delta SQL | O(1) per-DML for simple aggregates | Not started |
| Prepared statement reuse | Keep SPI prepared statements across invocations in same transaction | Eliminates re-parse per trigger call | Not started |
These are tracked as A2 in the post-1.0 roadmap. For IMMEDIATE-heavy workloads, the PL/pgSQL wrapper is the dominant cost.
14.3 Group-Rescan Aggregate Cost
25 aggregate functions use group-rescan (see §12.3). For queries like:
```sql
SELECT department, STRING_AGG(name, ', ' ORDER BY name)
FROM employees GROUP BY department
```
every change to employees triggers a full re-aggregation of the affected
department group. With 1000 employees per department, this is 1000× more
expensive than an algebraic SUM.
Quick wins:
- BIT_AND/BIT_OR/BIT_XOR — BIT_XOR is its own inverse and could be maintained
in O(1) with no extra state. BIT_AND/BIT_OR would need per-bit contributor
counts: removing a value forces a full rescan only if it was the sole
contributor to a cleared (AND) or set (OR) bit. Given the complexity of
tracking per-bit contributors, these are correctly classified as
group-rescan for now.
- BOOL_AND/BOOL_OR — similar to BIT_AND/BIT_OR but on single bits.
Algebraic maintenance would require tracking the count of TRUE values per
group. Feasible but low priority.
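The true-count scheme for BOOL_AND/BOOL_OR can be sketched as follows. This is illustrative, not the extension's code; the auxiliary state would live in hidden `__pgt_aux_*` columns in practice.

```rust
// Sketch: algebraic BOOL_AND/BOOL_OR maintenance via per-group counts.
// Auxiliary state per group: total row count and count of TRUE values.
#[derive(Default)]
struct BoolAggState {
    total: i64,
    trues: i64,
}

impl BoolAggState {
    // weight = +1 for an inserted row, -1 for a deleted row (DBSP-style
    // multiplicities), so deletes are O(1) with no group rescan.
    fn apply(&mut self, value: bool, weight: i64) {
        self.total += weight;
        if value {
            self.trues += weight;
        }
    }

    fn bool_and(&self) -> Option<bool> {
        // BOOL_AND is TRUE iff every remaining row is TRUE.
        (self.total > 0).then(|| self.trues == self.total)
    }

    fn bool_or(&self) -> Option<bool> {
        // BOOL_OR is TRUE iff at least one remaining row is TRUE.
        (self.total > 0).then(|| self.trues > 0)
    }
}
```

NULL handling is omitted for brevity; real maintenance would track non-NULL counts separately, as the existing `__pgt_aux_nonnull_*` machinery already does for SUM.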
14.4 MERGE Template Deduplication Overhead
When multiple delta rows target the same ST __pgt_row_id, the entire delta
must be materialized and deduplicated before MERGE (see G12-3 in §12.1).
This prevents PostgreSQL from streaming the delta directly into the MERGE
executor.
For workloads with rapid UPDATE→UPDATE sequences (e.g., counters, status fields), this materialization can dominate refresh time.
Recommendation: Profile the frequency of delta deduplication across real-world workloads. If ≥10% of refreshes need dedup, investigate a pre-MERGE compaction pass in the change buffer itself (cancel out INSERT→DELETE pairs before delta SQL generation — partially implemented as B3-1 compaction but not integrated with MERGE dedup).
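The suggested compaction pass can be sketched as follows, with `u64` keys standing in for `__pgt_row_id` and signed weights for DBSP-style multiplicities. Illustrative only; the real B3-1 compaction operates on change-buffer rows, not in-memory tuples.

```rust
use std::collections::HashMap;

// Sketch: net out per-row multiplicities so INSERT→DELETE pairs cancel
// before delta SQL generation ever sees them.
fn compact_delta(changes: &[(u64, i64)]) -> HashMap<u64, i64> {
    let mut net: HashMap<u64, i64> = HashMap::new();
    for &(row_id, weight) in changes {
        *net.entry(row_id).or_insert(0) += weight;
    }
    // Rows whose changes fully cancel need no MERGE action at all, which
    // also removes them as sources of "row affected twice" conflicts.
    net.retain(|_, w| *w != 0);
    net
}
```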
14.5 Thread-Local Cache Cold-Start
Delta SQL templates and MERGE plans are cached per backend session via
thread_local!. Cross-session invalidation uses a shared atomic counter
(CACHE_GENERATION), but every new connection pays a cold-cache penalty of
~45ms on first refresh.
For connection-pooler deployments (PgBouncer) where connections are reused across many sessions, this is amortized. But for serverless or short-connection workloads, the cold-start cost is significant.
Recommendation: Investigate shared-memory template caching using PostgreSQL’s shared buffer mechanism. This would eliminate per-connection cold-start entirely but requires careful invalidation management.
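The generation-checked per-backend cache described above has roughly this shape (an illustrative sketch, not the extension's actual implementation): a shared atomic generation counter, and a thread-local map that is dropped wholesale whenever the observed generation differs from the one the cache was built against.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

// Shared invalidation counter; any backend bumps it on DDL.
static CACHE_GENERATION: AtomicU64 = AtomicU64::new(0);

thread_local! {
    // (generation the cache was built against, template cache)
    static TEMPLATE_CACHE: RefCell<(u64, HashMap<String, String>)> =
        RefCell::new((0, HashMap::new()));
}

fn cached_template(key: &str, build: impl FnOnce() -> String) -> String {
    let current = CACHE_GENERATION.load(Ordering::Relaxed);
    TEMPLATE_CACHE.with(|c| {
        let mut cache = c.borrow_mut();
        if cache.0 != current {
            // Another backend bumped the generation: drop stale entries.
            cache.1.clear();
            cache.0 = current;
        }
        // First call per (connection, key) pays the build cost — this is
        // the ~45ms cold-start the section describes.
        cache.1.entry(key.to_string()).or_insert_with(build).clone()
    })
}
```

A shared-memory cache would replace the thread-local map with a shmem hash, trading the cold-start for cross-backend locking and invalidation complexity.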
15. Deep Gap Analysis — Ergonomics & API Design
Added 2026-03-24.
15.1 Function Parameter Overload
create_stream_table() has 10 parameters, 8 of them optional:
```sql
SELECT pgtrickle.create_stream_table(
    name, query,
    schedule?, refresh_mode?, initialize?,
    diamond_consistency?, diamond_schedule_policy?,
    cdc_mode?, append_only?, pooler_compatibility_mode?
);
```
Related parameters (e.g., diamond_consistency and
diamond_schedule_policy) are separated without logical grouping. Discovery
of advanced options like pooler_compatibility_mode requires reading the
full SQL reference.
Recommendation:
- Provide configuration profiles: create_stream_table(name, query,
profile => 'low_latency') that set multiple GUCs at once.
- Document the “80% use case” prominently: most users only need
(name, query) or (name, query, schedule).
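A profile could expand to a fixed set of parameter values at creation time. The profile names and settings below are purely illustrative (nothing like this ships today):

```rust
// Hypothetical sketch: a profile name expands to (parameter, value) pairs
// applied on top of the defaults when the ST is created.
fn profile_settings(profile: &str) -> Option<Vec<(&'static str, &'static str)>> {
    match profile {
        "low_latency" => Some(vec![
            ("schedule", "1 second"),
            ("refresh_mode", "DIFFERENTIAL"),
        ]),
        "high_throughput" => Some(vec![
            ("schedule", "1 minute"),
            ("append_only", "true"),
        ]),
        // Unknown profiles are rejected rather than silently ignored.
        _ => None,
    }
}
```

Explicit per-call parameters would still override the profile, preserving the current API for power users.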
15.2 Invisible Query Rewrite Pipeline
The 8-pass query rewrite pipeline transforms the user’s query before DVM parsing:
- View inlining (fixpoint)
- Nested window expression lifting
- DISTINCT ON → ROW_NUMBER rewrite
- GROUPING SETS/CUBE/ROLLUP expansion
- Scalar subquery decorrelation (WHERE)
- Correlated scalar decorrelation (SELECT)
- SubLink-in-OR rewriting
- ROWS FROM transformation
Users never see the rewritten query, making debugging impossible when the DVM engine produces unexpected results or falls back to FULL mode.
Recommendation: Add explain_query_rewrite(query TEXT) SQL function that
returns the rewritten query and a log of which passes were applied. This is
pure diagnostics — no production impact.
15.3 Missing Convenience Functions
| Function | Use Case | Priority |
|---|---|---|
| `explain_refresh_mode(name)` | Show why DIFFERENTIAL or FULL was chosen | High |
| `explain_query_rewrite(query)` | Show query after rewrite pipeline | High |
| `diagnose_errors(name)` | Show last N errors + root cause + suggested fixes | High |
| `list_auxiliary_columns(name)` | Show hidden `__pgt_*` columns and their purpose | Medium |
| `validate_query(query)` | Dry-run validation without creating the ST | Medium |
| `explain_refresh_cost(name)` | Estimate refresh cost based on change buffer size | Low |
| `bulk_create(definitions JSONB)` | Create multiple STs in one call with DAG ordering | Low |
| `export_definition(name)` | Export ST definition as reproducible SQL | Low |
15.4 Error Recovery Experience
When a stream table enters ERROR status (consecutive_errors ≥ 3):
- pgt_status() shows status = 'ERROR' but no root cause
- Must manually ALTER STREAM TABLE ... status => 'ACTIVE' to retry
- No indication of whether the root cause is fixed
Recommendation: The diagnose_errors(name) function should return:
- Last 5 error messages from pgt_refresh_history
- Error classification (user/schema/system)
- Whether the error is retryable
- Suggested fix for each error
- Whether the root cause has been resolved (e.g., dropped table restored)
15.5 Configuration Interaction Effects
23 GUCs with undocumented interaction effects:
| Interaction | Effect | Documentation |
|---|---|---|
| `scheduler_interval_ms` + `min_schedule_seconds` + `default_schedule_seconds` | Three separate timers determine actual refresh frequency; unclear which takes precedence | Not documented |
| `differential_max_change_ratio` + `merge_seqscan_threshold` + `auto_backoff` + `merge_planner_hints` | All affect DIFFERENTIAL performance; users don't know which to tune | Not documented as a group |
| `cdc_mode='wal'` + `refresh_mode='IMMEDIATE'` | Invalid combination; only caught at runtime | Not documented |
| `append_only=true` + DELETE on source | Silently reverts to `false` with no warning | Not documented |
| `diamond_schedule_policy` + `diamond_consistency='off'` | Policy is ignored when consistency is off | Not obvious |
Recommendation: Create a GUC interaction matrix in CONFIGURATION.md. Provide “tuning profiles” for common scenarios: low-latency, high-throughput, resource-constrained.
15.6 Auxiliary Column Visibility
Stream tables contain hidden __pgt_* columns that surprise users:
| Column | Purpose | When Present |
|---|---|---|
| `__pgt_row_id` | Row identity for MERGE | Always |
| `__pgt_count` | Multiplicity for DISTINCT | When DISTINCT present |
| `__pgt_count_l`, `__pgt_count_r` | Join multiplicity | When FULL JOIN present |
| `__pgt_union_dedup_count` | UNION deduplication | When UNION (not UNION ALL) present |
| `__pgt_aux_count_*` | AVG auxiliary counter | When AVG present |
| `__pgt_aux_sum_*` | STDDEV/VAR auxiliary | When STDDEV/VAR present |
| `__pgt_aux_sum2_*` | STDDEV/VAR sum-of-squares | When STDDEV/VAR present |
| `__pgt_aux_nonnull_*` | NULL transition counter | When SUM over FULL JOIN |
| `__pgt_aux_sum[xy]_*` | COVAR/REGR auxiliary | When COVAR/REGR present |
Users writing SELECT * get unexpected columns. The FAQ mentions this briefly
but the SQL reference doesn’t document them.
Recommendation:
- Document in SQL_REFERENCE.md under its own section.
- Add list_auxiliary_columns(name) function.
- Consider a view layer (pgtrickle.user_view_<name>) that excludes
__pgt_* columns, or add pg_attribute marking to hide them from
SELECT * (using attisdropped-adjacent mechanism).
16. Deep Gap Analysis — Documentation & Onboarding
Added 2026-03-24.
16.1 Getting Started Complexity
The GETTING_STARTED.md guide jumps directly to a 3-layer recursive department tree with CTEs, multi-table joins, and cascading refreshes. Most users' first stream table will be a simple aggregation.
Recommendation: Restructure as progressive complexity:
1. Hello World — single-table aggregation (GROUP BY + SUM)
2. Two-table join — orders + customers
3. Scheduling — calculated vs cron vs duration
4. Monitoring — pgt_status(), explain_st(), dependency_tree()
5. Multi-layer — chained STs, diamond patterns
6. Advanced — recursive CTEs, IMMEDIATE mode
16.2 Missing Operator Support Matrix
No single document answers “Does my query support DIFFERENTIAL mode?” Users must cross-reference SQL_REFERENCE.md, DVM_OPERATORS.md, and FAQ.md.
Recommendation: Add to DVM_OPERATORS.md a support matrix:
| SQL Feature | DIFFERENTIAL | FULL | IMMEDIATE | Notes |
|---|---|---|---|---|
| Inner JOIN | ✅ | ✅ | ✅ | |
| LEFT JOIN | ✅ | ✅ | ✅ | |
| FULL OUTER JOIN | ✅ | ✅ | ✅ | SUM/AVG may trigger group-rescan |
| Semi/Anti JOIN | ✅ | ✅ | ✅ | |
| LATERAL | ✅ | ✅ | ✅ | |
| Window functions | ⚠️ | ✅ | ✅ | In expressions → FULL fallback |
| Recursive CTE | ✅ | ✅ | ✅ | Non-monotone rejected |
| VALUES in FROM | ❌ | ✅ | ❌ | Rejected |
| TABLESAMPLE | ❌ | ❌ | ❌ | Non-deterministic |
| Nested aggregates | ❌ | ✅ | ❌ | Requires subquery rewrite |
16.3 Missing Best-Practice Patterns
No documentation covers common data architecture patterns:
| Pattern | Description | Status |
|---|---|---|
| Bronze → Silver → Gold | Multi-layer ST pipeline for data warehouse | Not documented |
| Event sourcing | Base table as event log → ST as current state | Not documented |
| Slowly Changing Dims | Type-1/Type-2 SCD patterns with STs | Not documented |
| Fan-out (one → many) | One source feeding multiple STs | Briefly mentioned |
| Diamond convergence | A → B,C → D pattern | Documented in ARCHITECTURE |
| Hot/warm/cold tiering | Different refresh rates by access frequency | GUC exists, pattern not documented |
16.4 Monitoring Function Discovery
GETTING_STARTED.md ends with pgt_status() but doesn’t introduce the 15+
monitoring functions available:
| Function | Purpose | Discoverability |
|---|---|---|
| `pgt_status()` | Current status | ✅ In GETTING_STARTED |
| `explain_st(name)` | Deep inspect | ❌ Only in SQL_REFERENCE |
| `dependency_tree(name)` | DAG visualization | ❌ Only in SQL_REFERENCE |
| `check_cdc_health()` | CDC status | ❌ Only in SQL_REFERENCE |
| `st_refresh_stats(name)` | Refresh timing | ❌ Only in SQL_REFERENCE |
| `get_refresh_history(name)` | History | ❌ Only in SQL_REFERENCE |
| `health_check()` | Overall health | ❌ Only in SQL_REFERENCE |
| `change_buffer_sizes()` | Buffer stats | ❌ Only in SQL_REFERENCE |
| `list_sources(name)` | Source tables | ❌ Only in SQL_REFERENCE |
| `trigger_inventory()` | CDC triggers | ❌ Only in SQL_REFERENCE |
| `worker_pool_status()` | Parallel workers | ❌ Only in SQL_REFERENCE |
| `refresh_timeline(name)` | Refresh history timeline | ❌ Only in SQL_REFERENCE |
| `slot_health()` | WAL slot health | ❌ Only in SQL_REFERENCE |
| `watermark_status()` | Watermark state | ❌ Only in SQL_REFERENCE |
| `st_auto_threshold(name)` | Adaptive threshold | ❌ Only in SQL_REFERENCE |
Recommendation: Add a “Monitoring Quick Reference” section to GETTING_STARTED.md with the 5 most useful functions and links to the full reference.
17. Deep Gap Analysis — Testing & Verification
Added 2026-03-24.
17.1 Test Inventory
| Tier | Count | Focus |
|---|---|---|
| Unit tests | 1,194 | SQL parsing, OpTree, diff operators, helpers |
| Integration tests | 101 | Bare PG catalog schema behavior |
| E2E tests | 340+ | Full extension (CDC, refresh, background workers) |
| Total | 1,635+ | |
Source: 78,539 lines Rust source / 59,870 lines Rust tests (test:source ratio = 0.76:1).
17.2 Coverage Ceiling
Structural code coverage is estimated at ~63%. The ceiling is caused by:
- Error handling paths rarely triggered in tests
- Platform-specific code (#[cfg(target_os)])
- Emergency fallback paths
- Logging-only code branches
17.3 Missing Test Coverage Areas
| Area | Gap | Priority | Effort |
|---|---|---|---|
| External SQL corpora | No sqllogictest, JOB, or Nexmark integration | High | 1–2 weeks |
| Differential fuzzing | No SQLancer or property-based SQL generation | High | 1–2 weeks |
| Property tests (items 5+6) | Topology stress + DAG helper properties not started | Medium | 6–10 days |
| Write-side benchmarks | CDC trigger overhead never measured | Medium | 3–5 days |
| IMMEDIATE mode stress | No high-concurrency IMMEDIATE mode test (100+ concurrent DML) | Medium | 2–3 days |
| PG version matrix | Only PG 18 tested; PG 16/17 untested | Medium | 2–3 weeks (with CI matrix) |
| Multi-database | No test for scheduler across multiple databases | Low | 1–2 days |
| Long-running stability | No soak test (24h+ continuous refresh) | Low | 1–2 days |
17.4 Ignored Test Suites
| File | Ignored Count | Reason |
|---|---|---|
| `e2e_tpch_tests.rs` | 11 | TPC-H benchmarks (not regression); run separately |
| `e2e_bench_tests.rs` | 16 | Performance microbenchmarks; run separately |
| `e2e_upgrade_tests.rs` | 8 | Requires separate Docker upgrade image build |
All #[ignore] annotations are intentional. No accidentally-ignored tests
found.
17.5 Testing Infrastructure Strengths
- Testcontainers-based E2E with custom Docker images
- Property-based testing with proptest (4/7 planned items done)
- TPC-H validation at SF-0.01 and SF-0.1
- Criterion.rs micro-benchmarks with Bencher.dev regression detection
- Light E2E tier using `cargo pgrx package` for fast PR validation
- 6-tier test pyramid (unit → integration → light-E2E → full-E2E → TPC-H → dbt)
17.6 Verification Gaps
| Gap | Risk | Recommendation |
|---|---|---|
| No offline MERGE SQL validation | Generated SQL could be syntactically invalid for edge-case column names | Add an EXPLAIN (COSTS OFF) dry-run for generated MERGE templates during E2E tests |
| No stack-depth limit for parser recursion | Pathological queries could cause stack overflow in the 19.7K-line parser | Add a depth counter to parse tree visitors; reject queries exceeding limit |
| No regression test for EC-01 boundary | ≥3-scan right subtrees are documented as limited but no test asserts the boundary | Add a negative test that demonstrates the known limitation with a comment explaining the boundary |
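The recommended parser depth guard can be sketched as follows; the limit, node type, and error shape are illustrative, not the project's real visitor code.

```rust
// Sketch: thread a depth counter through recursive parse-tree visitors
// and reject pathological queries before the stack overflows.
const MAX_PARSE_DEPTH: usize = 256;

enum Node {
    Leaf,
    Children(Vec<Node>),
}

// Returns the node count, or an error if nesting exceeds the limit.
fn visit(node: &Node, depth: usize) -> Result<usize, String> {
    if depth > MAX_PARSE_DEPTH {
        return Err(format!("query nesting exceeds {MAX_PARSE_DEPTH} levels"));
    }
    match node {
        Node::Leaf => Ok(1),
        Node::Children(kids) => {
            let mut count = 1;
            for k in kids {
                count += visit(k, depth + 1)?;
            }
            Ok(count)
        }
    }
}
```

PostgreSQL itself applies the same idea via `check_stack_depth()`; calling that from the extension's visitors would be an alternative to a hand-rolled counter.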
Appendix D: Codebase Metrics
| Metric | Value |
|---|---|
| Source code (Rust) | 78,539 lines |
| Test code (Rust) | 59,870 lines |
| Test:source ratio | 0.76:1 |
| Largest file | parser.rs (19,674 lines, 25% of source) |
| Total `unsafe` blocks | 695 (664 in parser.rs alone) |
| Total `pg_extern` functions | 43 (28 in api.rs, 15 in monitor.rs) |
| Total GUCs | 23 |
| DVM operators | 21 |
| Aggregate functions | 60+ |
| Edge cases catalogued | 36 (EC-01 through EC-36) |
| Plans in `plans/` | 50+ files |
| Plans complete | ~30 |
| Plans in progress | ~5 |
| Plans proposed/deferred | ~15 |