Gap Analysis: pg_trickle vs. Epsio — Core SQL IVM Engine (PostgreSQL Features Only)

Gap Analysis: pg_trickle vs. Epsio — Core SQL IVM Engine (PostgreSQL Features Only)

Date: 2026-02-28 pg_trickle version: 0.1.2 (PostgreSQL 18 extension, Rust/pgrx) Epsio version: latest (commercial streaming SQL engine, Rust-based, closed-source) Scope: Core SQL incremental view maintenance engine, limited to SQL features available in PostgreSQL. Excludes connectors, deployment, multi-database support (MySQL/MSSQL/Snowflake/BigQuery), operational tooling, monitoring integrations, and non-IVM features.

Executive Summary

Both pg_trickle and Epsio implement incremental view maintenance (IVM) for PostgreSQL. pg_trickle is an in-database PG extension; Epsio is an external sidecar engine that reads via logical replication and writes results back to PG tables. Both build dataflow graphs from SQL queries and stream diffs through stateful operators.

This analysis compares only the core IVM engine for SQL constructs that exist in PostgreSQL’s SQL dialect.

Key findings:

Aggregate functions: pg_trickle supports 39+ vs Epsio’s ~10. Epsio lacks statistical, ordered-set, and most JSON aggregates.
Window functions: pg_trickle has full support (ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, all aggregate windows); Epsio supports only ROW_NUMBER, LAG, and RANK.
Joins: pg_trickle supports all PostgreSQL join types including FULL OUTER and NATURAL; Epsio documents LEFT/RIGHT/INNER only.
Set operations: pg_trickle supports all 6 variants; Epsio supports UNION and UNION ALL only.
Recursion: Neither system provides truly incremental recursion. pg_trickle supports WITH RECURSIVE (recomputation-diff); Epsio has no documented recursive query support.
CTEs: Both support non-recursive CTEs.
Incremental efficiency: Both maintain per-operator state (Epsio uses RocksDB; pg_trickle generates delta SQL executed by PG’s planner with no persistent state).

Summary Table

SQL Feature	Epsio	pg_trickle	Advantage
Aggregate functions	~10	39+	pg_trickle
Window functions	3 (ROW_NUMBER, LAG, RANK)	Full	pg_trickle
Inner / left / right joins	✅	✅	Tied
FULL OUTER JOIN	❌ (not documented)	✅	pg_trickle
NATURAL JOIN	❌ (not documented)	✅	pg_trickle
Semi-join / anti-join	❌ (not documented)	✅ (dedicated operators)	pg_trickle
Correlated subqueries	❌ (not documented)	✅	pg_trickle
LATERAL	❌ (not documented)	✅	pg_trickle
Scalar subqueries	❌ (not documented)	✅	pg_trickle
Subqueries (FROM clause)	✅	✅	Tied
Views as sources	✅	✅ (auto-inlined)	Tied
Set operations	UNION / UNION ALL only	All 6	pg_trickle
Non-recursive CTEs	✅	✅	Tied
Recursive queries (WITH RECURSIVE)	❌ (not documented)	✅ (recomputation-diff)	pg_trickle
GROUPING SETS / CUBE / ROLLUP	❌ (not documented)	✅ (auto-rewritten)	pg_trickle
DISTINCT / DISTINCT ON	✅ / ✅	✅ / ✅ (auto-rewritten)	Tied
ORDER BY + LIMIT (TopK)	✅	✅ (scoped recomputation)	Tied
Persistent operator state	✅ (RocksDB)	❌ (stateless delta SQL per refresh)	Epsio
Continuous updates	✅ (streaming, ~50ms batches)	❌ (periodic scheduled refresh)	Epsio
Transactional consistency	✅ (cross-table, via unified CDC stream)	✅ (single-transaction refresh)	Tied
SQL dialect	Parsed by custom engine	Native PostgreSQL parser	pg_trickle (PG compat)

Detailed Comparison

1. Incremental Computation Model

Aspect	Epsio	pg_trickle (DVM)
Architecture	External sidecar engine; reads PG via logical replication, writes results back to PG tables	In-database PG extension; executes delta SQL within the same PG instance
Operator model	~15 Collection types forming a physical dataflow graph; each Collection streams diffs to the next	21 DVM operator types forming an operator tree; each generates delta SQL CTEs
Operator state	Persistent per-operator state in RocksDB (disk + in-memory cache)	No persistent state; delta SQL reads current table snapshots on each refresh
Execution	Parse SQL → logical plan → optimize → physical plan → Rust-based streaming Collections	Parse SQL via PG parser → DVM operator tree → generate delta SQL CTEs → PG executor
Processing model	Continuous streaming with micro-batches (default 50ms batches, adaptive)	Periodic scheduled refresh (duration intervals or cron expressions)
Optimizer	Custom logical + physical optimizer (filter/projection pushdown, join reordering)	PostgreSQL planner (cost-based, mature)
CDC mechanism	Logical replication slot (external); dedicated Rust-based CDC forwarder	Row-level AFTER triggers (in-database); change buffers in `pgtrickle_changes` schema

Gap for pg_trickle: No persistent operator state — each refresh re-reads current table contents. No continuous streaming — updates are periodic rather than near-real-time. The trigger-based CDC adds overhead to write transactions.

Gap for Epsio: Requires a separate compute instance; results must be network-shipped back to PG. Custom SQL parser means some valid PG SQL may not be supported. Logical replication requires REPLICA IDENTITY FULL on source tables (significant write amplification for large rows). No access to PG’s cost-based optimizer for delta queries.

2. Aggregate Functions

Function	Epsio	pg_trickle	Incremental Strategy
COUNT(*) / COUNT(expr)	✅	✅	Both: algebraic / stateful counter
SUM	✅	✅	Both: algebraic / stateful
AVG	✅	✅	Both: via SUM/COUNT decomposition
MIN / MAX	✅	✅	Epsio: stateful; pg_trickle: rescan on extremum delete
ARRAY_AGG	✅	✅	Both: expensive O(M) / group-rescan
JSON_AGG / JSONB_AGG	✅	✅	Both: group-rescan
STRING_AGG	✅	✅	Both: group-rescan
PERCENTILE_CONT	✅	✅	Both: group-rescan
STDDEV / STDDEV_POP / STDDEV_SAMP	❌	✅	pg_trickle: group-rescan
BIT_AND / BIT_OR / BIT_XOR	❌	✅	pg_trickle: group-rescan
BOOL_AND / BOOL_OR / EVERY / SOME	❌	✅	pg_trickle: group-rescan
JSON_OBJECT_AGG / JSONB_OBJECT_AGG	❌	✅	pg_trickle: group-rescan
MODE	❌	✅	pg_trickle: group-rescan (ordered-set)
PERCENTILE_DISC	❌	✅	pg_trickle: group-rescan (ordered-set)
CORR / COVAR_POP / COVAR_SAMP	❌	✅	pg_trickle: group-rescan
REGR_* (11 functions)	❌	✅	pg_trickle: group-rescan
ANY_VALUE (PG 16+)	❌	✅	pg_trickle: group-rescan
JSON_ARRAYAGG / JSON_OBJECTAGG (PG 16+)	❌	✅	pg_trickle: group-rescan
FILTER (WHERE) clause	❌ (not documented)	✅	pg_trickle: supported on all aggregates
WITHIN GROUP (ORDER BY)	❌	✅	pg_trickle: ordered-set aggregates
Total built-in aggregates	~10	39+

Gap for pg_trickle: None for PostgreSQL-native aggregates — pg_trickle covers all built-in PG aggregate functions.

Gap for Epsio: Missing 30+ aggregate functions available in PostgreSQL (STDDEV, BIT*, BOOL*, all JSON object aggregates, statistical/regression aggregates, ordered-set aggregates except PERCENTILE_CONT). No FILTER clause support documented. No WITHIN GROUP.

3. Window Functions

Feature	Epsio	pg_trickle
ROW_NUMBER	✅	✅
RANK	✅	✅
DENSE_RANK	❌	✅
LAG	✅	✅
LEAD	❌	✅
NTILE	❌	✅
FIRST_VALUE / LAST_VALUE	❌	✅
SUM / AVG / COUNT / MIN / MAX OVER	❌ (not documented)	✅
Frame clauses (ROWS/RANGE/GROUPS)	❌ (not documented)	✅ (full)
PARTITION BY	✅	✅
Named WINDOW clauses	❌ (not documented)	✅
Window in recursive queries	N/A (no recursion)	✅

Gap for pg_trickle: None for PostgreSQL-native window function features.

Gap for Epsio: Only 3 of PostgreSQL’s 11+ window functions are supported. Missing DENSE_RANK, LEAD, NTILE, FIRST_VALUE, LAST_VALUE, NTH_VALUE, CUME_DIST, PERCENT_RANK. No documented support for aggregate window functions (SUM OVER, etc.) or frame clauses.

4. Joins

Feature	Epsio	pg_trickle
Inner join	✅	✅
LEFT OUTER JOIN	✅	✅
RIGHT OUTER JOIN	✅	✅
FULL OUTER JOIN	❌ (not documented)	✅
CROSS JOIN	❌ (not documented)	✅
NATURAL JOIN	❌ (not documented)	✅
Self-join	✅ (implied)	✅
Non-equi join (theta)	❌ (not documented)	✅
Multi-condition join	✅ (ON … AND)	✅

Delta rule comparison: Both use stateful join operators that maintain state for each side. When a diff arrives on one side, the operator looks up matching rows from the other side’s state. Epsio uses RocksDB-backed join state; pg_trickle generates SQL that reads current table snapshots (bilinear DBSP decomposition: Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB).

Gap for pg_trickle: None for PostgreSQL-native join types.

Gap for Epsio: Documentation only mentions LEFT/RIGHT/INNER joins. No documented support for FULL OUTER JOIN, CROSS JOIN, NATURAL JOIN, or non-equi (theta) joins.

5. Subqueries

Feature	Epsio	pg_trickle
Subqueries in FROM clause	✅	✅
Views as sources	✅	✅ (auto-inlined)
Correlated subqueries	❌ (not documented)	✅
EXISTS / NOT EXISTS	❌ (not documented)	✅
IN / NOT IN (subquery)	❌ (not documented)	✅ (semi-join / anti-join operators)
Scalar subquery in SELECT	❌ (not documented)	✅
Scalar subquery in WHERE	❌ (not documented)	✅ (auto-rewritten to CROSS JOIN)
LATERAL subquery	❌ (not documented)	✅
LATERAL SRF (UNNEST, etc.)	❌ (not documented)	✅
ALL (subquery)	❌ (not documented)	✅ (anti-join rewrite)

Gap for pg_trickle: None for PostgreSQL-native subquery features.

Gap for Epsio: Only FROM-clause subqueries and views documented. No documented support for correlated subqueries, EXISTS/NOT EXISTS, IN/NOT IN subqueries, scalar subqueries, or LATERAL. This is a significant gap for analytical workloads.

6. CTEs & Recursion

Feature	Epsio	pg_trickle
Simple CTE (WITH)	✅	✅
Multi-reference CTE	✅ (implied)	✅ (shared delta)
Chained CTEs	✅ (implied)	✅
Recursive queries (WITH RECURSIVE)	❌ (not documented)	✅ (recomputation-diff)
Operators in recursive body	N/A	All

Gap for pg_trickle: Recursion uses recomputation-diff (re-executes the full recursive query and diffs the result), which scales as O(|result|) rather than O(|Δ|). However, it does support the feature.

Gap for Epsio: No documented support for WITH RECURSIVE queries. This is a significant gap for graph traversal, hierarchy queries, and transitive closure workloads.

7. Set Operations

Operation	Epsio	pg_trickle
UNION ALL	✅	✅
UNION (DISTINCT)	✅	✅
EXCEPT (DISTINCT)	❌	✅
EXCEPT ALL	❌	✅
INTERSECT (DISTINCT)	❌	✅
INTERSECT ALL	❌	✅

Gap for Epsio: Only UNION and UNION ALL documented. No EXCEPT or INTERSECT support in any variant. This limits complex analytical queries that rely on set subtraction or intersection.

8. DISTINCT & Grouping

Feature	Epsio	pg_trickle
SELECT DISTINCT	✅	✅
DISTINCT ON (expr, …)	✅	✅ (auto-rewritten to ROW_NUMBER)
GROUP BY	✅	✅
GROUPING SETS	❌ (not documented)	✅ (auto-rewritten to UNION ALL)
CUBE	❌ (not documented)	✅ (auto-rewritten via GROUPING SETS)
ROLLUP	❌ (not documented)	✅ (auto-rewritten via GROUPING SETS)
GROUPING() function	❌ (not documented)	✅
HAVING	❌ (not documented)	✅
ORDER BY + LIMIT (TopK)	✅	✅ (scoped recomputation)

~~Gap for pg_trickle:~~ ✅ Resolved. ORDER BY + LIMIT (TopK) is now supported via scoped recomputation (MERGE-based). Epsio uses true incremental TopK; pg_trickle re-executes the full query when changes exist but skips refresh when no changes are detected.

Gap for Epsio: No documented support for GROUPING SETS, CUBE, ROLLUP, or HAVING. These are critical for multi-dimensional analytical aggregations.

9. Incremental Efficiency by Operator

Operator	Epsio	pg_trickle
Filter (WHERE)	O(D) — stateless, synchronous passthrough	O(D) — delta passthrough
Project (SELECT)	O(D) — stateless, synchronous map	O(D) — delta passthrough
Inner Join	O(D × state) — RocksDB-backed join index	O(D × snapshot) — reads current tables via SQL
Outer Join	O(D × state) — stateful (LEFT/RIGHT only)	O(D × snapshot) — 8-part delta for FULL OUTER
Aggregate (algebraic)	O(D) — stateful reduce (RocksDB counters)	O(D) — algebraic rewrite with `__pgt_count`
Aggregate (group-rescan)	O(M) — re-aggregate in RocksDB state	O(M) — re-aggregate via SQL LEFT JOIN back
DISTINCT	Stateful (implied)	O(D) — handled via GROUP BY + HAVING count
UNION ALL	O(D) — passthrough	O(D) — passthrough
Window function	Limited (3 functions; stateful)	O(D × partition) — recompute via SQL
Recursive CTE	N/A (not supported)	O(result) — recomputation-diff

Key difference: Epsio maintains per-operator state in RocksDB (with in-memory cache and async I/O), enabling continuous O(Δ) incremental updates without re-reading source tables. pg_trickle generates SQL that reads current table snapshots — the PostgreSQL planner optimizes this, but there’s no persistent operator state between refreshes. Epsio’s approach avoids snapshot reads but requires separate compute and storage; pg_trickle’s approach benefits from PG’s mature cost-based optimizer and keeps everything in-database.

10. Correctness & Verification

Aspect	Epsio	pg_trickle
Formal proof	❌	❌
Transactional consistency	✅ (unified CDC stream, cross-table consistency)	✅ (single-transaction refresh)
Consistency model	Eventual (typically ~50ms lag)	Snapshot-consistent at refresh time
Population correctness	✅ (atomic snapshot via pg_current_snapshot)	✅ (full refresh on create)
Sink correctness	✅ (consolidated diffs, single-transaction write)	✅ (delta applied in single transaction)
Property-based testing	Unknown (closed-source)	✅ (assert: Contents(ST) = Q(DB) after each mutation)
TPC-H validation	Unknown	✅ (22-query suite, 20/22 create, 15/22 deterministic)
Consistency guarantee	Eventual consistency (never partial results)	Empirically verified (1,300+ tests)
Error recovery	✅ (auto-repopulate on internal error, retry on DB error)	Via scheduled refresh retry

Gap for pg_trickle: No continuous near-real-time updates — consistency depends on refresh schedule.

Gap for Epsio: Closed-source — verification approach is not publicly documented. Eventual consistency model means brief periods (~50ms) where view lags behind database.

Features Unique to Each System (IVM Engine, PostgreSQL Features Only)

Epsio-only

#	Feature	Impact
1	Persistent operator state (RocksDB)	Avoids re-reading source table snapshots on each update
2	Continuous streaming updates (~50ms batches)	Near-real-time view freshness vs periodic refresh
3	~~ORDER BY + LIMIT (TopK)~~	~~Incrementally maintained “top N” queries~~ — Resolved in pg_trickle (scoped recomputation)
4	Adaptive micro-batching	Auto-adjusts batch size for throughput
5	Automatic error recovery	Repopulates view on internal errors, retries on DB errors

pg_trickle-only

#	Feature	Impact
1	30+ additional aggregate functions	STDDEV, BIT, BOOL, JSON object aggs, statistical, regression, ordered-set
2	Full window function support (11+ functions)	DENSE_RANK, LEAD, NTILE, FIRST_VALUE, LAST_VALUE, aggregate windows, frame clauses
3	FULL OUTER / CROSS / NATURAL joins	Complete PostgreSQL join support
4	Correlated subqueries	EXISTS, NOT EXISTS, IN, NOT IN, scalar subqueries
5	LATERAL subqueries	Row-producing functions, LATERAL JOINs
6	WITH RECURSIVE	Recursive query support (recomputation-diff)
7	EXCEPT / INTERSECT (all 4 variants)	Complete set operations
8	GROUPING SETS / CUBE / ROLLUP	Multi-dimensional analytical aggregations
9	HAVING clause	Aggregate filtering
10	FILTER (WHERE) on aggregates	Per-aggregate conditional filtering
11	WITHIN GROUP (ORDER BY)	Ordered-set aggregate support
12	DISTINCT ON (auto-rewritten)	“Latest per group” pattern
13	Native PG parser	Exact PostgreSQL SQL compatibility
14	PG cost-based optimizer for delta SQL	Mature planner optimizes incremental queries
15	Full PG type system in IVM queries	PostGIS, ranges, domains, custom operators
16	Views as sources (auto-inlined)	Transparent view expansion in IVM
17	Partitioned table support	IVM over partitioned sources
18	In-database execution	No external compute instance, no network round-trips
19	Auto-rewrite pipeline (6 transparent rewrites)	DISTINCT ON, GROUPING SETS, view inlining, etc.

Recommendations for pg_trickle

Worth considering (learning from Epsio)

Priority	Feature	Description	Effort	Rationale
~~Medium~~	~~ORDER BY + LIMIT (TopK)~~	~~Incrementally maintain “top N rows” queries~~	~~16–24h~~	✅ Done — implemented via scoped recomputation (MERGE-based TopK)

Not worth pursuing

Feature	Reason
Persistent operator state	Would require rearchitecting the delta model or adding RocksDB-like external storage. The stateless delta SQL model benefits from PG’s planner and MVCC.
Continuous streaming	pg_trickle’s trigger-based CDC already captures changes; the periodic refresh model with configurable schedules is a deliberate design choice for predictable resource usage.
External sidecar architecture	pg_trickle’s in-database design is a core differentiator — zero deployment overhead, no network hops, no REPLICA IDENTITY FULL requirement.

Conclusion

As pure IVM engines operating on PostgreSQL SQL features, Epsio and pg_trickle target similar use cases but with fundamentally different architectures:

Epsio provides continuous near-real-time updates with persistent operator state (RocksDB-backed). However, its SQL coverage is notably narrow: ~10 aggregate functions, 3 window functions, no FULL OUTER JOIN, no correlated subqueries, no LATERAL, no WITH RECURSIVE, no EXCEPT/INTERSECT, no GROUPING SETS, and no HAVING. It requires a separate compute instance, REPLICA IDENTITY FULL on source tables, and is closed-source.

pg_trickle has dramatically broader SQL coverage: 39+ aggregates, full window functions, all join types, all set operations, correlated subqueries, LATERAL, WITH RECURSIVE, GROUPING SETS/CUBE/ROLLUP, and the full PG type system. It runs in-database with zero external dependencies. Its limitations are the lack of persistent operator state and periodic (rather than continuous) refresh.

Overall assessment: pg_trickle’s SQL coverage is far more comprehensive than Epsio’s. For the vast majority of PostgreSQL queries that users would want to incrementally maintain, pg_trickle already supports the necessary SQL constructs while Epsio does not. Epsio’s advantages are architectural (continuous updates, persistent state) rather than SQL coverage.

The only SQL-level feature worth considering from Epsio is ORDER BY + LIMIT (TopK) incremental maintenance, which is a useful pattern pg_trickle does not currently support.

PGXN

PostgreSQL Extension Network

Contents