REPORT_TIMESCALEDB.md — TimescaleDB Synergy & Lessons for pg_trickle

Date: 2026-03-01
Status: RESEARCH


1. Executive Summary

TimescaleDB and pg_trickle are complementary PostgreSQL extensions solving different problems — partitioned time-series storage vs incremental view maintenance. They don’t conflict at the storage layer and can coexist.

However, there is a concrete synergy opportunity: pg_trickle can provide rich IVM over hypertables where TimescaleDB’s Continuous Aggregates are limited (no complex JOINs, no subqueries, no window functions, no cross-table aggregation). Several TimescaleDB design patterns are also worth studying for adoption in pg_trickle.

This document catalogues: (1) where the projects complement each other, (2) specific synergy scenarios, (3) engineering challenges for combined use, (4) design ideas worth borrowing, and (5) a phased action plan.


2. Comparison

| | TimescaleDB | pg_trickle |
|---|---|---|
| Core problem | Efficient time-series storage & querying | Keeping materialized views fresh incrementally |
| Key mechanism | Hypertables — auto-partitioning by time, compression, retention policies | Differential dataflow (DBSP) — delta query derivation from operator trees |
| IVM approach | Continuous Aggregates — restricted to time_bucket GROUP BY on a single hypertable | Stream Tables — broad SQL (JOINs, CTEs, window functions, subqueries, set operations) over any table |
| Change tracking | Invalidation log per hypertable chunk/time-bucket | Row-level triggers → WAL logical replication |
| Workload model | Append-heavy time-series (INSERT-dominant) | General OLTP (INSERT/UPDATE/DELETE) |
| Refresh model | Materialization policies on invalidated buckets | Scheduled or on-demand delta refresh over change buffers |
| Pipeline support | Cagg-on-cagg (hierarchical rollups) | Stream-table-on-stream-table DAG with demand-driven scheduling |
| Storage | Compressed columnar chunks for old data | Standard PostgreSQL heap tables |
| License | Apache-2.0 (community) / Timescale License (enterprise features) | Apache-2.0 |

3. Synergy Scenarios

3.1. Rich SQL over Hypertables

Continuous Aggregates are limited to GROUP BY time_bucket(...) on a single hypertable with scalar aggregates. Users cannot:

  • JOIN a hypertable with a dimension table (e.g., sensor_readings JOIN devices)
  • Use window functions (ROW_NUMBER, LAG, LEAD)
  • Use subqueries or CTEs in the aggregate definition
  • Combine data from multiple hypertables
  • Use DISTINCT ON, UNION, INTERSECT, EXCEPT

pg_trickle stream tables support all of these. A user could define:

SELECT pgtrickle.create_stream_table(
    'enriched_hourly_readings',
    $$
    SELECT
        time_bucket('1 hour', r.ts) AS bucket,
        d.facility,
        d.region,
        avg(r.temperature) AS avg_temp,
        max(r.temperature) - min(r.temperature) AS temp_spread,
        count(*) AS reading_count
    FROM sensor_readings r
    JOIN devices d ON d.device_id = r.device_id
    WHERE d.is_active
    GROUP BY 1, 2, 3
    $$,
    schedule := '5 minutes'
);

This is impossible with a Continuous Aggregate alone.

3.2. Cross-Hypertable Aggregation

Organizations often have multiple hypertables (e.g., cpu_metrics, memory_metrics, disk_metrics) that need to be combined for dashboards. Continuous Aggregates can’t span tables. A pg_trickle stream table can JOIN and aggregate across them.
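
As a rough sketch of what this could look like (the metric hypertables come from the example above; column names such as ts, host, cpu_pct, and mem_pct are illustrative assumptions), a CTE-based defining query keeps each hypertable's rollup separate before joining:

SELECT pgtrickle.create_stream_table(
    'host_resources_hourly',
    $$
    WITH cpu AS (
        SELECT time_bucket('1 hour', ts) AS bucket, host, avg(cpu_pct) AS avg_cpu
        FROM cpu_metrics
        GROUP BY 1, 2
    ),
    mem AS (
        SELECT time_bucket('1 hour', ts) AS bucket, host, avg(mem_pct) AS avg_mem
        FROM memory_metrics
        GROUP BY 1, 2
    )
    SELECT cpu.bucket, cpu.host, cpu.avg_cpu, mem.avg_mem
    FROM cpu
    JOIN mem USING (bucket, host)
    $$,
    schedule := '10 minutes'
);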

3.3. Hierarchical Pipelines Mixing Both

A Continuous Aggregate could provide a pre-materialized time-bucketed rollup that feeds into a pg_trickle stream table for enrichment:

sensor_readings (hypertable)
  └─▶ hourly_readings (Continuous Aggregate — fast time-bucket rollup)
       └─▶ enriched_hourly_readings (pg_trickle stream table — JOIN with devices)
            └─▶ regional_summary (pg_trickle stream table — further aggregation)
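
A hedged sketch of the first two stages follows (standard TimescaleDB Continuous Aggregate syntax for the rollup; whether pg_trickle can source from a cagg's materialized hypertable is the open item discussed in 4.5):

CREATE MATERIALIZED VIEW hourly_readings
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', ts) AS bucket,
    device_id,
    avg(temperature) AS avg_temp
FROM sensor_readings
GROUP BY 1, 2;

SELECT pgtrickle.create_stream_table(
    'enriched_hourly_readings',
    $$
    SELECT h.bucket, d.facility, d.region, h.avg_temp
    FROM hourly_readings h
    JOIN devices d ON d.device_id = h.device_id
    WHERE d.is_active
    $$,
    schedule := '15 minutes'
);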

3.4. Real-Time Alerting on Time-Series Data

pg_trickle’s NOTIFY-based alerting combined with stream tables over hypertables could enable threshold-based alerts that fire when an aggregated metric crosses a boundary — without polling.
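
The exact alerting API aside, the effect is roughly what a hand-rolled trigger on the stream table's backing storage would do. A minimal sketch, assuming the storage is an ordinary table named enriched_hourly_readings and an arbitrary threshold of 80 (this is not pg_trickle's built-in alerting mechanism):

CREATE FUNCTION notify_temp_spike() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Fire a NOTIFY whenever the refreshed aggregate crosses the threshold.
    IF NEW.avg_temp > 80 THEN
        PERFORM pg_notify('temp_alerts',
            format('%s %s avg_temp=%s', NEW.facility, NEW.bucket, NEW.avg_temp));
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER temp_spike_alert
AFTER INSERT OR UPDATE ON enriched_hourly_readings
FOR EACH ROW EXECUTE FUNCTION notify_temp_spike();

A dashboard or alerting daemon then only needs to LISTEN on temp_alerts rather than poll the table.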


4. Engineering Challenges

4.1. Trigger-Based CDC on Hypertables — HIGH

A hypertable is only the parent relation; the actual data lives in chunk child tables (_timescaledb_internal._hyper_<id>_<chunk_id>_chunk). TimescaleDB routes INSERT operations to the appropriate chunk based on the time dimension.

Impact: Row-level AFTER INSERT triggers defined on the parent hypertable do fire for inserts — TimescaleDB propagates triggers to chunks. However, this behavior needs validation under:

  • Chunk creation (new chunks appear as data arrives for new time ranges)
  • Chunk compression (rows become read-only columnar data)
  • Chunk decompression (for updates to compressed ranges)
  • Chunk migration (move/copy between tablespaces)

Investigation needed (a rough validation sketch follows the list):

  - [ ] Verify trigger propagation works for newly created chunks
  - [ ] Test that pg_current_wal_lsn() inside a trigger on a chunk returns a sensible value
  - [ ] Benchmark trigger overhead on high-throughput hypertable inserts (time-series workloads often do 100K+ inserts/sec)
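
A validation sketch for the first two items (table and column names are illustrative; assumes both extensions are installed):

CREATE TABLE sensor_readings (ts timestamptz NOT NULL, device_id int, temperature float8);
SELECT create_hypertable('sensor_readings', 'ts', chunk_time_interval => INTERVAL '1 day');

-- Log every trigger firing along with the WAL position it observed.
CREATE TABLE trigger_log (fired_at timestamptz DEFAULT now(), lsn pg_lsn);

CREATE FUNCTION log_insert() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO trigger_log (lsn) VALUES (pg_current_wal_lsn());
    RETURN NEW;
END;
$$;

CREATE TRIGGER capture_inserts
AFTER INSERT ON sensor_readings
FOR EACH ROW EXECUTE FUNCTION log_insert();

-- The second row lands in a chunk created after the trigger was defined.
INSERT INTO sensor_readings VALUES (now(), 1, 20.0), (now() + INTERVAL '2 days', 1, 21.0);
SELECT count(*) FROM trigger_log;   -- expect 2 if propagation covers new chunks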

4.2. WAL-Based CDC on Hypertables — MEDIUM

Logical replication decoding sees DML on chunk tables, not the parent hypertable. pg_trickle’s WAL decoder would need to:

  1. Map chunk OIDs back to the parent hypertable OID
  2. Handle DDL from chunk management (CREATE TABLE for new chunks, ALTER for compression transitions)
  3. Ignore or correctly handle the internal metadata changes that TimescaleDB writes alongside user data

The mapping can be done via _timescaledb_catalog.chunk and _timescaledb_catalog.hypertable lookups, but this adds a dependency on TimescaleDB’s internal schema.
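
Alternatively, the public timescaledb_information.chunks view exposes the same mapping without touching internals. A sketch (the function name is hypothetical; column names as of TimescaleDB 2.x):

CREATE FUNCTION chunk_to_hypertable(chunk_relid regclass) RETURNS regclass
LANGUAGE sql STABLE AS $$
    -- Resolve a chunk's relation OID to its parent hypertable's OID.
    SELECT format('%I.%I', hypertable_schema, hypertable_name)::regclass
    FROM timescaledb_information.chunks
    WHERE format('%I.%I', chunk_schema, chunk_name)::regclass = chunk_relid;
$$;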

4.3. Compression Interactions — MEDIUM

When TimescaleDB compresses a chunk:

  1. Data is READ from the uncompressed chunk
  2. Compressed data is WRITTEN to a compressed chunk (internal format)
  3. The uncompressed chunk is TRUNCATED

Steps 1–3 appear in WAL as normal DML. The DELETE + INSERT pattern would be misinterpreted by pg_trickle’s CDC as real data changes, triggering a spurious full delta computation.

Mitigation options:

  • Detect compression operations via _timescaledb_catalog metadata and skip the corresponding WAL events
  • Use TimescaleDB’s compression hooks (if exposed) to suppress CDC during compression
  • Only track uncompressed (recent) chunks — this naturally aligns with the time-series pattern where only recent data changes (see the query sketch after this list)
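
For the third option, the set of chunks worth registering for CDC can be read from the public chunks view; a sketch:

SELECT format('%I.%I', chunk_schema, chunk_name)::regclass AS chunk_relid
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings'
  AND NOT is_compressed;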

4.4. time_bucket() Function in Delta Queries — LOW

time_bucket() is a TimescaleDB function, not a core PostgreSQL function. When pg_trickle parses a defining query containing time_bucket(), it needs to:

  1. Recognize it as a deterministic, immutable function (safe for IVM)
  2. Not attempt to “differentiate” through it — treat it as a scalar transform

pg_trickle already handles arbitrary immutable functions as pass-through in delta queries, so this should work without changes. Needs validation.
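
Validation can start with a catalog check of how the installed time_bucket overloads are marked ('i' = immutable, 's' = stable, 'v' = volatile); see also open question 4:

SELECT oid::regprocedure AS signature, provolatile
FROM pg_proc
WHERE proname = 'time_bucket'
ORDER BY oid;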

4.5. Continuous Aggregate Internals as Source — LOW

If a user defines a stream table over a Continuous Aggregate’s materialized view, pg_trickle needs to handle the fact that the underlying object is a materialized hypertable (with its own chunk structure), not a plain table. Triggers on the cagg’s materialized hypertable would capture the deltas that TimescaleDB’s materializer produces.


5. Design Ideas Worth Borrowing

5.1. Time-Bucketed Invalidation Tracking

What TimescaleDB does: Instead of tracking every changed row, the invalidation log records which time-bucket ranges were modified:

invalidation_log: (hypertable_id, lowest_modified_value, greatest_modified_value)

On refresh, only the affected time buckets are recomputed — not the entire table, and not individual row deltas.

Applicability to pg_trickle: For source tables with a time/sequence column used in GROUP BY, pg_trickle could maintain a coarser invalidation log alongside (or instead of) per-row change buffers. This would dramatically reduce change buffer size for time-series-style workloads.
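
A minimal sketch of what such a log might look like (table and column names are hypothetical, not an existing pg_trickle catalog object):

CREATE TABLE pgtrickle.pgt_range_invalidations (
    source_relid      regclass    NOT NULL,
    lowest_modified   timestamptz NOT NULL,
    greatest_modified timestamptz NOT NULL
);
-- A refresh would recompute only the time buckets overlapping
-- [lowest_modified, greatest_modified] rather than replaying per-row deltas.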

Complexity: HIGH — requires detecting temporal grouping patterns in the defining query and implementing range-based invalidation in the refresh engine.

Priority: Phase 3+ (optimization, not correctness)

5.2. Real-Time Aggregation Mode

What TimescaleDB does: Continuous Aggregates support a “real-time” mode that combines, at query time:

  • The materialized (potentially stale) aggregate data
  • A live query over unmaterialized recent data (since the last refresh)

The result is always-fresh reads without requiring a refresh.

Applicability to pg_trickle: A stream table could expose a view that UNIONs:

  1. The materialized storage table
  2. A delta query over the unflushed change buffer

-- Conceptual: auto-generated view
CREATE VIEW st_realtime AS
    SELECT * FROM st_materialized
    UNION ALL
    SELECT * FROM (/* delta query over change buffer */);

Complexity: MEDIUM-HIGH — the delta query derivation already exists (it’s the refresh path). The challenge is making it performant at query time without a full refresh cycle, and handling deduplication correctly for non-append-only workloads (UPDATEs/DELETEs require MERGE-like semantics, not UNION ALL).

Priority: Phase 2 — high user value; aligns with the “freshness without latency” goal.

See also: PLAN_TRANSACTIONAL_IVM.md for transactionally updated views (immediate IVM), which overlaps with this idea.

5.3. Append-Only Fast Path

What TimescaleDB does: Time-series workloads are predominantly insert-only. TimescaleDB optimizes for this by not tracking deletes/updates in the invalidation log.

Applicability to pg_trickle: When pg_trickle detects that a source table is append-only (configurable hint or detected via trigger — no UPDATE/DELETE events observed), the delta query can be simplified:

  • JOINs: Skip the bilinear expansion. For A JOIN B where A is append-only and B is static, the delta is just ΔA JOIN B (one term instead of three; see the sketch after this list).
  • Aggregates: Skip maintaining DELETE counters in auxiliary state.
  • Change buffer: Skip the op column (always ‘I’); use a simpler schema.
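
To make the JOIN case concrete: the general delta Δ(A JOIN B) = (ΔA JOIN B) ∪ (A JOIN ΔB) ∪ (ΔA JOIN ΔB) collapses to its first term when B never changes. Against the 3.1 example, the per-refresh delta reduces to a single join (the change-buffer name below is a hypothetical placeholder):

SELECT d.facility, d.region, r.ts, r.temperature
FROM pgt_changes_sensor_readings r      -- ΔA: rows inserted since the last refresh
JOIN devices d ON d.device_id = r.device_id
WHERE d.is_active;                      -- B: static, read directly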

Complexity: LOW-MEDIUM — this is a specialization of existing paths.

Priority: Phase 2 — significant performance win for time-series use cases.

5.4. Coarse-Grained Change Tracking for Partitioned Tables

What TimescaleDB does: Invalidation is tracked per-chunk (partition), not per-row. Chunks map to time ranges, so “which chunks changed?” is an O(1) check per chunk.

Applicability to pg_trickle: For PostgreSQL native partitioned tables (not just hypertables), pg_trickle could maintain a partition-level change flag:

  1. On first change to a partition since last refresh, set a dirty flag
  2. On refresh, only scan change buffers for dirty partitions
  3. Clear dirty flags after refresh

This doesn’t change the delta computation — it’s a filter that avoids scanning empty change buffers.
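
A sketch of the bookkeeping (the flag table is hypothetical; partition enumeration uses pg_inherits as noted below):

-- Hypothetical per-partition dirty flags for a partitioned source table.
CREATE TABLE pgtrickle.pgt_dirty_partitions (
    partition_relid regclass PRIMARY KEY,
    dirty           boolean NOT NULL DEFAULT false
);

-- Seed one row per partition; OIDs come from pg_inherits.
INSERT INTO pgtrickle.pgt_dirty_partitions (partition_relid)
SELECT inhrelid::regclass
FROM pg_inherits
WHERE inhparent = 'sensor_readings'::regclass
ON CONFLICT (partition_relid) DO NOTHING;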

Complexity: LOW — partition OIDs are available from pg_inherits.

Priority: Phase 2 — especially valuable when combined with TimescaleDB.

5.5. Declarative Policy Framework

What TimescaleDB does: Policies are first-class objects:

SELECT add_continuous_aggregate_policy('hourly_readings',
    start_offset => INTERVAL '3 hours',
    end_offset   => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);

Policies have built-in monitoring (timescaledb_information.job_stats), retry logic, exponential backoff, and can be paused/resumed.

Applicability to pg_trickle: pg_trickle already has schedule and status management via alter_stream_table. The lesson is more about UX:

  • Expose refresh policies as queryable objects (already done via pgtrickle.pgt_stream_tables)
  • Add retry/backoff configuration (partially done via consecutive_errors)
  • Consider a pgtrickle.add_refresh_policy() wrapper that mirrors the TimescaleDB ergonomics for users coming from that ecosystem

Priority: Phase 3 — ergonomic polish.

5.6. Chunk-Aware Retention

What TimescaleDB does: add_retention_policy() automatically drops chunks older than a threshold.

Applicability to pg_trickle: Stream table change buffers grow over time. A built-in retention policy for change buffers (drop change buffer entries older than N refreshes or N hours) would prevent unbounded growth. This is partially handled by the post-refresh cleanup, but a dedicated policy would be more robust for error scenarios where refreshes fail repeatedly.
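
A sketch of what the retention sweep might execute (the change-buffer table and captured_at column are hypothetical names):

DELETE FROM pgtrickle.pgt_changes_sensor_readings
WHERE captured_at < now() - INTERVAL '24 hours';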

Priority: Phase 3.


6. Action Plan

Phase 1 — Validation (1–2 weeks)

Goal: Confirm that pg_trickle works at all with hypertable source tables.

| # | Task | Effort |
|-----|------|--------|
| 1.1 | Set up a test environment with both extensions on PG 18 | 2h |
| 1.2 | Create a hypertable and define a pg_trickle stream table over it (simple GROUP BY) | 2h |
| 1.3 | Verify trigger-based CDC fires correctly on hypertable inserts | 2h |
| 1.4 | Test with chunk creation (insert data spanning multiple time ranges) | 2h |
| 1.5 | Test with time_bucket() in defining query | 1h |
| 1.6 | Measure trigger overhead on high-throughput inserts (10K–100K rows/sec) | 4h |
| 1.7 | Document findings and any blockers | 2h |

Deliverable: Test report confirming basic compatibility or listing blockers.

Phase 2 — Core Optimizations (2–4 weeks)

Goal: Implement the highest-value patterns learned from TimescaleDB.

| # | Task | Effort | Depends on |
|-----|------|--------|------------|
| 2.1 | Append-only source hint (source_mode := 'append_only') | 3d | |
| 2.2 | Simplified delta queries for append-only JOINs | 3d | 2.1 |
| 2.3 | Partition-level dirty tracking for partitioned/hyper tables | 2d | Phase 1 |
| 2.4 | Investigate real-time aggregation mode (design doc) | 3d | |
| 2.5 | E2E test suite: pg_trickle + TimescaleDB scenarios | 3d | Phase 1 |

Phase 3 — Advanced Integration (4–8 weeks)

Goal: Handle edge cases and polish the combined experience.

| # | Task | Effort | Depends on |
|-----|------|--------|------------|
| 3.1 | WAL decoder: map chunk OIDs → parent hypertable OID | 3d | Phase 1 |
| 3.2 | WAL decoder: filter compression-related DML | 2d | 3.1 |
| 3.3 | Stream table over Continuous Aggregate source | 3d | Phase 1 |
| 3.4 | Time-bucketed invalidation tracking (design + prototype) | 2w | 2.4 |
| 3.5 | Documentation: “Using pg_trickle with TimescaleDB” tutorial | 2d | Phase 2 |
| 3.6 | Refresh policy ergonomics (add_refresh_policy wrapper) | 2d | |
| 3.7 | Change buffer retention policy | 2d | |

7. Non-Goals

  • Replacing Continuous Aggregates. For simple time_bucket GROUP BY on a single hypertable, Continuous Aggregates are faster and simpler. pg_trickle targets the cases caggs can’t handle.
  • Depending on TimescaleDB. All TimescaleDB integration must be optional. pg_trickle detects TimescaleDB at runtime (SELECT 1 FROM pg_extension WHERE extname = 'timescaledb') and enables optimizations conditionally.
  • Modifying TimescaleDB internals. All integration works through public APIs and catalog tables.

8. Open Questions

  1. Trigger propagation on new chunks — Does TimescaleDB propagate existing parent-table triggers to chunks created after the trigger was defined? (Likely yes, but needs verification.)

  2. Compression hook — Is there a way to detect that a chunk is being compressed (hook, event trigger, catalog flag) so CDC can ignore the resulting DML?

  3. Cagg materialized hypertable OID stability — When a Continuous Aggregate is refreshed, does the materialized hypertable’s OID remain stable? (Almost certainly yes, but worth confirming for CDC registration.)

  4. time_bucket immutability — Is time_bucket() marked IMMUTABLE in the catalog? pg_trickle requires source functions to be immutable for correct IVM. (Likely yes for fixed-interval overloads; the timezone-aware overloads may be STABLE.)

  5. Licensing — Some TimescaleDB features (e.g., compression, continuous aggregates) are under the Timescale License, not Apache-2.0. Does depending on their catalog tables or behavior create any licensing concerns for pg_trickle? (Likely no — pg_trickle would only query public catalog views, not link to TSL code.)


9. References