Architecture & Internals

Understanding how pg_ttl_index works under the hood.

System Architecture

┌─────────────────────────────────────────────────────────┐
│                   PostgreSQL Server                      │
│                                                          │
│  ┌────────────────────────────────────────────────────┐ │
│  │         Background Worker Process                   │ │
│  │  ┌──────────────────────────────────────────────┐  │ │
│  │  │  1. Wake up every naptime seconds           │  │ │
│  │  │  2. Try acquire advisory lock                │  │ │
│  │  │  3. Call ttl_runner()                        │  │ │
│  │  │  4. Sleep naptime seconds                    │  │ │
│  │  │  5. Repeat                                   │  │ │
│  │  └──────────────────────────────────────────────┘  │ │
│  └────────────────────────────────────────────────────┘ │
│                          │                               │
│                          ▼                               │
│  ┌────────────────────────────────────────────────────┐ │
│  │              ttl_runner() Function                  │ │
│  │  ┌──────────────────────────────────────────────┐  │ │
│  │  │  FOR EACH active TTL in ttl_index_table      │  │ │
│  │  │    LOOP (batch deletion)                     │  │ │
│  │  │      DELETE batch_size rows WHERE expired    │  │ │
│  │  │      UPDATE statistics                       │  │ │
│  │  │      Sleep 10ms (yield to other processes)   │  │ │
│  │  │      EXIT when no more expired rows          │  │ │
│  │  │    END LOOP                                  │  │ │
│  │  │  END FOR                                     │  │ │
│  │  └──────────────────────────────────────────────┘  │ │
│  └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

Components

1. Extension SQL Objects

Location: pg_ttl_index--2.0.0.sql

  • ttl_index_table - Configuration and statistics storage
    • Stores normalized schema_name + table_name
  • ttl_create_index() - Create/update TTL rules
  • ttl_drop_index() - Remove TTL rules
  • ttl_runner() - Execute cleanup logic
  • ttl_summary() - View with computed fields
  • ttl_worker_status() - Worker health information

2. C Extension Layer

Location: src/ directory

  • pg_ttl_index.c - Extension initialization
  • worker.c - Background worker implementation
  • api.c - C functions (ttl_start_worker, ttl_stop_worker)
  • utils.c - Helper utilities

3. Configuration

Location: postgresql.conf

  • shared_preload_libraries - Load extension at startup
  • pg_ttl_index.naptime - Cleanup interval (GUC)
  • pg_ttl_index.enabled - Enable/disable worker (GUC)

Background Worker Lifecycle

Startup Sequence

1. PostgreSQL starts
   └─> Loads shared_preload_libraries
       └─> Loads pg_ttl_index.so
           └─> Registers background worker template

2. User calls ttl_start_worker()
   └─> Spawns background worker process
       └─> Worker connects to current database
           └─> Enters main loop

Main Worker Loop

while (true) {
    // 1. Check if shutdown requested
    if (shutdown_requested) break;

    // 2. Execute cleanup
    execute_sql("SELECT <extension_schema>.ttl_runner()");

    // 3. Sleep for naptime seconds
    sleep(pg_ttl_index_naptime);
}

Shutdown Sequence

1. ttl_stop_worker() called
   └─> Signals worker to terminate
       └─> Worker exits main loop
           └─> Process terminates

Cleanup Algorithm

ttl_runner() Internals

1. Acquire advisory lock (pg_try_advisory_lock)
   └─> If already locked, exit (another instance running)

2. FOR EACH active TTL configuration:
   a. Initialize batch counter

   b. LOOP (until no more expired rows):
      i.   SELECT ctid of expired rows (LIMIT batch_size)
      ii.  DELETE rows with matching ctid
      iii. Update row counter
      iv.  Sleep 10ms (yield)
      v.   EXIT if deleted < batch_size

   c. UPDATE statistics (rows_deleted, last_run)
   d. COMMIT (per-table transaction)

3. Release advisory lock
4. RETURN total rows deleted

Why ctid?

Using ctid (physical row location) is faster than index-based deletion:

-- Slow: requires index scan for each row
DELETE FROM table WHERE id IN (SELECT id FROM ...);

-- Fast: direct physical row access
DELETE FROM table WHERE ctid IN (SELECT ctid FROM ...);

Concurrency Control

Advisory Lock Mechanism

-- At start of ttl_runner()
SELECT pg_try_advisory_lock(hashtext('pg_ttl_index_runner'));

-- If lock acquired = true:
--   Proceed with cleanup
-- If lock acquired = false:
--   Another instance is running, skip this run

-- At end of ttl_runner()
SELECT pg_advisory_unlock(hashtext('pg_ttl_index_runner'));

Benefits: - Prevents overlapping cleanup runs - Safe in clustered/replica environments - No database schema changes required

Auto-Indexing

Index Creation

When ttl_create_index() is called:

1. Generate index name: idx_ttl_{table}_{column}
2. Execute: CREATE INDEX IF NOT EXISTS idx_ttl_... ON schema.table(column)
3. Store index name in ttl_index_table.index_name

Example: sql SELECT ttl_create_index('public.sessions', 'created_at', 3600); -- Creates: idx_ttl_sessions_created_at

Index Usage

The index accelerates the expired row selection:

-- Without index: full table scan
SELECT ctid FROM table WHERE created_at < NOW() - INTERVAL '1 hour';

-- With index: index scan (much faster)
SELECT ctid FROM table WHERE created_at < NOW() - INTERVAL '1 hour';

Batch Deletion Strategy

Why Batching?

┌─────────────────────────────────────┐
│ Without Batching (delete all)       │
├─────────────────────────────────────┤
│ ❌ Long-running transactions        │
│ ❌ Locks table for extended period  │
│ ❌ Large WAL generation             │
│ ❌ bloat and performance issues     │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│ With Batching (batch_size = 10K)    │
├─────────────────────────────────────┤
│ ✅ Short transactions                │
│ ✅ Minimal locking                   │
│ ✅ Manageable WAL size              │
│ ✅ Other queries can proceed        │
└─────────────────────────────────────┘

Batch Processing Flow

Expired rows: 250,000
Batch size: 10,000

Iteration 1: Delete 10,000 rows → Sleep 10ms
Iteration 2: Delete 10,000 rows → Sleep 10ms
...
Iteration 25: Delete 10,000 rows → Sleep 10ms
Total: 250,000 rows deleted in 25 batches

Statistics Tracking

How Statistics are Updated

-- At end of each table's cleanup:
UPDATE ttl_index_table
SET 
    last_run = NOW(),
    rows_deleted_last_run = {rows deleted this run},
    total_rows_deleted = total_rows_deleted + {rows deleted this run}
WHERE schema_name = ... AND table_name = ... AND column_name = ...;

Security Hardening

pg_ttl_index functions now run with a fixed search_path captured at install time (SET search_path FROM CURRENT). This prevents runtime search_path changes from redirecting function resolution to attacker-controlled objects.

Counter Precision

  • rows_deleted_last_run: BIGINT (max: 9,223,372,036,854,775,807)
  • total_rows_deleted: BIGINT (will never overflow in practice)

Error Handling

Per-Table Isolation

FOR rec IN SELECT ... FROM ttl_index_table LOOP
    BEGIN
        -- Cleanup for this table
        ...
    EXCEPTION WHEN OTHERS THEN
        -- Log error, continue to next table
        RAISE WARNING 'Failed to cleanup %: %', rec.table_name, SQLERRM;
    END;
END LOOP;

Result: Error in one table doesn’t affect others.

Performance Characteristics

Time Complexity

Operation Complexity Notes
Create TTL index O(n log n) Index creation on existing table
Delete batch O(batch_size) With index
Find expired rows O(log n) Binary search via index
Update statistics O(1) Single row update

Space Complexity

  • ttl_index_table: O(m) where m = number of TTL configurations (typically < 100)
  • Auto-created indexes: O(n) per table, where n = table size

Resource Usage

Per cleanup run: - CPU: Minimal (index lookups + deletes) - Memory: ~10-50 MB (batch processing) - I/O: Proportional to deleted rows - WAL: ~(row_size × rows_deleted)

Comparison with Alternatives

vs. Triggers

Feature pg_ttl_index Triggers
Performance ✅ Batch deletion ❌ Per-row overhead
Complexity ✅ Simple setup ❌ Complex trigger logic
Control ✅ Centralized ❌ Spread across tables

vs. CRON Jobs

Feature pg_ttl_index CRON
Integration ✅ Native PostgreSQL ❌ External dependency
Monitoring ✅ Built-in views ❌ Custom logging
Concurrency ✅ Advisory locks ❌ Manual coordination

vs. Partitioning

Feature pg_ttl_index Partitioning
Granularity ✅ Row-level ❌ Partition-level
Setup ✅ Simple function call ❌ Complex schema changes
Flexibility ✅ Change TTL anytime ❌ Requires repartitioning

Internal Functions (Not Public API)

These functions exist but are not meant for direct use:

  • pg_ttl_index_main() - Worker main loop ©
  • ttl_worker_launch() - Internal worker launcher ©

Source Code Organization

src/
├── pg_ttl_index.c     # Extension initialization, _PG_init()
├── pg_ttl_index.h     # Header file, constants
├── worker.c           # Background worker implementation
├── api.c              # Public C functions (start/stop worker)
├── utils.c            # Helper functions
└── utils.h            # Helper headers

See Also