layout: default title: Composite-PK Pruning

Spec: Composite Primary-Key Pruning

Status: implemented first pass Risk tier: CAUTION Primary goal: make two-column zone maps produce tight pruning for common tenant/fact layouts such as (tenant_id, id) and (entity_id, relation_id, target_id).

Problem

sorted_heap stores per-page min/max for the first two primary-key columns. The documentation says composite PKs can prune on both columns.

A fresh probe found a weaker behavior before this spec was implemented:

  • table shape: (tenant_id int, id int, primary key (tenant_id, id))
  • storage: compacted sorted_heap leaf with valid/sorted zone map
  • query: tenant_id = 1 AND id BETWEEN 100 AND 110
  • plan: Custom Scan (SortedHeapScan)
  • observed pruning: Zone Map: 308 of 310 blocks (pruned 2)

The zone map entries contained second-column ranges such as c2:1..65, c2:66..130, etc. The storage had the information needed for a tight range, but the scan-range computation did not use it tightly in the sorted-prefix path.

Current Contract

The sorted-prefix binary-search path is primarily first-column based. It can find tight ranges when column 1 alone is selective.

The first implementation adds a conservative refinement for this common shape:

  • column 1 has an inclusive equality bound;
  • column 2 has an equality or range bound;
  • the sorted-prefix first-column binary search has already found the candidate prefix span.

Inside that candidate span, the scanner applies the existing per-page sorted_heap_zone_overlaps(...) second-column check and emits only matching contiguous block ranges. If column 2 is not tracked for a page, the overlap check remains conservative.

This is not yet a full lexicographic binary search. It intentionally optimizes the validated (tenant_id = const AND id bounded) case first while preserving fallback behavior for broader shapes.

Target Contract And Follow-Ups

For a compacted/sorted table with a two-column zone map:

  • col1 = const AND col2 = const should map to a tight block range.
  • col1 = const AND col2 BETWEEN lo AND hi should map to a tight block range.
  • col1 BETWEEN lo AND hi without a col2 bound may use the current broader first-column range behavior.
  • col1 IN (...) AND col2 ... can remain conservative in the first pass unless an exact contract is specified.
  • Generic prepared queries with runtime bounds for col1 = $1 and col2 BETWEEN $2 AND $3 should keep the same tight pruning once executor startup resolves the parameters.

Correctness rule:

  • Zone maps may only eliminate blocks. Executor quals remain the final source of truth.
  • If any bound shape is ambiguous or unsupported, fall back to the existing conservative range.

Acceptance Tests

C1. Equality + second-column range

Create compacted table:

CREATE TABLE tenant_events (
    tenant_id int,
    id int,
    payload text,
    PRIMARY KEY (tenant_id, id)
) USING sorted_heap;

Insert enough rows for multiple pages per tenant, compact, and run:

EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM tenant_events
WHERE tenant_id = 1 AND id BETWEEN 100 AND 110;

Expected:

  • Custom Scan (SortedHeapScan) is used.
  • scanned blocks are close to the pages overlapping id 100..110, not the whole tenant range.
  • result count is correct.

Status: covered by SH21B regression. The guard requires the query to scan no more than 3 zone-map blocks and verifies the result count is 11.

C2. Equality + second-column equality

Run:

SELECT * FROM tenant_events WHERE tenant_id = 1 AND id = 123;

Expected:

  • tight block range;
  • correct result.

Status: covered by SH21B regression. The guard requires no more than 2 zone-map blocks for tenant_id = 1 AND id = 123.

C3. First-column-only range remains correct

Run:

SELECT * FROM tenant_events WHERE tenant_id BETWEEN 1 AND 2;

Expected:

  • existing first-column pruning behavior remains correct.

Status: covered by SH21B regression. The guard verifies that tenant_id BETWEEN 2 AND 2 remains zone-map pruned and returns the expected 10,000 rows without using a second-column predicate.

C4. Unsorted tail remains conservative

After compact, insert rows that create an unsorted tail.

Expected:

  • sorted prefix uses tight composite pruning;
  • unsorted tail remains conservatively scanned where needed;
  • no false negatives.

Status: covered by SH21B-3 for fixed-col1/bounded-col2 predicates. Existing tail regressions cover first-column behavior.

C5. Generic prepared composite bounds

Run under plan_cache_mode = force_generic_plan:

PREPARE q(int, int, int) AS
SELECT * FROM tenant_events
WHERE tenant_id = $1 AND id BETWEEN $2 AND $3;

EXPLAIN (COSTS OFF) EXECUTE q(1, 100, 110);

Expected:

  • executor-startup runtime bound resolution preserves the composite bounds;
  • scanned blocks stay close to the constant-bound C1 shape;
  • result count is correct.

Status: covered by SH21B-3 regression.

Implementation Sketch

Data model:

  • Keep the current SortedHeapZoneMapEntry layout.
  • Treat page ranges as lexicographic intervals:
    • page lower key: (zme_min, zme_min2)
    • page upper key: (zme_max, zme_max2)

Planner/executor bounds:

  • SortedHeapScanBounds already stores first-column and second-column bounds.
  • Add helper predicates for lexicographic lower/upper comparisons when column 1 equality and column 2 bounds are present.

Sorted-prefix search:

  • Implemented first pass for col1 = const AND col2 bounded:
    • binary-search the first-column prefix span;
    • refine that span by applying second-column overlap checks to zone-map entries;
    • emit contiguous matching ranges.
  • Future optimization:
    • replace the linear refinement inside one column-1 span with true lexicographic binary search only if page metadata can prove endpoint ordering. The current second-column zone-map fields are page-global min/max values, not (col1, col2) lower/upper endpoint pairs.

Adversary checks:

  • Pages can contain mixed column-2 ranges when column 1 min/max spans multiple values. The lexicographic optimization must only engage when it can prove the first-column equality is safe.
  • Sentinel second-column values mean “not tracked”; fallback, do not optimize.
  • UUID/text lossy mapping remains conservative.

Definition of Done

  • C1-C2 regression tests are present in sql/pg_sorted_heap.sql as SH21B.
  • C5 generic prepared runtime bounds are present in sql/pg_sorted_heap.sql as SH21B-3.
  • C4 tail correctness is present in sql/pg_sorted_heap.sql as SH21B-5.
  • Existing single-column pruning tests pass in pg_sorted_heap.
  • Existing sorted_hnsw regression passes after the scan change.