Contents
layout: default title: Limitations
nav_order: 7
Limitations
Supported PK types
The zone map tracks the first two PK columns. Supported types:
| Type | Zone map support |
|---|---|
| int2, int4, int8 | Full |
| timestamp, timestamptz | Full |
| date | Full |
| uuid | Lossy (first 8 bytes; UUIDs sharing a prefix may not be pruned) |
| text, varchar | Lossy, requires COLLATE "C" (byte order must equal sort order) |
Non-C collation text/varchar columns are not tracked. The table still works but queries on those columns will not benefit from scan pruning.
Online compact/merge restrictions
sorted_heap_compact_online and sorted_heap_merge_online are not
supported for tables with UUID, text, or varchar primary keys. The lossy
int64 hash representation causes collisions during change replay.
Use the offline variants (sorted_heap_compact, sorted_heap_merge) instead.
See Online Compact/Merge for Lossy PK Types for the
future lossless replay-key contract.
UPDATE behavior
UPDATE does not re-sort tuples. After many updates, the physical order may
drift from PK order. Run sorted_heap_compact or sorted_heap_merge
periodically on write-heavy tables.
Eager vs lazy mode
By default (eager mode), every UPDATE that widens a zone map entry flushes the meta page to disk. This keeps scan pruning accurate but adds per-UPDATE WAL overhead (~46% of heap throughput for small-column updates).
Set sorted_heap.lazy_update = on to skip per-UPDATE zone map maintenance.
The first UPDATE invalidates the zone map on disk; the planner falls back to
Index Scan. Compact or merge restores zone map pruning. INSERT always uses
eager maintenance regardless of this setting.
The mode is never activated automatically – choose it based on your workload. See the README “UPDATE modes” section for a decision guide.
Zone map validity
- After compact or rebuild, the zone map is marked valid and scan pruning is active.
- Single-row INSERTs into pages already covered by the zone map update it in place (pruning stays active).
- INSERTs into pages beyond zone map coverage invalidate the flag. VACUUM
with
sorted_heap.vacuum_rebuild_zonemap = on(default) automatically rebuilds it.
Block range pruning
heap_setscanlimits() supports only contiguous block ranges. For
non-contiguous distributions (e.g., after many random inserts without
compaction), the scan reads intervening pages but skips tuple processing
on pages outside the bounds.
Zone maps are not index-only scans
Zone maps store page-level min/max metadata. They let SortedHeapScan skip
heap pages, but they do not store row values or tuple identities and cannot
return rows without fetching heap tuples. Any future heap-fetch-avoiding path
needs either a metadata-only proof contract or a covering value-bearing sidecar.
See Zone-Map-Only Fast Paths for the design boundary.
sorted_hnsw ordered-scan contract
The current planner-integrated sorted_hnsw path is intentionally narrow:
- it targets base-relation
ORDER BY embedding <=> query LIMIT k - it is not used when there is no
LIMIT - it is not used when
LIMIT > sorted_hnsw.ef_search - it is not used when extra base-table quals or parameterization would make the current Phase 1 scan under-return candidates
For filtered retrieval flows, materialize/filter first or use the GraphRAG
helper/wrapper API instead of treating sorted_hnsw as a general filtered ANN
index.
Locking
| Operation | Lock level |
|---|---|
sorted_heap_compact |
AccessExclusiveLock (blocks all access) |
sorted_heap_merge |
AccessExclusiveLock |
sorted_heap_compact_online |
ShareUpdateExclusiveLock during copy; brief AccessExclusiveLock for swap |
sorted_heap_merge_online |
Same as compact_online |
sorted_heap_compact_partitions |
Calls sorted_heap_compact leaf-by-leaf |
sorted_heap_merge_partitions |
Calls sorted_heap_merge leaf-by-leaf |
Only one online compact/merge can run on a table at a time. A second concurrent attempt will fail.
Partition parent helpers are operational wrappers, not global transactions: they preflight unsupported leaves and sorted_heap leaves without primary keys, then process concrete leaves one at a time. They reduce temporary disk-space requirements from “whole logical table” to “current leaf rewrite”, but an individual leaf still needs rewrite headroom. If a later leaf fails, earlier leaves are not rolled back by the helper.
Use sorted_heap_partition_maintenance_plan(parent, operation) before large
runs to list all blockers and estimate the current leaf rewrite headroom. The
estimate is intentionally conservative and relation-size based. The plan also
reports each leaf tablespace so operators can connect the estimate to external
tablespace/free-space monitoring; PostgreSQL does not provide a portable
SQL-level free-byte metric.
The manual smoke make test-partition-lock verifies the expected lock
behavior with two sessions and lock_timeout.
See Huge-Table Compaction Operating Model for the detailed rewrite/free-space contract.
Data migration
- pg_dump / pg_restore: the zone map needs a compact after restore to re-enable scan pruning.
- pg_upgrade 16 to 18 and 17 to 18: tested and verified. Data files (including zone map) are copied as-is.
Post-restore checklist:
-- Discover tables/leaves needing post-restore maintenance:
SELECT *
FROM sorted_heap_restore_plan();
-- Concrete sorted_heap table:
SELECT sorted_heap_compact('events'::regclass);
-- Partitioned parent with sorted_heap leaves:
SELECT *
FROM sorted_heap_compact_partitions('events_parent'::regclass);
-- Inspect storage state after maintenance:
SELECT *
FROM sorted_heap_partition_status('events_parent'::regclass);
For sorted_hnsw, rebuild the index/sidecar after restore. It stores physical
heap TIDs, and pg_restore rewrites tuples with new TIDs.
ALTER TABLE
Most ALTER TABLE operations work correctly:
| Operation | Zone map impact |
|---|---|
| ADD COLUMN | No impact |
| DROP COLUMN (non-PK) | No impact |
| RENAME COLUMN | No impact (including PK columns) |
| ALTER TYPE (non-PK) | Table rewrite; compact restores zone map |
| DROP PRIMARY KEY | Disables pruning; re-add PK + compact to restore |