layout: default title: Large-Vector Sublinear Search

nav_order: 20

Spec: Large-Vector Sublinear Search

Status: proposed Risk tier: CAUTION Primary goal: revive IVF-PQ / SQ-family work only where it beats the current defaults on latency, recall, and footprint for very large vector tables.

Problem

The current vector story has three different regimes:

sorted_hnsw is the stable planner-integrated default.
FlashHadamard is a documented experimental packed exhaustive-search lane.
IVF-PQ is a legacy/manual path that can scan compact PQ codes inside physically clustered IVF partitions.

At 103K x 2880D, current docs already record that exhaustive packed FlashHadamard and sorted_hnsw are stronger product defaults than IVF-style pruning. Re-enabling IVF-PQ as a headline feature at that scale would reopen a refuted branch.

The valid revival target is larger tables, where scanning every packed vector per query becomes memory-bandwidth-bound and sublinear routing can pay for its complexity.

Current Evidence

Validated anchors in this repository:

svec_ann_scan(...) already implements C-level IVF-PQ scan: IVF probe, SPI candidate fetch, ADC over PQ codes, top-k selection, and optional exact rerank.
Residual PQ is implemented and documented. It improves intra-partition fidelity at the cost of one distance table per probed centroid.
Physical clustering by (partition_id, id) maps IVF probes to sorted_heap block ranges, so zone-map pruning can skip non-probed partitions.
On 103K x 2880D, IVF/pruning is not the recommended default; current FlashHadamard notes explicitly park pruning at that scale.
sorted_hnsw.build_sq8 and the sidecar HNSW cache already use SQ8 as an internal memory-reduction substrate.
No stable SQ4 row-store or planner-integrated SQ4 scan exists today.

Revival Contract

Do not promote a sublinear lane unless it beats at least one current default in a clearly named operating regime.

Required comparison axes:

latency: p50, p95, and average query time;
quality: hit@1 and recall@10 against exact heap ground truth;
footprint: heap, index/sidecar, codebook, and auxiliary storage;
build/training time;
operational complexity: rebuild/restore behavior, generated columns, codebooks, compaction, and partition routing.

Required baselines:

exact heap search;
sorted_hnsw on svec and/or hsvec;
pgvector HNSW when the dimension is inside pgvector’s envelope;
FlashHadamard packed exhaustive search when a compatible store exists;
IVF-PQ residual with exact rerank.

Candidate Lanes

Lane A: IVF-PQ residual, explicit manual API

This is the lowest-risk revival because the primitives already exist.

Use when:

table size is large enough that exhaustive packed scan is bandwidth-bound;
training and generated columns are acceptable;
storage footprint matters more than lowest single-query latency;
the application can tune nlist, nprobe, M, and rerank_topk.

Do not hide this behind the generic planner until filtered/routed underfill and global-merge semantics are proven.

Lane B: SQ8 candidate scan + exact rerank

SQ8 is a good hardware-native rerank/filter substrate because it is simple, SIMD-friendly, and already used by sorted_hnsw internals.

Potential shape:

store per-row SQ8 codes in a sidecar or generated column;
scan only routed partitions or segments;
exact-rerank a bounded candidate pool from heap vectors;
compare against sorted_hnsw and FlashHadamard at equal recall.

This may be a better first product lane than SQ4 because quality loss is lower and implementation risk is smaller.

Lane C: SQ4 / 4-bit packed scan

SQ4 should not be introduced as plain scalar 4-bit quantization by default. Existing FlashHadamard evidence suggests the 4-bit lane needs rotation, equalization, packed layout, and careful rerank to be competitive.

Valid SQ4 path:

reuse or adapt the FlashHadamard packed scorer/store;
define a PostgreSQL-safe storage lifecycle;
prove quality across more than the current Gutenberg operating point;
include exact rerank gates.

Invalid SQ4 path:

add a raw 4-bit scalar generated column and assume it will be good enough.

Scale Gates

S1. 103K regression gate

Purpose: prevent regression against known evidence.

Expected:

IVF-PQ may remain documented and callable;
it does not become the default recommendation at this scale;
any new SQ path must beat or clearly complement current sorted_hnsw / FlashHadamard rows before promotion.

S2. 500K to 1M product gate

Purpose: find where exhaustive packed scan stops being the best latency / quality / footprint tradeoff.

Benchmark requirements:

at least one real high-dimensional corpus;
at least one lower-dimensional ANN-Benchmarks corpus;
fixed query set and exact heap ground truth;
cold-build and warm-query metrics;
footprint reported from PostgreSQL relation sizes and sidecar files.
synthetic baseline runs should start from make bench-large-vector-synthetic; its defaults are intentionally small for smoke checks, while 500K / 1M runs are selected by overriding BENCH_LARGE_VECTOR_ROWS, BENCH_LARGE_VECTOR_QUERIES, and BENCH_LARGE_VECTOR_DIM.

Promotion condition:

sublinear lane is faster than exhaustive packed scan at comparable recall, or materially smaller than sorted_hnsw at acceptable recall;
benchmark is reproducible from a documented script.

S3. Partitioned large-table gate

Purpose: connect sublinear vector search to the native partitioning work.

Expected:

routing by tenant/time/segment maps to whole partitions or IVF partitions;
selected leaves/shards are globally merged and exact-reranked;
local top-k concatenation is not accepted as a correctness contract.

Implementation Plan

Extend the real-dataset benchmark harness with an opt-in IVF-PQ residual row. Baseline flag: scripts/bench_ann_real_dataset.py --enable-ivfpq. The row reports training time, table build time, compact time, codebook footprint, p50/p95/average latency, recall@K, and rerank settings.
Add a larger synthetic scale harness that can generate 500K and 1M vectors without external downloads, with deterministic query and exact GT sampling. Baseline entry point: make bench-large-vector-synthetic, which reuses the existing exact/sorted_hnsw/pgvector/optional-DiskANN harness.
Compare three candidates on the same harness: sorted_hnsw, IVF-PQ residual, and FlashHadamard packed exhaustive. Baseline flag: scripts/bench_ann_real_dataset.py --enable-flashhadamard. This is an offline packed-storage comparator; it does not claim PostgreSQL storage lifecycle integration.
Only after a winning region is identified, design a product API or planner integration. Until then, keep IVF-PQ as legacy/manual.

Acceptance Tests

L1. IVF-PQ harness row is reproducible

Run the real-dataset harness on a small sample with IVF-PQ enabled.

Smoke shape:

python3 scripts/bench_ann_real_dataset.py --sample-size 1000 --queries 5 --skip-zvec --skip-qdrant --enable-ivfpq --ivfpq-nlist 32 --ivfpq-nprobe 4 --enable-flashhadamard

For offline smoke tests without downloading ANN-Benchmarks data, pass --vectors /path/to/sample.npz where the file contains base and queries 2-D float arrays, or run make bench-ann-matrix-offline-smoke for the small deterministic matrix smoke.

Expected:

codebooks train;
generated columns populate;
compact completes;
svec_ann_scan(...) returns k rows per query;
exact ground truth comparison is reported.
FlashHadamard reports p50/p95/average latency, hit@1, recall@K, encode time, bytes/vector, metadata, and packed scorer backend when enabled.

L2. Recall/latency matrix is apples-to-apples

For every reported method:

same vectors;
same query set;
same k;
same exact heap ground truth;
warm-query timing excludes build/training unless explicitly labeled.

L3. Large-scale winner must beat a baseline

For any promotion claim:

state the exact scale and dimension;
show the baseline it beats;
show recall and footprint, not only latency.

Adversary Notes

IVF can look fast by scanning too few partitions and silently losing recall. Recall gates must be mandatory.
PQ can look small while moving cost into codebooks, generated columns, and exact rerank heap fetches. Footprint must include all auxiliary storage.
SQ8/SQ4 can look better in cache-only experiments than in PostgreSQL because SPI, TOAST, sidecar lifecycle, and MVCC integration dominate.
FlashHadamard pruning failures at 103K do not prove pruning is bad at 1M; they prove the scale boundary matters.
Partition fanout needs global exact rerank. Local top-k concat can under-rank cross-partition candidates.

Decision

For 0.13, keep sorted_hnsw as the stable default and IVF-PQ as legacy/manual. Treat IVF-PQ/SQ revival as a benchmark-gated large-scale track, not as an immediate planner feature.

PGXN

PostgreSQL Extension Network

Contents

nav_order: 20

Spec: Large-Vector Sublinear Search

Problem

Current Evidence

Revival Contract

Candidate Lanes

Lane A: IVF-PQ residual, explicit manual API

Lane B: SQ8 candidate scan + exact rerank

Lane C: SQ4 / 4-bit packed scan

Scale Gates

S1. 103K regression gate

S2. 500K to 1M product gate

S3. Partitioned large-table gate

Implementation Plan

Acceptance Tests

L1. IVF-PQ harness row is reproducible

L2. Recall/latency matrix is apples-to-apples

L3. Large-scale winner must beat a baseline

Adversary Notes

Decision