layout: default
title: SIMD ADC And DiskANN Comparison

nav_order: 21

Spec: SIMD ADC And pgvectorscale DiskANN Comparison

Status: proposed
Risk tier: CAUTION
Primary goal: separate local SIMD ADC optimization from an apples-to-apples comparison against pgvectorscale StreamingDiskANN.

Problem

The TODO shorthand combines two different tracks:

SIMD ADC lookup: make our compressed-vector scoring kernels faster.
pgvectorscale DiskANN comparison: benchmark against a graph index with PostgreSQL integration, SBQ compression, rescoring, and filtered-search features.

Treating them as one task would blur four separate concerns: algorithmic quality, storage layout, execution model, and PostgreSQL integration overhead.

Current Local Evidence

PQ ADC

svec_pq_adc_lookup(dist_table, code) currently performs scalar lookup and accumulation:

for each subvector m:
    total += dist_table[m][code[m]]

This is simple and portable, but it is still scalar per candidate. The C-level svec_ann_scan(...) path avoids per-row fmgr overhead and should be the baseline before micro-optimizing the standalone SQL ADC function.
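
For reference, the scalar loop above can be written out as a minimal Python sketch. The function name `pq_adc_lookup_scalar` and the table layout (M rows, one distance per centroid) are illustrative assumptions; they are not the extension's actual C signature.

```python
from typing import Sequence


def pq_adc_lookup_scalar(dist_table: Sequence[Sequence[float]],
                         code: Sequence[int]) -> float:
    """Sum one precomputed distance per subvector (hypothetical helper).

    dist_table[m][k] is assumed to hold the query-to-centroid distance for
    subvector m and centroid k; code[m] is the centroid id stored for the
    candidate in subvector m.
    """
    if len(dist_table) != len(code):
        raise ValueError("code length must equal the number of subvectors M")
    total = 0.0
    for m, centroid in enumerate(code):
        total += dist_table[m][centroid]
    return total


# Toy example: M = 2 subvectors, 4 centroids each.
table = [[0.0, 1.0, 4.0, 9.0],
         [0.5, 0.25, 2.0, 3.0]]
score = pq_adc_lookup_scalar(table, [2, 1])  # adds 4.0 and 0.25
```

A loop of this shape is what any unrolled or SIMD variant would be checked against.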

FlashHadamard ADC

FlashHadamard already has a packed byte-table scorer and a separate CPU kernel lab:

Apple/NEON int16 LUT is integrated behind FH_INT16=1 and showed a narrow end-to-end win on the validated local path.
Intel/AVX2 int16 LUT was refuted in the existing notes.
Intel/AVX2 float gather is promising in the standalone kernel lab but is not integrated into the engine.

The safe conclusion is hardware-specific: SIMD ADC optimization is worthwhile, but each kernel needs a platform-specific parity and latency gate.

Current pgvectorscale Baseline

As of the upstream README checked on 2026-05-06, pgvectorscale provides:

a diskann access method named StreamingDiskANN;
Statistical Binary Quantization (SBQ) storage layout by default;
label-based filtering through a smallint[] label column in the index;
arbitrary WHERE post-filtering;
query-time knobs such as diskann.query_search_list_size and diskann.query_rescore;
relaxed ordering by default, with materialized CTE reordering recommended when strict final distance order is required;
no UNLOGGED-table index support.

This is not a direct replacement for our ADC scorer. It is a PostgreSQL graph index baseline that should be compared at the product level.

Upstream source: https://github.com/timescale/pgvectorscale

Current Harness Status

scripts/bench_sorted_hnsw_vs_pgvector.sh now includes an optional pgvectorscale_diskann row:

if the vectorscale extension is unavailable, the script emits a benchmark_note|method=pgvectorscale_diskann|status=skipped|... line and still reports the exact, sorted_hnsw, and pgvector rows;
the DiskANN row currently requires pgv_storage=vector; halfvec runs emit an explicit skip note;
if vectorscale is available and registers the diskann access method, the script creates a diskann index on the same synthetic vector corpus;
DiskANN result timing is reported with strict_order=materialized_exact_reorder;
the index-size line includes pgvectorscale_diskann and bench_diskann_total, or skipped when the optional extension is absent.

Track A: SIMD ADC Optimization

A1. PQ ADC

Do not optimize svec_pq_adc_lookup(...) first if the measured path uses svec_ann_scan(...), because the standalone SQL function may not be the hot path.

Required first measurement:

svec_ann_scan(...) phase timing with sorted_heap.ann_timing = on;
candidate count;
M;
ADC time vs SPI fetch vs rerank.

Candidate optimizations:

unroll scalar PQ ADC for common M;
accumulate in float and widen once at the end, provided quality is unchanged;
add NEON/AVX gather variants only behind compile-time/runtime dispatch;
keep scalar fallback as reference.

Parity gate:

exact same top-k order or bounded score delta against scalar reference;
run with at least two M values and two dimensions.
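
One way to express this gate is sketched below, assuming lower scores are better and that both kernels score the same candidate ids. The names `top_k` and `passes_parity` and the `max_delta` threshold are hypothetical; the actual tolerance would have to come from measurement.

```python
from typing import List, Sequence, Tuple

Scored = Tuple[int, float]  # (candidate id, ADC score)


def top_k(scored: Sequence[Scored], k: int) -> List[Scored]:
    """Return the k best candidates, lowest score first, ties broken by id."""
    return sorted(scored, key=lambda item: (item[1], item[0]))[:k]


def passes_parity(reference: Sequence[Scored], candidate: Sequence[Scored],
                  k: int, max_delta: float) -> bool:
    """Pass if top-k order matches exactly, or all score deltas are bounded."""
    ref_ids = [cid for cid, _ in top_k(reference, k)]
    cand_ids = [cid for cid, _ in top_k(candidate, k)]
    if ref_ids == cand_ids:
        return True
    ref_scores = dict(reference)
    cand_scores = dict(candidate)
    if ref_scores.keys() != cand_scores.keys():
        return False  # the kernels did not score the same candidates
    return all(abs(ref_scores[cid] - cand_scores[cid]) <= max_delta
               for cid in ref_scores)
```

The same check would be repeated for each (M, dimension) combination under test, with the scalar path always supplying `reference`.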

A2. FlashHadamard ADC

Continue the existing platform-specific path:

keep NEON int16 behind FH_INT16=1 until larger query/dataset validation;
integrate AVX2 gather only if it beats current engine path, not only the standalone microbench;
treat AVX2 int16 as refuted unless fresh evidence contradicts the existing notes.

Parity gate:

compare top-k overlap, hit@1, recall@10, and score delta against the scalar packed scorer;
benchmark end-to-end, not only kernel throughput.
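
The quality metrics named in this gate can be computed as in the following sketch. It assumes result lists are ordered best-first and contain candidate ids; the function names are illustrative and not part of the existing harness.

```python
from typing import Sequence


def recall_at_k(truth: Sequence[int], result: Sequence[int], k: int) -> float:
    """Fraction of the reference top-k ids that appear in the tested top-k."""
    expected = set(truth[:k])
    if not expected:
        return 0.0
    return len(expected & set(result[:k])) / len(expected)


def hit_at_1(truth: Sequence[int], result: Sequence[int]) -> bool:
    """True when the tested top result equals the reference top result."""
    return bool(truth) and bool(result) and truth[0] == result[0]
```

Passing the scalar packed scorer's output as `truth` gives top-k overlap between kernels; passing exact ground truth gives recall@10 when `k=10`.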

Track B: pgvectorscale DiskANN Benchmark

Required Methods

Benchmark on the same dataset and query set:

exact heap ground truth;
sorted_hnsw on svec or hsvec;
pgvector HNSW when dimension allows;
pgvectorscale diskann;
optional FlashHadamard packed exhaustive when a compatible store exists;
optional IVF-PQ residual when codebooks are trained.

Required pgvectorscale Settings

Record:

pgvectorscale version;
PostgreSQL version;
storage_layout;
num_neighbors;
search_list_size;
num_dimensions;
num_bits_per_dimension;
diskann.query_search_list_size;
diskann.query_rescore;
whether strict result reordering was applied.

Required Metrics

p50, p95, and average latency;
recall@10 and hit@1 against exact ground truth;
index size, table size, and total footprint;
build time;
memory-related build settings such as maintenance_work_mem;
filter mode: none, label-based, or arbitrary post-filter.
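
The latency figures above can be derived from per-query timings as in this sketch. It uses the nearest-rank percentile definition, which is an assumption: the harness may use a different interpolation, so the method should be recorded alongside the numbers. The function and key names are illustrative.

```python
import math
from typing import Dict, Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("at least one latency sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_latency(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Return p50, p95, and average latency in milliseconds."""
    return {
        "p50_ms": percentile_nearest_rank(samples_ms, 50),
        "p95_ms": percentile_nearest_rank(samples_ms, 95),
        "avg_ms": sum(samples_ms) / len(samples_ms),
    }
```

With few queries, p95 collapses onto the slowest sample, so the query count belongs in the report as well.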

Acceptance Tests

D1. pgvectorscale harness is optional and fail-open

If vectorscale is not installed, the benchmark should skip the DiskANN row and report why. It must not make local PostgreSQL tests depend on an external extension.

D2. Strict-order mode is explicit

Because pgvectorscale can use relaxed ordering, the harness must record whether the final result set was reordered by exact distance before recall/latency reporting.
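
The reordering step itself is simple; the sketch below shows the logic in Python, assuming L2 distance and an in-memory map from candidate id to vector. In the harness this happens in SQL, so the helper names here are purely illustrative.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reorder_by_exact_distance(query: Sequence[float],
                              candidates: Dict[int, Sequence[float]],
                              k: int) -> List[int]:
    """Re-sort an index's candidate set by exact distance and keep the top k."""
    ranked = sorted(
        candidates,
        key=lambda cid: (l2_distance(query, candidates[cid]), cid),
    )
    return ranked[:k]
```

Whether this step ran changes both the measured latency and the order-sensitive metrics, which is why the harness should tag each row with it.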

D3. SIMD kernels keep scalar fallback

Every SIMD ADC implementation must have:

scalar reference path;
platform guard;
parity test;
runtime or build-time disable switch.
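
A minimal dispatch sketch covering the reference path, the platform guard, and the disable switch follows. The environment variable `SVEC_ADC_FORCE_SCALAR` and the platform tags are hypothetical names chosen for illustration; the real switch could equally be a build flag.

```python
from typing import Callable, Dict, Mapping, Sequence

Kernel = Callable[[Sequence[Sequence[float]], Sequence[int]], float]


def adc_scalar(dist_table: Sequence[Sequence[float]],
               code: Sequence[int]) -> float:
    """Scalar reference path: one table lookup per subvector."""
    return sum(dist_table[m][c] for m, c in enumerate(code))


def select_adc_kernel(platform_tag: str,
                      simd_kernels: Mapping[str, Kernel],
                      env: Mapping[str, str]) -> Kernel:
    """Pick a platform kernel, falling back to scalar when disabled or absent."""
    # Hypothetical runtime disable switch.
    if env.get("SVEC_ADC_FORCE_SCALAR") == "1":
        return adc_scalar
    # Platform guard: unknown platforms always get the scalar path.
    return simd_kernels.get(platform_tag, adc_scalar)
```

A caller would typically pass `os.environ` as `env`; taking it as a parameter keeps the selection logic testable without touching process state.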

D4. Product comparison includes footprint

No benchmark row is accepted without storage footprint. DiskANN, HNSW, FlashHadamard stores, PQ codebooks, and generated-code columns must all be counted.

Adversary Notes

Kernel microbench wins can disappear inside PostgreSQL due to SPI, tuple fetch, TOAST, cache warmup, or rerank overhead.
Relaxed ordering can make a DiskANN result look faster while returning a slightly unsorted top-k; strict-order mode must be visible.
Filtered DiskANN label support and sorted_heap partition routing solve different filtering problems. Compare both only on equivalent predicates.
SBQ/quantized graph storage and our FlashHadamard/PQ storage have different quality/footprint curves. A single recall number is insufficient.
pgvectorscale is an external moving target; benchmark docs must record the exact version or commit.

Decision

For 0.13, keep SIMD ADC and pgvectorscale DiskANN comparison as benchmark tracks. Do not claim superiority until a shared harness reports quality, latency, footprint, build cost, and strict-order behavior on the same workload.
