CHANGELOG
1.2.2
- feat: ZXC compression (`compression='zxc'`) — adds support for the ZXC asymmetric codec (BSD-3-Clause). Write-Once Read-Many design: the encoder is slow; the decoder is SIMD-maximized (NEON on ARMv8+, AVX2/AVX-512 on x86_64). Decompression throughput vs LZ4: Neoverse-V2 +24%, x86_64 AMD EPYC +18%, Apple M2 +46%. Not yet in apt — build from source. Auto-detected by `Makefile.global`.
- feat: libdeflate compression (`compression='deflate'`) — adds support for libdeflate, a zlib-compatible codec with better throughput than the standard zlib. Available as `libdeflate-dev` on Ubuntu/Debian. Auto-detected by `Makefile.global`.
- build: all compression libraries are now optional — previously LZ4, ZSTD, and libdeflate were hardcoded in `citus_config.h`, causing link failures on systems without those libraries. All four codecs (LZ4, ZSTD, Deflate, ZXC) are now detected dynamically at build time via `Makefile.global` header detection. The extension falls back to PostgreSQL's built-in `pglz` when no external library is present. Default precedence when available: ZSTD > ZXC > LZ4 > Deflate > pglz.
- bench: aarch64 benchmark area (`tests/bench/aarch64/`) — new directory with serial and parallel benchmark results on ARM Neoverse-N1 / Graviton2 (PostgreSQL 18.1, 1M rows). Includes results for all four compression codecs (ZSTD, LZ4, Deflate, ZXC) with comparison charts. Key finding: ZXC achieves the fastest analytical read performance on aarch64 in 6/10 queries, beating even LZ4 despite a slightly larger disk size (123 MB vs 118 MB), confirming its SIMD NEON advantage on ARM.
1.2.1
- fix: GUC visibility — `storage_engine.enable_vectorization`, `enable_parallel_execution`, `enable_dml`, and `enable_engine_index_scan` were registered with `GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE`, hiding them from `\dconfig` and psql tab-completion. The flags are removed — all operational GUCs are now discoverable. Note: GUCs only take effect when the extension is listed in `shared_preload_libraries`.
- fix: `-Wmissing-variable-declarations` — `ColumnarScanPathMethods`, `ColumnarScanScanMethods`, and `ColumnarScanExecuteMethods` lacked `extern` declarations in `engine_customscan.h`, causing warnings (fatal with `-Werror`) under stricter compiler settings.
- fix: `table_beginscan` 5-argument compile error on PG16–18 — the PG19 API added a 5th `flags` argument to `table_beginscan`. The call site in `RCScan_BeginCustomScan` is now guarded with `#if PG_VERSION_NUM >= PG_VERSION_19`. This error affected builds from the `v1.2.0` tag against PG16–18.
- fix: `statement_timeout` cancels `engine.smart_update`/`engine.colcompress_bulk_update` mid-run — `set_config('statement_timeout', '0', false)` was applied once before the stripe loop, but PostgreSQL resets session-level GUCs at each `COMMIT` inside a procedure. Both procedures now re-apply the timeout overrides at the top of every loop iteration.
- feat: `engine.smart_update` parallel worker cap — `max_parallel_workers_per_gather` is set to `max_parallel_workers / 2` at procedure start, preventing the maintenance procedure from consuming the full parallel worker pool. Integer division: 0→0 (serial), 1→0 (serial), 2→1, 4→2, 16→8.
1.2.0
- feat: `index_scan` per-table option for `rowcompress` — `rowcompress` now supports `index_scan` as a per-table flag, providing feature parity with `colcompress`. The default (`false`) keeps the analytical mode: range index paths are removed by the planner hook so queries use the batch-compressed sequential scan with batch-level min/max pruning. When set to `true`, index scans are allowed (OLTP / document-store mode). New 6th argument to `engine.alter_rowcompress_table_set()` and new 5th argument to `engine.alter_rowcompress_table_reset()`. The `engine.rowcompress_options` view now exposes the `index_scan` column. Upgrade via `ALTER EXTENSION storage_engine UPDATE TO '1.2'`.
1.1.5
- compat: PostgreSQL 19 support — `storage_engine.so` now compiles and runs on PostgreSQL 19 (devel). README compatibility table updated.
- fix: META.json PGXN license field — changed the `license` value to the PGXN-recognized string `agpl_3`.
1.1.4
- fix: `ORDER BY` silently dropped with parallel `ColcompressScan` — when a query had `ORDER BY` and the planner chose a parallel `ColcompressScan`, PostgreSQL emitted `Gather(ColcompressScan)` without any `Sort` node above it, returning rows in arbitrary worker-completion order instead of the requested order. Root cause: `ColcompressScan` paths have `pathkeys = NIL` (columnar data has no inherent physical order), so `generate_useful_gather_paths()` found no pre-sorted partial paths and could not build a `Gather Merge`. Fix: when `root->query_pathkeys != NIL`, a `Sort(ColcompressScan)` partial path is added to `partial_pathlist` alongside the unsorted one. The planner can now choose `Gather Merge(Sort(ColcompressScan))` and correctly satisfies `ORDER BY`.
- fix: double `_PG_init()` when Citus is in `shared_preload_libraries` — on PG15 the Citus APT package dynamically loads `citus_columnar.so` via `dlopen()` at load time, which re-entered `_PG_init()` for any co-loaded extension. This caused `ERROR: attempt to redefine parameter "storage_engine.compression"` and `ERROR: extensible node type "ColumnarScan" already exists`. Fix: added a `GetConfigOption()` early-return guard in `engine_guc_init()` and an `if (GetConfigOption(...) == NULL)` block guard in `engine_customscan_init()`, mirroring the `GetCustomScanMethods()` guard already in place for `RegisterCustomScanMethods`. The init functions are now idempotent.
1.1.3
- fix: remove `citus_config.h` dependency from vendored safeclib — `safeclib/safeclib_private.h` included `citus_config.h` (generated by the Citus `./configure`), causing a fatal compile error on clean clones: `fatal error: citus_config.h: No such file or directory`. Replaced with inline `#define` macros for the standard POSIX feature flags it provided.
- fix: suppress `-Wdeclaration-after-statement` warnings — added `-Wno-declaration-after-statement` to `Makefile.global`; the codebase uses C99 mixed declarations, which are valid for PostgreSQL extensions.
- cleanup: remove unused static functions — `IsIndexPath`, `RCFindBatchForRowNumber`, `rowcompress_estimate_rel_size`, and `rowcompress_relation_set_new_filenode_compat` were declared/defined but never called, producing `-Wunused-function` warnings.
1.1.2
- fix: remove stray `#include "citus_version.h"` from source files — `citus_version.h` is generated by the Citus `./configure` step and is not present in a clean clone. Its absence caused a fatal compile error: `fatal error: citus_version.h: No such file or directory`. Removed from all eight translation units that referenced it. The `HAVE_CITUS_LIBLZ4` macro (also defined in that header) was replaced with the standard PostgreSQL `HAVE_LIBLZ4` macro throughout.
1.1.1
- fix: remove Citus autoconf build artifacts — the root `Makefile` was the Citus 11.1devel toplevel Makefile and required `./configure` (a Citus-specific autoconf script) to be run before any build could proceed. This caused `configure: error: C compiler cannot create executables` and other Citus-specific probe failures for users with non-standard toolchains (ccache without a backing compiler, aarch64/ARM Linux, NixOS, etc.). The root `Makefile` is now a simple delegator to `src/backend/engine`. A portable, pre-generated `Makefile.global` is now tracked in the repository and uses `pg_config` from `PATH` — no `./configure` step is needed. The six Citus autoconf artifacts (`configure`, `configure.in`, `autogen.sh`, `aclocal.m4`, `Makefile.global.in`, `src/include/citus_config.h.in`) are removed from the repository. Build is now simply:

  ```bash
  sudo make -j$(nproc) install
  ```

  or with an explicit pg_config:

  ```bash
  PG_CONFIG=/usr/lib/postgresql/17/bin/pg_config sudo make install
  ```
1.1.0
- feat: `RowcompressScan` custom scan node with batch-level min/max pruning — `rowcompress` tables now support a `pruning_column` parameter (`engine.alter_rowcompress_table_set(tbl, pruning_column := 'col')`). When set, `RowcompressScan` records the serialised min/max value of the pruning column per batch during `engine.rowcompress_repack()` or bulk inserts, storing them in `engine.row_batch.batch_min_value`/`batch_max_value`. At scan time, batches whose range does not intersect the query predicate are skipped entirely — no decompression, no I/O. The new GUC `storage_engine.enable_custom_scan` (default `on`) controls whether `RowcompressScan` is injected by the planner hook.
- feat: `engine.rowcompress_repack(tbl)` — utility function that rewrites all batches of a `rowcompress` table in sorted order by the `pruning_column`, maximising pruning efficiency for range queries (e.g. date, timestamp, bigint sequences).
- schema: `engine.row_options.pruning_attnum` — new nullable `int2` column; stores the 1-based attribute number of the pruning column.
- schema: `engine.row_batch.batch_min_value`/`batch_max_value` — new nullable `bytea` columns; store serialised type-agnostic min/max statistics per batch.
- upgrade: `ALTER EXTENSION storage_engine UPDATE TO '1.1'` applies the schema changes via `storage_engine--1.0--1.1.sql`.
1.0.10
- fix: pg_search (ParadeDB) BM25 transparent compatibility — `IsNotIndexPath` in `engine_customscan.c` now preserves `CustomPath` nodes whose `CustomName` equals `"ParadeDB Base Scan"`. Previously, `RemovePathsByPredicate(rel, IsNotIndexPath)` discarded pg_search's planner path, causing the `@@@` operator to fall through as a `Filter` inside `ColcompressScan`, which then failed with "Unsupported query shape". BM25 full-text search on colcompress tables now works transparently — no need for `SET storage_engine.enable_custom_scan = false`. `pdb.score()`, `pdb.snippet()`, `===`, and multi-field `AND @@@` all work correctly. `ColcompressScan` continues to handle all other query shapes (projection pushdown, stripe pruning, parallel scan) without change.
1.0.9
- docs: pg_search 0.23 (ParadeDB) compatibility — colcompress tables are fully compatible with pg_search BM25 full-text search. The BM25 index (`CREATE INDEX USING bm25`) works transparently via `index_fetch_tuple`; `@@@`, `===`, `pdb.score()`, and `pdb.snippet()` all function correctly. To avoid `ColcompressScan` intercepting the planner before pg_search's `ParadeDB Base Scan` path is selected, use `SET storage_engine.enable_custom_scan = false` for queries that use `@@@`. A future release will auto-detect the `@@@` operator in `ColumnarSetRelPathlistHook` and skip the hook transparently.
- docs: native regex alternative to BM25 for analytics — `~*` (POSIX case-insensitive regex) on colcompress tables uses `ColcompressScan` with full parallelism and stripe-level projection pushdown, achieving the same recall as BM25 at 3× lower latency (60 ms vs ~200 ms for 150k rows, 8 parallel workers). Prefer `~*` over `@@@` for counting/aggregation patterns; reserve BM25 for ranked retrieval and fuzzy matching.
- bench: updated serial and parallel benchmark results; added baseline CSV for regression tracking.
1.0.8
- fix: `UPDATE` duplicate-key error on colcompress tables with unique indexes — `engine_index_fetch_tuple` now consults the in-memory `RowMaskWriteStateMap` bitmask before falling back to `ColumnarReadRowByRowNumber` for flushed stripes. Previously, `engine_tuple_update()` marked the old row deleted (via `UpdateRowMask`) and immediately inserted the new version; the unique-constraint recheck via `index_fetch_tuple` read a stale pre-deletion snapshot from the B-tree entry's old TID and returned "tuple still alive", causing a spurious duplicate-key error on every `UPDATE`.
- fix: deleted rows visible within the same command — `engine_tuple_satisfies_snapshot` now also consults `RowMaskWriteStateMap`, so rows deleted within the current transaction are correctly reported as invisible during the same command, preventing false positives in constraint checks.
- fix: OOM crash in `engine_tuple_update` with large VARLENA columns — `ColumnarWriteRowInternal` adds a memory-based flush guard: if the `stripeWriteContext` exceeds 256 MB (`SE_MAX_STRIPE_MEM_BYTES`), the current stripe is flushed before buffering the next row. This prevents OOM crashes when stripe row-count limits are generous but rows carry large VARLENA columns (XML, JSON, PDF).
1.0.7
- fix: GIN `BitmapHeapScan` bypasses `ColcompressScan` with `random_page_cost=1.1` — on NVMe-tuned servers (`random_page_cost=1.1`), the planner preferred a GIN `Bitmap Heap Scan` over `Custom Scan (ColcompressScan)` for analytical queries with JSONB `@>` or array `@>` predicates when `index_scan=false`. This caused a +195–237% regression in serial mode vs baseline (Q6 JSONB: 163 ms→479 ms, Q8 array: 123 ms→414 ms). Fixed by adding a `disable_cost` (1e10) penalty to every `BitmapHeapPath` in `CostColumnarPaths` when `index_scan=false`, symmetric with the existing penalty for `IndexPath`. Tables with `index_scan=true` are unaffected. Fix confirmed: serial Q6 175 ms (-63%), Q8 141 ms (-66%).
- fix: `index_scan=false` gate missing in `engine_reader.c` chunk loader — the single-chunk targeted loading optimisation (`ColumnarReadRowByRowNumber`) was activating unconditionally, including on analytics tables where `index_scan=false`. Added an `indexScanEnabled` field to `ColumnarReadState`, populated from `ReadColumnarOptions` in `ColumnarBeginRead`, and gated the single-chunk optimisation on `readState->indexScanEnabled`.
- fix: `BitmapHeapPath` penalty also applied to `partial_pathlist` — parallel bitmap heap paths were not being penalised, allowing GIN scans via parallel workers to bypass `ColcompressScan` even with `index_scan=false`.
- fix: infinite loop in index scan point lookup — `ColumnarReadRowByRowNumber` could loop forever when the requested row number fell beyond the last stripe, producing a hang with no error output.
- fix: index scan cost at chunk granularity — `ColumnarIndexScanAdditionalCost` now computes `perChunkCost` instead of `perStripeCost`, eliminating the ~15× cost inflation that caused the planner to always reject `IndexScan` over `ColcompressScan` for selective point lookups on wide columnar tables.
- fix: use projected column count in `ColumnarIndexScanAdditionalCost` — replaced `RelationIdGetNumberOfAttributes` with `list_length(rel->reltarget->exprs)`, so wide tables with large blob columns (XML/JSON) no longer inflate index scan cost beyond the full-scan cost, restoring planner choice for `index_scan=true` tables.
- fix: remove stray `randomAccessPenalty` from `ColumnarIndexScanAdditionalCost` — the per-row penalty (`estimatedRows * cpu_tuple_cost * 100`) was dead code when `index_scan=false` (path already blocked by `disable_cost`) but was still evaluated when `index_scan=true`, causing the planner to always choose `SeqScan` over `IndexScan` regardless of selectivity. Removed entirely.
1.0.6
- fix: `index_scan=false` bypassed by `Parallel Index Scan` — `CostColumnarPaths` only iterated `rel->pathlist`, leaving `rel->partial_pathlist` (parallel paths) untouched. When a B-tree index existed on a colcompress table, the planner chose `Parallel Index Scan` even with `index_scan=false`, bypassing stripe pruning entirely. Fixed by iterating `rel->partial_pathlist` in `CostColumnarPaths` and applying `disable_cost` (1e10) to every `IndexPath` found there.
- fix: `disable_cost` for `index_scan=false` serial paths — replaced the proportional penalty (`estimatedRows * cpu_tuple_cost * 100.0`) with PostgreSQL's canonical `disable_cost` constant (1e10), matching the behaviour of `SET enable_indexscan = off`. The old penalty was smaller than the seq-scan cost for low-selectivity queries (~4% of rows), so the planner still preferred `IndexScan` over `ColcompressScan`.
- bench: updated serial and parallel benchmark results and charts (1M rows, PostgreSQL 18, 4 access methods).
1.0.5
- fix: EXPLAIN + citus SIGSEGV — `IsCreateTableAs(NULL)` called `strlen(NULL)` when citus passed `query_string=NULL` internally; added a NULL guard. Added an `IsExplainQuery` guard to skip `PlanTreeMutator` for EXPLAIN statements. Fixed the `T_CustomScan` else branch to recurse into `custom_plans` instead of `elog(ERROR)`.
- fix: stripe pruning bypassed by btree indexes — when a btree index existed on a colcompress table, the planner chose `IndexScan` with `randomAccess=true`, which disabled stripe pruning entirely. Fixed by strengthening `ColumnarIndexScanAdditionalCost` with a per-row random-access penalty (`estimatedRows * cpu_tuple_cost * 100.0`), steering the planner back to seq scan.
- perf: `ColumnarIndexScanAdditionalCost` per-row penalty — discourages index scans on large colcompress tables where full-stripe pruning is more efficient.
- docs: benchmark kit — added `tests/bench/` with setup SQL, serial/parallel run scripts, chart generators, and result PNGs; added `BENCHMARKS.md` with full analysis.
- docs: README — citus load order note, btree/stripe-pruning Known Limitation, Benchmarks section, corrected install path.
1.0.4
- chore: bump version to 1.0.4 (PGXN meta).
- docs: benchmark results — heap vs colcompress vs rowcompress vs citus_columnar.
1.0.3
- perf: stripe-level min/max pruning for colcompress scans — before reading any stripe, the scan aggregates the per-column min/max statistics from `engine.chunk` across all chunks of the stripe and tests the resulting stripe-wide ranges against the query's WHERE predicates using `predicate_refuted_by`. Any stripe whose range is provably disjoint from the predicate is skipped entirely — no decompression, no I/O. The pruned count is shown in `EXPLAIN` as `Engine Stripes Removed by Pruning: N`. Pruning applies to both the serial scan path and the parallel DSM path (parallel workers only receive stripe IDs that survive the filter). Effectiveness scales directly with data sortedness; combine with `engine.colcompress_merge()` and the `orderby` table option to maximise it.
1.0.2
- fix: index corruption during `COPY` into colcompress tables — `engine_multi_insert` was calling `ExecInsertIndexTuples()` internally, while COPY's `CopyMultiInsertBufferFlush` also calls it after `table_multi_insert` returns. The double insertion corrupted every B-tree index on tables loaded via `COPY`. Fixed by removing all executor infrastructure from the per-tuple loop; index insertion is the caller's responsibility, matching `heap_multi_insert` semantics.
- fix: index corruption when `orderby` and indexes coexist — when sort-on-write is active, `ColumnarWriteRow()` buffers rows and returns `COLUMNAR_FIRST_ROW_NUMBER` (= 1) as a placeholder for every row. The executor then indexed all rows with TID (0,1), making every index lookup return the first row. Fixed in `engine_init_write_state()`: sort-on-write is disabled when the target relation has `relhasindex = true`. Tables with indexes already have fast key access; sort ordering is redundant and was silently lethal.
- perf: fast `ANALYZE` via chunk-group stride sampling — samples at most `N / stride` chunk groups (stride = `max(1, nchunks / 300)`) instead of reading the entire table, making `ANALYZE` on large colcompress tables take milliseconds instead of minutes.

Migration note (1.0.1 → 1.0.2): any colcompress table that has indexes and was written with `COPY` or `colcompress_merge` using a prior version must be rebuilt: `REINDEX TABLE CONCURRENTLY <table>;`
1.0.1
- fix: `multi_insert` now sets `tts_tid` before opening indexes, and explicitly calls `ExecInsertIndexTuples()` — previously B-tree entries received garbage TIDs during `INSERT INTO ... SELECT`, causing index scans to return wrong rows. Tables populated before this fix require `REINDEX TABLE CONCURRENTLY`.
- fix: `orderby` syntax is now validated at `ALTER TABLE SET (orderby=...)` time instead of at merge time, giving an immediate error on bad input.
- fix: CustomScan node names renamed to avoid symbol collision with `columnar.so` when both extensions are loaded simultaneously.
- fix: corrected SQL function names for `se_alter_engine_table_set`/`se_alter_engine_table_reset` (C symbols were mismatched).
- fix: added a `safeclib` symlink under `vendor/` so `memcpy_s` resolves correctly at link time.
- add: `META.json` for PGXN publication.
1.0.0
Initial release of storage_engine — a PostgreSQL table access method extension derived from Hydra Columnar and extended with two independent access methods:
- colcompress — column-oriented storage with vectorized execution, parallel DSM scan, chunk pruning, and a MergeTree-style per-table sort key (`orderby`).
- rowcompress — row-compressed batch storage with parallel work-stealing scan and full DELETE/UPDATE support via a row-level mask.

Additional features added beyond the upstream:

- per-table `index_scan` option (GUC `storage_engine.enable_index_scan`)
- full DELETE/UPDATE support for colcompress via row mask
- parallel columnar scan wired through DSM
- GUCs under the `storage_engine.*` namespace
- support for PostgreSQL 16, 17, and 18