Changelog
All notable changes to ProvSQL are documented
in this file. It mirrors the release-notes section of the website
(provsql.org/releases) and is kept in
sync by the release.sh release-automation script.
[1.5.0] - 2026-05-14
Major release headlining first-class continuous random-variable
columns and a hybrid analytic + Monte Carlo evaluator. The gate
ABI is extended (three new gate types appended; no renumbering of
older values); the mmap circuit format is otherwise compatible
and an ALTER EXTENSION provsql UPDATE is sufficient.
First-class random variables
A new random_variable type (a thin UUID wrapper,
binary-coercible with uuid) carries a probability distribution
per row. Constructors live in the provsql schema:
- provsql.normal(μ, σ), provsql.uniform(a, b), provsql.exponential(λ), provsql.erlang(k, λ) for the four continuous families;
- provsql.categorical(probs, outcomes) for discrete categorical random variables;
- provsql.mixture(p, x, y) (two overloads: shared Boolean gate vs ad-hoc Bernoulli probability) for probabilistic mixtures;
- provsql.as_random(c) for deterministic point-mass lifts.
Implicit casts from integer, numeric, and double precision
to random_variable make WHERE reading > 2 work without an
explicit wrapper. Arithmetic operators + - * / and unary -
build gate_arith over the operands; comparison operators
< <= = <> >= > are intercepted at planning time and rewritten
into gate_cmp calls conjoined into each row’s provenance.
The new gate types gate_rv, gate_arith, gate_mixture are
appended to the gate_type enum (with a parallel append to the
SQL provenance_gate enum). gate_value gains a float8 mode
parsed via extract_constant_double in having_semantics.cpp.
Hybrid analytic + Monte Carlo evaluation
A three-stage evaluator decides every probabilistic query analytically where possible and falls back to Monte Carlo otherwise:
- RangeCheck propagates support intervals through gate_arith and tests every gate_cmp against the propagated interval; decidable comparators collapse to Bernoulli leaves. A joint AND-conjunction pass intersects per-variable intervals across conjuncts before the decision.
- AnalyticEvaluator computes the closed-form CDF for any single-distribution gate_cmp (Normal via erf, Uniform by arithmetic, Exponential by log1p / expm1, Erlang via the regularised lower incomplete gamma).
- Expectation semiring runs analytical mean / variance / moments per distribution with structural-independence detection on gate_arith TIMES via a FootprintCache.
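The interval reasoning behind the RangeCheck stage can be illustrated with a small Python sketch. It is not ProvSQL's C++ code: `Interval` and `decide_gt` are hypothetical names, and only addition, negation, intersection, and a `>` test against a constant are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Closed support interval [lo, hi] of a random variable (hypothetical type)."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # The support of X + Y is contained in [lo_x + lo_y, hi_x + hi_y].
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __neg__(self) -> "Interval":
        return Interval(-self.hi, -self.lo)

    def intersect(self, other: "Interval") -> Optional["Interval"]:
        # Used when several conjuncts constrain the same variable.
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        return Interval(lo, hi) if lo <= hi else None


def decide_gt(support: Interval, c: float) -> Optional[bool]:
    """Decide `X > c` from the support alone; None means "not decidable here"."""
    if support.lo > c:
        return True
    if support.hi <= c:
        return False
    return None


# Uniform(0, 1) + Uniform(2, 3) has support within [2, 4].
total = Interval(0.0, 1.0) + Interval(2.0, 3.0)
print(decide_gt(total, 1.5))  # always true
print(decide_gt(total, 4.0))  # never true
print(decide_gt(total, 3.0))  # undecided: needs a CDF or sampling
```

Only the undecided case would need the later analytic or Monte Carlo stages; the two decided cases correspond to comparators that collapse to a constant-probability leaf.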
The HybridEvaluator simplifier folds family-preserving
combinations (normals close under linear combination; sums of
i.i.d. exponentials with the same rate fold to Erlang; the
affine shapes -N, -U, c+N, c-N, N-c, c-U, U-c,
U+c fold via a MINUS → PLUS canonicalisation plus a uniform
shift-closure rule; c·X-style shifts thread through mixtures
and categoricals; single-child arith roots and semiring
identities collapse; deterministic gate_arith subtrees are
folded to gate_value at load time). The island decomposer
splits multi-cmp queries into independent sub-problems on
shared base-RV footprints. Whole-circuit Monte Carlo remains as
the safety net for anything not analytically tractable.
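Two of the family-preserving folds mentioned above rest on textbook identities, sketched here in Python. The function names are hypothetical and the sketch assumes independent operands; it illustrates the mathematics, not the simplifier's implementation.

```python
import math
from typing import List, Tuple


def fold_normal_linear(terms: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Fold sum_i a_i * N(mu_i, sigma_i) over independent normals.

    Each term is (a, mu, sigma); the result is the (mu, sigma) of one normal.
    """
    mu = sum(a * m for a, m, _ in terms)
    variance = sum((a * s) ** 2 for a, _, s in terms)
    return mu, math.sqrt(variance)


def fold_exponential_sum(k: int, rate: float) -> Tuple[int, float]:
    """The sum of k i.i.d. Exp(rate) variables is Erlang(k, rate)."""
    if k < 1 or rate <= 0:
        raise ValueError("need k >= 1 and rate > 0")
    return k, rate


def erlang_mean_variance(k: int, rate: float) -> Tuple[float, float]:
    """Mean k/rate and variance k/rate^2 of Erlang(k, rate)."""
    return k / rate, k / rate ** 2


# N(10, 3) - N(4, 4) folds to N(6, 5).
print(fold_normal_linear([(1.0, 10.0, 3.0), (-1.0, 4.0, 4.0)]))
# Exp(2) + Exp(2) + Exp(2) folds to Erlang(3, 2), with mean 1.5 and variance 0.75.
print(erlang_mean_variance(*fold_exponential_sum(3, 2.0)))
```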
EQ/NE comparators take an analytical path whenever both sides
have extractable Dirac mass-maps with disjoint random-leaf
footprints, resolving to a sum-product over the discrete masses;
the same shortcut also widens to gate_arith composites and to
Bernoulli mixtures whose continuous arm fully covers the
support, so equality checks against an outcome of a categorical
or a mixture resolve symbolically.
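For two independent discrete variables, the sum-product over Dirac masses is the familiar identity P(X = Y) = Σ_v P(X = v) · P(Y = v). A minimal Python sketch, with `prob_equal` as a hypothetical helper and mass maps represented as plain dictionaries:

```python
from typing import Dict


def prob_equal(x: Dict[float, float], y: Dict[float, float]) -> float:
    """P(X = Y) for independent X and Y given as value -> probability maps."""
    return sum(p * y[v] for v, p in x.items() if v in y)


def prob_not_equal(x: Dict[float, float], y: Dict[float, float]) -> float:
    """P(X <> Y) is the complement of P(X = Y)."""
    return 1.0 - prob_equal(x, y)


# A categorical over {1, 2, 3} compared with a categorical over {2, 3}.
x = {1.0: 0.2, 2.0: 0.5, 3.0: 0.3}
y = {2.0: 0.4, 3.0: 0.6}
print(prob_equal(x, y))      # 0.5 * 0.4 + 0.3 * 0.6, about 0.38
print(prob_not_equal(x, y))  # about 0.62
```

The independence assumption corresponds to the "disjoint random-leaf footprints" condition: when the two sides share a random leaf, the product of marginal masses is no longer the joint mass.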
Symbolic prints in the simplifier use std::to_chars
shortest-roundtrip formatting, so folds like 2 * Exp(0.4) now
print exponential:0.2 instead of 0.20000000000000001.
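The difference between the two renderings can be reproduced in Python, whose `repr()` for floats also emits the shortest string that round-trips. This is only an analogy to the C++ `std::to_chars` behaviour described above; the helper names are hypothetical.

```python
def shortest_roundtrip(x: float) -> str:
    """Shortest decimal string that parses back to exactly x."""
    return repr(x)


def fixed_17_digits(x: float) -> str:
    """17 significant digits: always round-trips, but exposes binary noise."""
    return format(x, ".17g")


rate = 0.4 / 2  # rate of 2 * Exp(0.4)
print(shortest_roundtrip(rate))  # 0.2
print(fixed_17_digits(rate))     # 0.20000000000000001
```

Both strings parse back to the same double; only the shorter one is pleasant to read in a symbolic print.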
provsql.simplify_on_load (default on) runs the universal
peephole pass at load time, so every downstream consumer
(semiring evaluators, MC, view_circuit, PROV-XML export,
Studio) sees the simplified form.
Conditional inference
The polymorphic moment dispatchers expected / variance /
moment / central_moment / support all accept an optional
prov uuid DEFAULT gate_one() argument; passing provenance()
from inside a tracked query conditions on the row’s filter
event automatically. New companion C entry points
rv_sample(token, n, prov) (SRF over float8),
rv_histogram(token, bins, prov) (returning jsonb), and
rv_analytical_curves(token, prov, n_points) (SRF returning
(x, pdf, cdf) rows; mass-stems for discrete arms) expose
conditional samples, histograms, and closed-form PDF/CDF curves
for inspection and downstream analytics.
Closed-form truncated distributions cover Normal (Mills ratio),
Uniform (intersected support), Exponential (memorylessness on a
lower bound, finite-interval truncation via the lower incomplete
gamma), and Erlang (via the same regularised incomplete-gamma
machinery). The truncation pipeline also handles Bernoulli
mixtures (each arm truncated independently and the surviving mass
renormalised), categoricals (filtered outcomes plus rescaling),
and Diracs (kept or dropped against the conditioning event); a
universally-infeasible truncated subtree short-circuits to a
NaN-typed Dirac so downstream evaluators do not fire MC blindly.
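The closed forms for conditional means are standard results; the Python sketch below shows three of them for illustration. The function names are hypothetical, and returning NaN for an empty intersection only mimics the idea of flagging an infeasible truncation rather than reproducing ProvSQL's behaviour.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_mean_lower(mu: float, sigma: float, a: float) -> float:
    """E[X | X > a] for X ~ N(mu, sigma), via the inverse Mills ratio."""
    alpha = (a - mu) / sigma
    tail = 1.0 - normal_cdf(alpha)
    if tail <= 0.0:
        raise ValueError("conditioning event has zero mass at double precision")
    return mu + sigma * normal_pdf(alpha) / tail


def truncated_exponential_mean_lower(rate: float, a: float) -> float:
    """E[X | X > a] for X ~ Exp(rate): memorylessness gives a + 1/rate."""
    return max(a, 0.0) + 1.0 / rate


def truncated_uniform_mean(lo: float, hi: float, a: float, b: float) -> float:
    """E[X | a <= X <= b] for X ~ U(lo, hi): midpoint of the intersected support."""
    left, right = max(lo, a), min(hi, b)
    if left > right:
        return math.nan  # infeasible conditioning event
    return 0.5 * (left + right)


print(truncated_normal_mean_lower(0.0, 1.0, 0.0))  # sqrt(2/pi), about 0.798
print(truncated_exponential_mean_lower(2.0, 3.0))  # 3.5
print(truncated_uniform_mean(0.0, 10.0, 4.0, 20.0))  # 7.0
```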
On top of the moment fast paths, rv_sample and rv_histogram
take an inverse-CDF fast path on bare gate_rv conditional
events — Uniform / Exponential by memoryless inverse, Normal by
Beasley-Springer-Moro — bypassing MC entirely when the gate is a
single recognised distribution under a closed-form truncation.
Anything outside the closed-form table falls back to MC
rejection sampling at provsql.rv_mc_samples; a NOTICE (or,
for histograms / moments, an error) fires when fewer than the
requested n samples land within the budget.
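The contrast between the inverse-CDF fast path and budgeted rejection sampling can be sketched in Python for an exponential variable conditioned on a lower bound. The function names and the `budget` parameter are hypothetical stand-ins (the latter plays the role of a sample budget such as provsql.rv_mc_samples); this is not the extension's sampler.

```python
import math
import random
from typing import List


def sample_exp_above_inverse_cdf(rate: float, a: float, n: int,
                                 rng: random.Random) -> List[float]:
    """Draw n samples of X | X > a for X ~ Exp(rate), with no rejections."""
    # By memorylessness, X | X > a is distributed as a + Exp(rate);
    # rng.random() lies in [0, 1), so the logarithm is always defined.
    return [a - math.log(1.0 - rng.random()) / rate for _ in range(n)]


def sample_exp_above_rejection(rate: float, a: float, n: int, budget: int,
                               rng: random.Random) -> List[float]:
    """Same target by rejection, drawing at most `budget` unconditioned samples."""
    kept: List[float] = []
    for _ in range(budget):
        x = rng.expovariate(rate)
        if x > a:
            kept.append(x)
            if len(kept) == n:
                break
    return kept  # may hold fewer than n samples when the event is rare


rng = random.Random(42)
fast = sample_exp_above_inverse_cdf(rate=1.0, a=5.0, n=20000, rng=rng)
slow = sample_exp_above_rejection(rate=1.0, a=5.0, n=50, budget=100, rng=rng)
print(len(fast), sum(fast) / len(fast))  # 20000 samples, mean close to 6
print(len(slow))                         # far fewer than the 50 requested
```

With an acceptance probability of about exp(-5), the rejection sampler exhausts its budget long before reaching the requested count, which is the situation the NOTICE or error described above reports.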
Aggregation over random variables
Three RV-returning aggregates: sum, avg, product
(over random_variable). They lower to a single gate_arith
root over per-row gate_mixture children produced by the new
rv_aggregate_semimod helper. aggtype-based dispatch lets the
planner-hook recognise RV-returning aggregates and wrap the
per-row argument before the SFUNC sees it; the FFUNC pulls the
provenance back out of each mixture’s first child to build the
matching denominator (AVG) or to patch the multiplicative
identity into the else-branch (PRODUCT). The
INITCOND = '{}' convention lets each aggregate define its own
empty-group identity (as_random(0) for SUM, SQL NULL for
AVG, as_random(1) for PRODUCT).
HAVING clauses whose outcome collapses to a deterministic
scalar are supported natively, including the natural shape
HAVING expected(avg(rv)) > 20 (and the analogous
variance / moment / central_moment over an RV
aggregate). The planner skips the HAVING-lift on such quals
and lets PostgreSQL filter the surviving groups directly, while
the per-group gate_delta wrapper is still emitted so the
provenance shape is unchanged. Quals that compute on
agg_token results (the historical HAVING surface) continue
to route through having_Expr_to_provenance_cmp.
Studio companion release
ProvSQL Studio 1.1.0 ships in parallel on PyPI as
provsql-studio==1.1.0; minimum required extension version is
1.5.0. The new Studio features include the distribution-profile
panel (μ/σ², histogram, PDF/CDF toggle, wheel zoom) with a
closed-form analytical overlay drawn on top of the histogram
bars (terracotta SVG path for continuous arms, discs-on-stems
for Bernoulli mixtures / categoricals / Diracs, staircase
overlay in CDF mode), the Sample evaluator with conditional-MC
budget hints, the Condition on row-prov auto-preset,
simplified-circuit rendering driven by provsql.simplify_on_load,
Config-panel rows for monte_carlo_seed, rv_mc_samples,
simplify_on_load, and a footer that surfaces both the
extension and the Studio package versions (plus a new
--version CLI flag). See Studio release notes for details.
Internal
- Unified migrate_probabilistic_quals classifier in src/provsql.c replaces the historical pair migrate_aggtoken_quals_to_having + extract_rv_cmps_from_quals; routes every qual through a qual_class enum (pure-RV, pure-agg, deterministic, plus mixed-error classes).
- gate_agg arm in monteCarloRV::evalScalar unlocks HAVING+RV under Monte Carlo.
- gate_delta is transparent to the rv_* event walker in Sampler::evalBool and walkAndConjunctIntervals so the δ-semiring algebra and the random-variable algebra compose cleanly.
- getJointCircuit in MMappedCircuit.cpp builds a multi-rooted BFS so shared gate_rv leaves between input and prov are loaded into a single GenericCircuit and consequently couple correctly under MC rejection sampling.
- random_variable is now a thin wrapper over pg_uuid_t with bare-UUID text I/O and a binary-coercible WITHOUT FUNCTION cast to / from uuid; the planner hook emits a RelabelType instead of a FuncExpr. The historical cached-scalar field has been removed.
- runConstantFold in the load-time simplifier pass folds any deterministic gate_arith subtree to a single gate_value (so e.g. arith(NEG, value:c) collapses to value:-c before asRvVsConstCmp looks at the cmp).
- matchTruncatedSingleRv (in RangeCheck.h) factors the closed-form single-RV shape detection used by try_truncated_closed_form / try_truncated_closed_form_sample / rv_analytical_curves, keeping the supported-shape set in sync across moments, sampling, and PDF/CDF curves.
- HybridEvaluator::double_to_text uses std::to_chars for shortest-roundtrip formatting of folded scalar coefficients.
Bug fixes
- Backend segfault at verbose_level >= 20 when deparsing an EXCEPT-rewritten tree. transform_except_into_join was leaving the synthesised RTE_JOIN with NULL eref / joinaliasvars / joinleftcols / joinrightcols; execution was fine (outer Vars reference the inputs directly) but the ruleutils deparser walks the rtable and crashed. All four fields are now populated on supported PostgreSQL versions (joinleftcols / joinrightcols / joinmergedcols are guarded for PG < 13). New regression verbose_setops covers EXCEPT and non-ALL UNION.
- READM / READB in provsql_mmap.c now compare read() against (ssize_t)sizeof(...), so the size check is no longer promoted to unsigned, which used to mask short-read errors. File-local globals are marked static and the -Wmissing-variable-declarations / -Wunused-result warnings are clean.
- Tree-mutator / tree-walker callbacks in src/provsql.c now take void * (PostgreSQL’s idiom) so clang’s -Wcast-function-type-strict no longer fires at every expression_tree_mutator / expression_tree_walker call site; the dead collect_rv_footprint helper in HybridEvaluator.cpp is dropped (its job is done by FootprintCache in Expectation.cpp) and a bare move() in where_provenance.cpp is qualified as std::move(). Both gcc and clang now build clean.
GUCs (user-facing)
- provsql.monte_carlo_seed (default -1): pinning it seeds the MC sampler for reproducibility across runs and across the Bernoulli and continuous sampling paths.
- provsql.rv_mc_samples (default 10000): sample budget for the analytical-evaluator MC fallback. Set to 0 to require analytical answers (the fallback then raises).
- provsql.simplify_on_load (default on): runs the universal peephole simplifier when circuits are loaded into memory.
provsql.hybrid_evaluation is debug-only (GUC_NO_SHOW_ALL);
end users have no reason to flip it.
New documentation
- doc/source/user/continuous-distributions.rst: full user surface.
- doc/source/user/casestudy6.rst: The City Air-Quality Sensor Network, the first Studio-driven case study.
- doc/source/dev/continuous-distributions.rst: architecture companion.
ABI / compatibility
- gate_type enum extended (gate_rv, gate_arith, gate_mixture appended; no renumbering).
- mmap format compatible with 1.4.0.
- random_variable text I/O is bare UUID; the type is binary-coercible with uuid (cast declared WITHOUT FUNCTION), so the on-disk and on-wire representations are identical to uuid. The struct is pg_uuid_t.
- ALTER EXTENSION provsql UPDATE is sufficient.
[1.4.0] - 2026-05-09
Major release headlining the ProvSQL Studio companion (released
in parallel as provsql-studio 1.0.0 on PyPI) and a substantial
expansion of the compiled-semiring family. The mmap circuit format
is unchanged from 1.3.0; an ALTER EXTENSION provsql UPDATE is
enough.
ProvSQL Studio companion release
ProvSQL Studio, a
self-contained Flask/JS web UI for provenance inspection, ships in
parallel as pip install provsql-studio==1.0.0. Studio renders the
provenance DAG behind any UUID or agg_token cell, runs any
compiled semiring (or probability method or PROV-XML export) against
a pinned node, lights up the source rows of a Where-mode result via
hover, and prefills add_provenance / create_provenance_mapping
calls from the schema panel. Studio’s version stream is independent
of the extension’s; the
compatibility matrix
in the user guide records each Studio release’s minimum required
extension version (1.0.0 ↔ extension 1.4.0+). The Docker image
inriavalda/provsql:1.4.0 bundles both, exposes Studio on port 8000,
and replaces the legacy Apache + where_panel PHP UI.
New compiled semirings
Ten new sr_* evaluators land alongside the existing
sr_formula / sr_counting / sr_why / sr_boolexpr /
sr_boolean family. All are dispatched through the
provenance_evaluate_compiled C++ path, so they evaluate in a
single circuit traversal and respect circuit caching.
- sr_how(token, mapping): canonical N[X] polynomial provenance (how-provenance), the universal commutative-semiring carrier.
- sr_which(token, mapping): which-provenance / lineage: the set of input labels that influence the result.
- sr_tropical(token, mapping): tropical (min-plus) semiring on float8, returning the cost of the cheapest derivation.
- sr_viterbi(token, mapping): Viterbi (max-times) semiring on float8 ∈ [0, 1], returning the probability of the most likely derivation.
- sr_lukasiewicz(token, mapping): Łukasiewicz fuzzy semiring: + = max, × = max(a + b − 1, 0) on float8 ∈ [0, 1], preserving crisp truth and avoiding the near-zero collapse of long product chains.
- sr_minmax(token, mapping, element_one) and sr_maxmin(token, mapping, element_one): min-max / max-min m-semirings over a user-defined enum carrier (security-classification shape and trust/availability shape, respectively). The third argument is a sample value of the carrier enum, used only for type inference.
- PG14+: sr_temporal(token, mapping), sr_interval_num(token, mapping), sr_interval_int(token, mapping): interval-union m-semiring carriers over tstzmultirange, nummultirange, and int4multirange. sr_temporal subsumes the old union_tstzintervals_* helpers (state functions, aggregates, monus) which have been removed; the user-facing union_tstzintervals(token, mapping) wrapper is now a thin SELECT sr_temporal(...) call retained for backward compatibility, and timetravel, timeslice, history, and get_valid_time now call sr_temporal directly.
- sr_boolexpr signature change: the provenance-mapping argument is now optional (token2value regclass = NULL). When omitted, leaves are rendered as bare x<id> placeholders. Existing one-argument callers continue to work unchanged.
- Paren elision in sr_boolexpr and sr_formula output: redundant outer parentheses, parentheses around single-child subtrees, and parentheses around same-op nested subtrees are dropped at rendering time, so long expressions stay readable. The parsed expression is unchanged; only the textual form is shorter. Callers that grep sr_boolexpr output for exact paren counts will need to adjust.
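To make the numeric semirings concrete, the Python sketch below evaluates one small provenance expression under the tropical, Viterbi, and Łukasiewicz operations as defined in the list above. The `Semiring` class, the tuple encoding of expressions, and `evaluate` are hypothetical illustrations, unrelated to the compiled C++ evaluators.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]


TROPICAL = Semiring(plus=min, times=lambda a, b: a + b)       # min-plus
VITERBI = Semiring(plus=max, times=lambda a, b: a * b)        # max-times
LUKASIEWICZ = Semiring(plus=max, times=lambda a, b: max(a + b - 1.0, 0.0))

# An expression is a leaf label or a triple (op, left, right), op in {"plus", "times"}.
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


def evaluate(expr: Expr, mapping: Dict[str, float], sr: Semiring) -> float:
    """Evaluate a provenance expression bottom-up in the given semiring."""
    if isinstance(expr, str):
        return mapping[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, mapping, sr), evaluate(right, mapping, sr)
    return sr.plus(lhs, rhs) if op == "plus" else sr.times(lhs, rhs)


# (a AND b) OR c: two alternative derivations of one result row.
expr: Expr = ("plus", ("times", "a", "b"), "c")
print(evaluate(expr, {"a": 2.0, "b": 3.0, "c": 7.0}, TROPICAL))     # cheapest cost: 5.0
print(evaluate(expr, {"a": 0.9, "b": 0.8, "c": 0.5}, VITERBI))      # about 0.72
print(evaluate(expr, {"a": 0.9, "b": 0.8, "c": 0.5}, LUKASIEWICZ))  # about 0.7
```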
Circuit introspection helpers
Two new SQL-level helpers expose the gate DAG so external tools can walk a bounded slice without copying the entire circuit; Studio uses them to render Circuit mode:
- circuit_subgraph(root UUID, max_depth INT DEFAULT 8): returns (node, parent, child_pos, gate_type, info1, info2, depth) for the BFS-bounded subgraph rooted at root, joining get_gate_type / get_children / get_infos in a single recursive CTE and keeping every distinct DAG edge (a child reached from k parents within the bound contributes k rows; self-joins contribute one row per child position).
- resolve_input(uuid UUID): returns (relation regclass, row_data jsonb) for the source row whose provsql column equals uuid, by enumerating every provenance-tracked relation. Returns zero rows for non-input gates (plus, times, agg, …).
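The "one row per DAG edge" shape can be pictured with a depth-bounded breadth-first walk. The Python sketch below works on an in-memory child map and is only an illustration of the idea: the row layout is reduced to (node, parent, child_pos, depth), child positions starting at 1 and the reported depth being the node's shallowest depth are assumptions, and the real helper is a recursive SQL CTE.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[str], Optional[int], int]  # (node, parent, child_pos, depth)


def bounded_subgraph(children: Dict[str, List[str]], root: str,
                     max_depth: int = 8) -> List[Row]:
    """Walk the DAG breadth-first from root, emitting one row per edge."""
    rows: List[Row] = [(root, None, None, 0)]
    depth_of = {root: 0}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        if depth_of[parent] >= max_depth:
            continue  # children would fall outside the bound
        for pos, child in enumerate(children.get(parent, []), start=1):
            if child not in depth_of:
                depth_of[child] = depth_of[parent] + 1
                queue.append(child)  # expand each node only once
            rows.append((child, parent, pos, depth_of[child]))
    return rows


# r = times(a, b); a has child x; b uses x twice (a self-join shape).
dag = {"r": ["a", "b"], "a": ["x"], "b": ["x", "x"]}
for row in bounded_subgraph(dag, "r"):
    print(row)
```

Here `x` is reached through three edges and therefore appears in three rows, while it is expanded only once.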
agg_token rendering and the aggtoken_text_as_uuid GUC
agg_token cells now have two render modes, controlled by a new
provsql.aggtoken_text_as_uuid GUC:
- Off (default, unchanged): cells render as value (*).
- On (typical for UI layers like Studio): cells render as the underlying UUID, so callers can click through to the provenance circuit; the value (*) side is recovered via the new agg_token_value_text(uuid) helper, which returns get_extra(token) || ' (*)' when token resolves to an agg gate.
agg_token_out is consequently STABLE rather than IMMUTABLE
(the chosen output now depends on a session GUC).
provsql.tool_search_path and external-tool robustness
A new provsql.tool_search_path GUC (colon-separated directories
prepended to PATH when invoking external tools) replaces the
previous “tool must be on the postmaster’s PATH” assumption. The
external-tool dispatch layer also gains:
- Pre-flight tool lookup with structured error decoding: calls fail fast with a clear message when a required tool is missing, instead of waiting for an opaque downstream error.
- statement_timeout translation: a statement_timeout that fires during d4 / c2d / dsharp / weightmc compilation now becomes a proper SQLSTATE 57014 (query_canceled) cancel rather than a raw subprocess kill.
- SIGINT translation: Ctrl-C during find_external_tool pre-flight is translated into a proper PostgreSQL cancel.
- Private mkdtemp dir: external tools now run in a per-call mkdtemp directory, closing a /tmp race on shared hosts.
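The general pattern (resolve the tool up front against an extended search path, then run it in a private scratch directory) can be sketched in Python with the standard library. `find_tool` and `run_in_private_dir` are hypothetical names; the extension implements the equivalent logic in C/C++, and this sketch does not reproduce its error codes or cancellation handling.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from typing import List


def find_tool(name: str, tool_search_path: str = "") -> str:
    """Resolve `name` with `tool_search_path` prepended to PATH; fail fast if missing."""
    parts = [p for p in (tool_search_path, os.environ.get("PATH", "")) if p]
    search_path = os.pathsep.join(parts)
    resolved = shutil.which(name, path=search_path)
    if resolved is None:
        raise FileNotFoundError(f"external tool {name!r} not found in {search_path!r}")
    return resolved


def run_in_private_dir(argv: List[str], timeout_s: float) -> str:
    """Run a command inside a fresh mkdtemp directory that is removed afterwards."""
    workdir = tempfile.mkdtemp(prefix="tool-")
    try:
        done = subprocess.run(argv, cwd=workdir, capture_output=True,
                              text=True, timeout=timeout_s, check=True)
        return done.stdout
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


# Use the running Python interpreter as a stand-in for an external tool.
tool = find_tool(os.path.basename(sys.executable), os.path.dirname(sys.executable))
cwd_seen = run_in_private_dir([tool, "-c", "import os; print(os.getcwd())"],
                              timeout_s=30).strip()
print(tool, cwd_seen)
```

A timeout surfaces here as `subprocess.TimeoutExpired`, which a caller would translate into its own cancellation error, in the spirit of the statement_timeout translation described above.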
Bug fixes
- provenance_aggregate UUID collision under concurrent aggregation. SUM(id) and AVG(id) over the same children could collapse to a single agg gate, after which their concurrent set_infos calls would overwrite each other’s aggregation operator (and provsql_having would read the wrong agg_kind under cross-backend contention). The aggregate function OID is now folded into the gate UUID so the two queries produce distinct gates.
- CircuitCache poisoning under concurrent gate creation. A rare lost-write between two backends creating the same gate could leave the cache pointing at the wrong type / children pair. Fixed; the cache’s return-value contract is now aligned with what the callers expect (src/CircuitCache.cpp).
- CircuitCache poisoning when calling get_gate_type before get_children. A separate cache-coherence bug along the get_gate_type → get_children ordering has been fixed.
- Where-provenance column position for multi-table joins. PROJECT-gate column positions were computed against the wrong RTE for multi-table joins, causing empty locator sets on some query shapes. Fixed.
- 1.2.3 → 1.3.0 upgrade WARNING text. The recovery-instructions block raised by the storage-layout check used %s placeholders inside a RAISE WARNING, where the substitution marker is % (no type letter); the data-directory path was rendered with a stray s appended (<datadir>s instead of <datadir>). Fixed. The behaviour of the upgrade itself was unaffected.
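The provenance_aggregate collision and its fix can be pictured with name-based UUIDs. In the Python sketch below, `agg_gate_uuid`, the string encoding, and the two OID values are hypothetical placeholders chosen for illustration; the actual derivation used by the extension is not described in this changelog.

```python
import uuid
from typing import Iterable, Optional


def agg_gate_uuid(children: Iterable[uuid.UUID],
                  agg_fn_oid: Optional[int] = None) -> uuid.UUID:
    """Derive a deterministic gate UUID from the children, optionally salted
    with the aggregate function OID (hypothetical encoding)."""
    name = ",".join(str(c) for c in children)
    if agg_fn_oid is not None:
        name = f"{agg_fn_oid}:{name}"
    return uuid.uuid5(uuid.NAMESPACE_OID, name)


children = [uuid.UUID(int=1), uuid.UUID(int=2)]
SUM_OID, AVG_OID = 1001, 1002  # placeholder OIDs, not real catalog values

# Without the OID, SUM and AVG over the same children share one gate.
print(agg_gate_uuid(children) == agg_gate_uuid(children))  # True
# With the OID folded in, the two aggregates get distinct gates.
print(agg_gate_uuid(children, SUM_OID) == agg_gate_uuid(children, AVG_OID))  # False
```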
Documentation
- Studio user-guide chapter (doc/source/user/studio.rst): a full walkthrough of Where mode, Circuit mode, the eval strip, and the Config panel, plus a compatibility matrix and screenshots. Cross-links from the introduction, semiring, and probability pages.
- Semirings chapter expansion (doc/source/user/semirings.rst, +300 lines): new sections documenting sr_how, sr_which, sr_tropical, sr_viterbi, sr_lukasiewicz, sr_minmax / sr_maxmin, and the PG14+ interval-union family, with a capability matrix summarising each semiring’s identities and δ-handling.
- Expanded case studies: Case Study 1 gains a Minimum Security Clearance step that walks sr_minmax over a classification_level enum mapping; case study 4 has its temporal-DB code samples migrated from union_tstzintervals to sr_temporal; case studies 3 and 5 are realigned with the new paren-elided sr_boolexpr / sr_formula output.
- Developer guide: new Studio architecture chapter (doc/source/dev/studio.rst) covering the Flask app layout, the /api/* surface, the auto-prepare and search_path pinning strategy, and how Circuit mode walks circuit_subgraph.
- Build-system chapter: new “Studio releases” section in doc/source/dev/build-system.rst documenting Studio’s independent version stream, Trusted Publishing on PyPI, the studio-v* tag workflow, and the hand-edited studio/CHANGELOG.md discipline.
Infrastructure
- Studio CI workflow (.github/workflows/studio.yml): Python 3.10 / 3.11 / 3.12 / 3.13 × PostgreSQL 14 / 15 / 16 matrix covering pytest, Playwright e2e, ruff, and a wheel-install smoke.
- Studio release pipeline (.github/workflows/studio-release.yml): tag-triggered (studio-v*), publishes to PyPI via Trusted Publishing, attaches sdist + wheel to a GitHub release, embeds the matching studio/CHANGELOG.md section in the release notes, and aborts loudly if the section is missing.
- Docker image rework: bundles Studio (PyPI install at image build time, contributor STUDIO_SOURCE= override for editable installs); adds the PGDG apt source so any PSQL_VERSION resolves (Debian bookworm only ships 15); collapses the apt layer; replaces Apache + where_panel/ with provsql-studio on port 8000; parallelises make in the build.
- where_panel/ removed. The legacy PHP where-provenance UI is superseded by Studio’s Where mode.
Upgrade procedure
make install
In each database that uses ProvSQL:
ALTER EXTENSION provsql UPDATE;
The mmap circuit format is unchanged from 1.3.0; for users already
on 1.3.x no migration is required. Users still on 1.2.x must run
provsql_migrate_mmap first to move the flat
$PGDATA/provsql_*.mmap files into the per-database layout
introduced in 1.3.0; see the 1.3.0 release notes
for the full procedure.
[1.3.1] - 2026-05-04
A bug-fix release focused on repair_key / mulinput correctness,
plus a corrected upgrade path from 1.2.3 and documentation additions
(a fifth case study and expanded material in case studies 1 and 2).
No on-disk format change relative to 1.3.0; an
ALTER EXTENSION provsql UPDATE is enough.
Upgrade-script corrections
- sql/upgrades/provsql--1.2.3--1.3.0.sql shipped with 1.3.0 only carried the per-database mmap migration warning and missed two groups of SQL-surface changes that had landed in provsql.common.sql / provsql.14.sql during the 1.3.0 dev cycle: the lazy-input-gate refactor of add_provenance / repair_key (commit f670b7f) and the schema-qualified provsql.time_validity_view references in timetravel, timeslice, history, and get_valid_time (commit 1f59032). Users on 1.2.3 who ran ALTER EXTENSION provsql UPDATE TO '1.3.0' ended up with a stale set of function bodies. The script in 1.3.1 has been corrected and now applies all the missing changes; users still on 1.2.3 reach a clean 1.3.0-equivalent SQL surface when they upgrade after 1.3.1.
- sql/upgrades/provsql--1.3.0--1.3.1.sql applies the same catch-up changes on the 1.3.0 → 1.3.1 path so that users already on 1.3.0 (who came through the broken upgrade) are brought back in sync. Fresh installs of 1.3.0 also run this script, but the CREATE OR REPLACE statements match the source already on disk, so it is a no-op for them.
Bug fixes
- probability_evaluate(..., 'tree-decomposition') on circuits containing mulinput gates. Input gates produced by repair_key could share an internal id when their UUIDs were never materialised in the d-DNNF builder, causing the probability to be wrong and to vary from one session to the next on identical data. The aliasing has been removed (src/BooleanCircuit.cpp, src/dDNNFTreeDecompositionBuilder.cpp); a regression test (test/sql/treedec_mulinput.sql) covers the affected query shapes.
- Off-by-one in BooleanCircuit::rewriteMultivaluedGates. The splitter that turns a mulinput into a chain of independent Bernoulli inputs produced non-deterministic probabilities under self-join + GROUP BY queries. Fixed; the four built-in evaluation methods (default, 'possible-worlds', 'tree-decomposition', 'monte-carlo') now agree on mulinput-bearing circuits.
- Shapley and Banzhaf computation on mulinput circuits. shapley(), shapley_all_vars(), banzhaf(), and banzhaf_all_vars() previously walked through mulinput gates and returned meaningless values. They now raise a clear error identifying the unsupported gate type.
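One standard way to encode a set of mutually exclusive outcomes as a chain of independent Bernoulli inputs is shown below in Python. It is offered as background only: the changelog does not specify the exact encoding used by rewriteMultivaluedGates, and both function names are hypothetical.

```python
from typing import List


def bernoulli_chain(probs: List[float]) -> List[float]:
    """Conditional probabilities q_i = p_i / (1 - sum_{j<i} p_j).

    Outcome i is then encoded as (not b_1) and ... and (not b_{i-1}) and b_i,
    where the b_j are independent Bernoulli variables with P(b_j) = q_j.
    """
    chain: List[float] = []
    remaining = 1.0  # probability mass not yet assigned to earlier outcomes
    for p in probs:
        chain.append(p / remaining if remaining > 0.0 else 0.0)
        remaining -= p
    return chain


def outcome_probabilities(chain: List[float]) -> List[float]:
    """Recover P(outcome i) = q_i * prod_{j<i} (1 - q_j) from the chain."""
    result: List[float] = []
    none_before = 1.0
    for q in chain:
        result.append(none_before * q)
        none_before *= 1.0 - q
    return result


probs = [0.5, 0.3, 0.2]
chain = bernoulli_chain(probs)
print(chain)                         # roughly [0.5, 0.6, 1.0]
print(outcome_probabilities(chain))  # roughly [0.5, 0.3, 0.2]
```

An indexing slip in such a chain (for instance dividing by the wrong partial sum) changes every later outcome's probability, which gives a sense of how an off-by-one in this kind of splitter can corrupt results.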
Documentation
- New Case Study 5: The Wildlife Photo Archive. A 30-photo / 13-species / 63-detection synthetic dataset demonstrates the VALUES clause, repair_key and the mulinput gate (with the numerical effect of mutual exclusion made explicit via sr_boolexpr and probability_evaluate), probabilistic ranking versus naive confidence thresholding, EXCEPT with monus, common table expressions, and expected() aggregates. The case study is bundled (no external data download) and is part of the regression suite.
- Case Study 1 gains three steps in the circuit-inspection section: a tree-decomposition probability variant in the benchmark step, an sr_boolexpr step on the Nairobi monus token, and a programmatic circuit-inspection step using get_nb_gates, get_gate_type, get_children, and identify_token.
- Case Study 2 gains two steps: a bulk Shapley/Banzhaf step using shapley_all_vars / banzhaf_all_vars (contrasted with the per-variable cross-join from the existing Steps 13 and 14), and a step on arithmetic over aggregate results illustrating the agg_token cast warning.
- Copy-to-clipboard buttons on every documentation code block (sphinx-copybutton). A small JS shim (doc/source/_static/copybutton-shim.js) papers over an incompatibility between sphinx-copybutton 0.4.0 (the version in Ubuntu Noble’s apt) and Sphinx 9.
Infrastructure
- Release tarballs and CI workflows exclude the studio/ subdirectory, which is reserved for future development.
Upgrade procedure
make install
In each database that uses ProvSQL:
ALTER EXTENSION provsql UPDATE;
The mmap circuit format is unchanged from 1.3.0; no migration is required.
[1.3.0] - 2026-05-04
Breaking change: per-database circuit storage
Prior to 1.3.0, the provenance circuit was stored in four flat files at
the root of the PostgreSQL data directory ($PGDATA/provsql_gates.mmap,
provsql_wires.mmap, provsql_mapping.mmap, provsql_extra.mmap),
shared across all databases in the cluster. Starting with 1.3.0, each
database gets its own isolated set of files under
$PGDATA/base/<db_oid>/.
Users upgrading from 1.2.x must migrate their circuit data before
upgrading. The new provsql_migrate_mmap tool handles this. If the
migration is skipped, existing circuit data becomes inaccessible (new
provenance queries still work, but provenance computed under the old
version is lost). The upgrade script detects old flat files and raises a
WARNING with recovery instructions if they are still present.
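The two layouts can be summarised with a short Python sketch that lists leftover flat files and computes the per-database targets. It assumes the per-database files keep the same four file names under `$PGDATA/base/<db_oid>/`; the helper names are hypothetical, and the sketch only inspects paths (the actual migration must be done with provsql_migrate_mmap).

```python
import tempfile
from pathlib import Path
from typing import List

MMAP_FILES = [
    "provsql_gates.mmap",
    "provsql_wires.mmap",
    "provsql_mapping.mmap",
    "provsql_extra.mmap",
]


def legacy_flat_files(pgdata: Path) -> List[Path]:
    """Pre-1.3.0 flat files still present at the root of the data directory."""
    return [pgdata / name for name in MMAP_FILES if (pgdata / name).exists()]


def per_database_files(pgdata: Path, db_oid: int) -> List[Path]:
    """Expected 1.3.0+ locations for one database (file names assumed unchanged)."""
    return [pgdata / "base" / str(db_oid) / name for name in MMAP_FILES]


# Demonstration on a throwaway directory standing in for $PGDATA.
with tempfile.TemporaryDirectory() as tmp:
    pgdata = Path(tmp)
    (pgdata / "provsql_gates.mmap").touch()
    leftovers = [p.name for p in legacy_flat_files(pgdata)]
    targets = [str(p.relative_to(pgdata)) for p in per_database_files(pgdata, 16384)]

print(leftovers)  # ['provsql_gates.mmap']
print(targets[0])
```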
Upgrade procedure
1. Install the new ProvSQL binaries:
   make install
2. Run the migration tool as the postgres user:
   provsql_migrate_mmap -D $PGDATA -c <connstr>
   The tool reads the old flat files, collects root UUIDs from each database’s provenance-tracked tables, writes per-database files under $PGDATA/base/<db_oid>/, and deletes the old flat files on success.
3. Restart PostgreSQL.
4. In each database that uses ProvSQL:
   ALTER EXTENSION provsql UPDATE;
If you forgot step 2
If PostgreSQL has already been restarted with the new binaries before migrating, some empty per-database files may have been created. To recover:
- Delete the empty per-database files: rm -f $PGDATA/base/*/provsql_*.mmap
- Restart PostgreSQL.
- Immediately run provsql_migrate_mmap before executing any provenance query.
Lazy input gate creation
add_provenance() no longer eagerly writes an input gate to the circuit
for every existing row in the table at the time it is called. Gates are
now created on first reference during a query, at the cost of a small
overhead on the first query that touches each row. This significantly
reduces the overhead of provisioning large tables.
Four case studies
Four worked examples have been added to the documentation and are included as regression tests:
- Case Study 1: The Intelligence Agency: simple introductory example with Boolean and why-provenance.
- Case Study 2: The Open Science Database: comprehensive example covering why-provenance, where-provenance, custom semirings, probabilities, Shapley and Banzhaf values.
- Case Study 3: Île-de-France Public Transit: Boolean provenance and formula inspection over GTFS transit data.
- Case Study 4: Government Ministers Over Time: temporal provenance with union_tstzintervals and time-validity views.
Bug fixes
- Fix GROUP BY provenance aggregation silently dropped when ORDER BY referenced the semiring result column.
- Fix d-DNNF tree decomposition: deduplicate OR gate children to prevent double-counting in probability evaluation.
- Fix NULL dereference and out-of-bounds crashes in where-provenance on views.
- Fix temporal functions (time_filter, time_range, in_interval) to use schema-qualified provsql.time_validity_view, preventing failures when search_path does not include the provsql schema.
- Fix sr_boolean evaluation when the provenance mapping uses integer values.
- Fix where-provenance PROJECT gate positions for provenance tables that are not the first RTE in a query, causing empty locator sets on some PostgreSQL versions.
[1.2.3] - 2026-04-12
PGXN improvements
- Prevent indexing of secondary documentation directories (doc/source/, doc/tutorial/, doc/demo/, doc/aggregation/, doc/temporal_demo/, where_panel/) on the PGXN distribution page via no_index in META.json.
- Document PGXN as an installation channel in the user guide, with a note that pgxn install does not configure shared_preload_libraries.
- Add a GitHub Actions workflow (.github/workflows/pgxn.yml) that automatically publishes releases to PGXN on version-tag pushes.
Documentation and repository housekeeping
- Add CODE_OF_CONDUCT.md.
- Add architecture dataflow diagram to the website overview page.
- Replace sudo with generic “as a user with write access to the PostgreSQL directories” wording across installation and contribution documentation.
[1.2.2] - 2026-04-11
In-place extension upgrades
ALTER EXTENSION provsql UPDATE is now supported, starting with this
release. A committed chain of upgrade scripts under sql/upgrades/
covers every previous release (1.0.0 → 1.1.0 → 1.2.0 → 1.2.1 → 1.2.2),
so users on any historical version can upgrade in place without
dropping and recreating the extension. The persistent provenance
circuit (memory-mapped files) is preserved across the upgrade: the
on-disk format has been binary-stable since 1.0.0, and the relevant
headers (src/MMappedCircuit.h, src/provsql_utils.h) now carry
explicit warnings so future contributors don’t break that guarantee
by accident.
A pg_regress regression test (test/sql/extension_upgrade.sql)
exercises the full chain end-to-end on every PostgreSQL version in
the CI matrix, installing the extension at 1.0.0 from a frozen
install-script fixture and walking it up to the current
default_version. See the new “Extension Upgrades” section of the
developer guide for the workflow contributors should follow when
making SQL changes.
Repository housekeeping and discoverability
- CHANGELOG.md at the repository root, mirroring the release notes published at provsql.org/releases. It is automatically kept in sync by release.sh.
- GitHub issue and pull-request templates under .github/. The bug-report form prompts for PostgreSQL version, ProvSQL version, OS, a minimal SQL reproducer, and optional verbose-mode output; the PR template carries a contributor checklist and links to the developer guide.
- DockerHub image-version badge added to the README (inriavalda/provsql) and a prose pointer on the website overview page.
- PGXN META.json at the repository root, making ProvSQL ready for submission to the PostgreSQL Extension Network. Submission will happen once upstream approval lands; no change to the build or install flow in the meantime.
- CITATION.cff now carries the Zenodo concept DOI (10.5281/zenodo.19512786) and a Software Heritage archive URL in its identifiers block.
Infrastructure
- release.sh learned to update CITATION.cff, CHANGELOG.md, and META.json in sync with provsql.common.control and website/_data/releases.yml, and to enforce the presence of an upgrade script (auto-generating a no-op when no SQL sources have changed since the previous tag).
- CI workflows now fetch git tags so git describe works inside the pgxn-tools containers, which unblocks the Makefile’s dev-cycle upgrade-script generation.
- The four build / docs workflows' paths-ignore lists exclude META.json, .github/ISSUE_TEMPLATE/**, and .github/pull_request_template.md, so metadata-only edits do not trigger the full CI matrix any more.
No SQL-level changes
There are no changes to sql/provsql.common.sql or
sql/provsql.14.sql in this release. The SQL API, query
rewriter, semiring evaluators, and probability machinery are
unchanged from 1.2.1. The upgrade script 1.2.1 → 1.2.2 is
accordingly an empty placeholder.
[1.2.1] - 2026-04-11
Maintenance release headlining the new developer guide and laying the groundwork for long-term archival and citation.
Highlights
- Developer guide (14 chapters, ~3500 lines): PostgreSQL extension primer, architecture, query rewriting pipeline, memory management, where-provenance, data-modification tracking, aggregation semantics, semiring and probability evaluation (including the block-independent database model and the expected Shapley / Banzhaf algorithm from Karmakar et al., PODS 2024), coding conventions, testing, debugging, and the build system. Cross-references to Lean 4 machine-checked proofs of the positive-fragment rewriting rules and the m-semiring axioms. See the new Developer Guide tab in the documentation.
- User guide updates: expanded coverage of expected(), the choose aggregate, custom semiring evaluation, and diagnostic functions. “Formula semiring” has been renamed to “symbolic representation” throughout.
- CITATION.cff: standard citation metadata at the repo root. GitHub now shows a Cite this repository button that emits BibTeX and APA for the ICDE 2026 paper.
- Software Heritage archival is active: the full repository history is continuously preserved at archive.softwareheritage.org.
- Zenodo integration enabled: starting with this release, every tagged version receives a persistent DOI (10.5281/zenodo.19512786).
Fixes
- create_provenance_mapping_view is now available on all supported PostgreSQL versions, not only PG 14+.
- External-tool tests (c2d, d4, dsharp, minic2d, weightmc, view_circuit_multiple) now skip cleanly when the tool is not installed, instead of being removed from the test schedule.
Infrastructure
- Automated documentation coherence check runs in CI (validates that every :sqlfunc:/:cfunc:/:cfile:/:sqlfile: reference resolves to a live Doxygen anchor).
- Mobile-friendly Doxygen and Sphinx output.
- CI speedups: concurrency groups, skip-on-tags, macOS pg_isready race fix.
- In-place extension upgrades via ALTER EXTENSION provsql UPDATE are supported starting with this release; upgrade scripts live under sql/upgrades/ and the path is exercised by an automated CI test.
[1.2.0] - 2026-04-10
This release focuses on providing broader and more consistent support for SQL language features within provenance tracking. Systematic testing across a wide range of query patterns led to numerous bug fixes, new feature support, and clearer error messages for unsupported constructs.
New Features
- CTE support: Non-recursive WITH clauses now fully track provenance. Nested CTEs (CTE referencing another CTE) and CTEs inside UNION/EXCEPT branches are supported. Recursive CTEs produce a clear error message.
- INSERT ... SELECT provenance propagation: When both source and target tables are provenance-tracked, INSERT ... SELECT now propagates source provenance to the inserted rows instead of assigning fresh tokens. A warning is emitted when the target table lacks a provsql column.
- Correct arithmetic and expressions on aggregate results from subqueries: Explicit casts (cnt::numeric), arithmetic (cnt + 1), window functions (SUM(cnt) OVER()), and expressions (COALESCE, GREATEST, etc.) on aggregate results from subqueries now produce correct values with a warning, using the original aggregate return type from the provenance circuit.
- UNION ALL with aggregate columns: UNION ALL of queries returning aggregate results now works correctly.
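The CTE support listed above can be illustrated with a small sketch. The personnel table, its rows, and the personnel_name mapping are hypothetical names chosen for this example; add_provenance, create_provenance_mapping, provenance(), and sr_formula are existing ProvSQL functions. The exact formula text returned depends on the installed version, so no output is shown.

```sql
-- Hypothetical schema and data, for illustration only.
SET search_path TO public, provsql;

CREATE TABLE personnel (id integer, name text, city text);
INSERT INTO personnel VALUES
  (1, 'Ann', 'Paris'), (2, 'Bob', 'Paris'), (3, 'Eve', 'Berlin');

-- Enable provenance tracking and map provenance tokens to names for display.
SELECT add_provenance('personnel');
SELECT create_provenance_mapping('personnel_name', 'personnel', 'name');

-- Non-recursive CTE: provenance of the outer rows is tracked through the WITH clause.
WITH parisians AS (
  SELECT name FROM personnel WHERE city = 'Paris'
)
SELECT name, sr_formula(provenance(), 'personnel_name') AS formula
FROM parisians;
```

Replacing the CTE with WITH RECURSIVE should instead raise the error message mentioned in the CTE support item.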
Bug Fixes
- Fixed crash when mixing COUNT(DISTINCT ...) with provenance() or sr_formula(provenance(), ...) in the same query.
- Fixed COUNT(*) returning NULL instead of 0 (*) on empty results without GROUP BY.
- Fixed provenance_cmp function failing with “function uuid_ns_provsql() does not exist” when provsql was not in search_path.
Improved Error Messages
- provenance_evaluate on unsupported gate types now reports the specific gate type and suggests using compiled semirings.
- Subquery errors now read “Subqueries (EXISTS, IN, scalar subquery) not supported” instead of the misleading “Subqueries in WHERE clause”.
- Clear error messages for unsupported operations on aggregate results: DISTINCT on aggregates, UNION/EXCEPT (non-ALL) with aggregates, ORDER BY/GROUP BY on aggregate results from subqueries.
- Dropped redundant “by provsql” suffix from all error messages (the “ProvSQL:” prefix is already present).
Documentation
- Updated supported/unsupported SQL features list with accurate coverage based on systematic testing.
- Added documentation for INSERT ... SELECT provenance propagation.
- Expanded aggregation documentation with examples of casts, window functions, COALESCE, and GREATEST on aggregate results.
- Added workaround guidance for unsupported features (use LATERAL for correlated subqueries, explicit cast for comparison on aggregates).
[1.1.0] - 2026-04-09
Support for arithmetic on aggregate results
Queries performing arithmetic on aggregate results (e.g.,
SELECT COUNT(*)+1 or SUM(id)*10) are now supported. Previously,
these queries produced incorrect results because the planner hook
replaced aggregate references with agg_token values without
adjusting surrounding operator type expectations. This is handled by
adding implicit and assignment casts from agg_token to standard SQL
types (numeric, double precision, integer, bigint, text),
and by inserting appropriate type casts during query rewriting when
aggregate results are used inside operators or functions. A warning
is emitted when provenance information is lost during such
conversions.
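A minimal sketch of the kind of query this covers, assuming a hypothetical provenance-tracked orders table (the table name, columns, and rows are illustrative only):

```sql
-- Hypothetical provenance-tracked table, for illustration only.
SET search_path TO public, provsql;

CREATE TABLE orders (id integer, amount integer);
INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
SELECT add_provenance('orders');

-- Arithmetic on aggregate results: the rewriter casts the agg_token values
-- to standard SQL types before applying the operators.
SELECT COUNT(*) + 1 AS count_plus_one,
       SUM(id) * 10 AS scaled_sum
FROM orders;
```

Because the arithmetic operates on plain SQL values after the cast, the aggregate's provenance is not carried into the computed columns, which is the situation the warning reports.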
Infrastructure improvements
- Versioned Docker image tagging (images are now tagged with the release version in addition to latest).
- Improved release process: post-release version bump is now automated, and release tarballs exclude non-essential files (CI workflows, release script, branding, Docker, and website assets).
- CI fixes for macOS and documentation builds.
[1.0.0] - 2026-04-05
Initial official release of ProvSQL after 10 years of development. ProvSQL is now fully documented and usable in production.