Contents
- Changelog
- [0.4.0] — 2026-05-15
- [0.3.1] — 2026-05-13
- [0.3.0] — 2026-05-10
- [0.2.2] — 2026-05-10 (candidate)
- [0.2.1] — 2026-05-09
- [0.2.0.1] — 2026-05-04
- [0.1.4.1] — 2026-05-04 (maintenance branch)
- [0.2.0] — 2026-05-04
- [0.1.4] — 2026-05-04
- [0.1.3] — 2026-04-29
- [0.1.2] — 2026-04-28
- [0.1.1] — 2026-04-27
- [0.1.0] — 2026-04-26
- [0.0.1] — 2026-04-20
Changelog
All notable changes to pgmnemo are documented here.
Format follows Keep a Changelog.
[0.4.0] — 2026-05-15
Theme
Hybrid retrieval promoted to default — significant lift on conversational memory workloads (LoCoMo), neutral on dense multi-doc retrieval (LongMemEval).
Bench verdict
Real-DB benchmarks via the new router (`benchmarks/gate/v0.4.0.json`) vs the
v0.3.0 baseline:
| Bench / scope | Metric | v0.3.0 | v0.4.0 | Δpp | p_corr | Verdict |
|---|---|---|---|---|---|---|
| LoCoMo session OVERALL | recall@5 | 0.6623 | 0.7230 | +6.07 | 0.0010 | 🟢 IMPROVED |
| LoCoMo session OVERALL | recall@10 | 0.7951 | 0.8409 | +4.15 | 0.0156 | 🟢 IMPROVED |
| LoCoMo session OVERALL | MRR | 0.5569 | 0.6365 | +7.96 | <0.0001 | 🟢 IMPROVED |
| LoCoMo session open_domain | recall@5 | 0.7176 | 0.7907 | +7.31 | 0.0148 | 🟢 IMPROVED |
| LoCoMo session open_domain | MRR | 0.5688 | 0.6667 | +9.79 | 0.0009 | 🟢 IMPROVED |
| LongMemEval OVERALL | recall@10 | 0.9334 | 0.9334 | +0.00 | 1.0000 | neutral |
| LongMemEval OVERALL | MRR | 0.8472 | 0.8521 | +0.49 | 1.0000 | neutral |
| LoCoMo segment | (all) | (unchanged) | (unchanged) | 0.00 | 1.0000 | neutral |
5 significant improvements, 0 regressions across 24 LoCoMo session cells. LongMemEval and LoCoMo segment hold steady: hybrid does not trigger when `query_text` is NULL, and it adds nothing when dense retrieval is already saturated (bge-m3 on LME).
Honest scope (COMPETITIVE_REALITY.md updated):
- ✅ Significant lift on conversational dialog retrieval (LoCoMo paper-canonical)
- ✅ No regression on any benchmark
- ❌ Does NOT close the BM25 gap on LongMemEval (BM25=0.982, pgmnemo=0.9334)
- ⚠️ The v0.2.2 simulation predicted a +12.7pp LoCoMo recall@10 lift; real-DB measurement gave +4.15pp on LoCoMo and +0.00pp on LongMemEval, so the simulation overstated the lift by 3-100x. The new `docs/WORKFLOW.md` §2.2 “PROVE BEFORE ADD” rule caught this before promotion.
Added
- Hybrid retrieval as default for `recall_lessons()`: when `query_text` is non-empty AND `query_embedding` is non-NULL AND the `pgmnemo.disable_hybrid` GUC is FALSE/unset, the call is routed internally to `recall_hybrid()` with default weights (`vec_weight=0.4`, `bm25_weight=0.4`, `rrf_k=60`). Signature unchanged; output shape unchanged (12 columns). The diagnostic `vec_score`/`bm25_score`/`rrf_score` columns are exposed only via a direct `recall_hybrid()` call.
- `pgmnemo.disable_hybrid` GUC: `SET pgmnemo.disable_hybrid = 'true'` restores strict v0.3.0 vector-only behaviour. Default FALSE.
- `lesson_tsv` column + GIN index + auto-populating trigger: moved from the v0.2.2 EXPERIMENTAL opt-in to the default extension install.
- `recall_hybrid()` function: moved from the v0.2.2 opt-in to the default install. Signature unchanged from v0.2.2.
- `scripts/smoke_recall_hybrid.py`: CI signature-stability smoke test (catches output-column rename bugs in ~10 s, vs ~5 min for a bench-script failure).
- `benchmarks/scripts/bench_embed_cache.py`: embedding cache (deterministic for `(text, model, max_seq)`). Cuts the LongMemEval bench from ~52 min cold to ~3 min cached, and LoCoMo from ~10 min to 14 seconds, which makes weight-tuning grid searches practical.
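The routing rule can be expressed as a small decision function. This Python model is illustrative only (the extension implements the rule inside `recall_lessons()` itself); the name `choose_backend` is hypothetical, and `None` stands for SQL NULL or an unset GUC.

```python
from typing import Optional, Sequence


def choose_backend(
    query_text: Optional[str],
    query_embedding: Optional[Sequence[float]],
    disable_hybrid: Optional[bool] = None,
) -> str:
    """Return which retrieval path recall_lessons() would take (illustrative model).

    Hybrid runs only when query_text is non-empty, query_embedding is non-NULL,
    and the pgmnemo.disable_hybrid GUC is FALSE or unset (None here).
    """
    has_text = query_text is not None and query_text != ""
    has_embedding = query_embedding is not None
    hybrid_allowed = not disable_hybrid  # None (unset) and False both allow hybrid
    if has_text and has_embedding and hybrid_allowed:
        return "recall_hybrid"
    return "vector_only"  # the v0.3.0 path


print(choose_backend("deploy failed", [0.1, 0.2]))        # recall_hybrid
print(choose_backend(None, [0.1, 0.2]))                   # vector_only
print(choose_backend("deploy failed", [0.1, 0.2], True))  # vector_only
```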
Changed
- `docs/SQL_REFERENCE.md` §2.5: fixed incorrect documentation of the `recall_hybrid()` output schema (was `hybrid_score`, actually `score`). Added `vec_score`, `bm25_score`, the `rrf_k` parameter, and explicit “sort by score” guidance.
- `docs/COMPETITIVE_REALITY.md` §1.2 updated: the BM25 gap on LongMemEval remains (0.982 vs 0.9334); v0.4.0 does NOT close it. Future work via the Stella V5 embedder (H-02) or workload-aware routing (H-future).
- `docs/COMPETITIVE_REALITY.md` §5 updated: graph-feature deprecation deferred to v0.4.1 (separate cycle).
Upgrade
ALTER EXTENSION pgmnemo UPDATE TO '0.4.0';
Idempotent. Adopters who need strict v0.3.0 retrieval behaviour can opt out:
SET pgmnemo.disable_hybrid = 'true';
-- or persist:
ALTER SYSTEM SET pgmnemo.disable_hybrid = 'true';
SELECT pg_reload_conf();
CI / release-gate verdict
`scripts/significance_test_extended.py` exits with code 3 (NEAR_THRESHOLD) because
14 LoCoMo session cells are near threshold (all in the positive direction). The release
notes include a watchlist to monitor in the v0.4.1 follow-up:
- `multi_hop/MRR`: +9.55pp (p_corr=0.2949, n=321; may reach significance with v0.4.1 data)
- `multi_hop/recall@5`: +7.79pp (p_corr=0.6874)
- `adversarial/MRR`: +7.14pp (p_corr=0.7715)
- `single_hop/recall@10`: +3.62pp (positive but small n)
- `temporal/*`: small positive trends across 3 metrics (historically the weakest category; watch)
[0.3.1] — 2026-05-13
Theme
Hygiene foundation: no recall-algorithm change. This release closes the
documentation/process gaps surfaced by the v0.2.x → v0.3.0 audit (see
`docs/WORKFLOW.md` §1 for the post-mortem).
Bench verdict
Per `scripts/significance_test_extended.py` against the v0.3.0 baseline
(`benchmarks/gate/v0.3.0.json`): neutral on all 3 benches. With no SQL change,
no recall delta is possible; the release is hygiene-only.
Added
- `docs/WORKFLOW.md`: canonical development-discipline document. Defines customer-first hypothesis declaration, the per-cell bench gate, deprecation by absence of evidence, and the 2–4 week release cycle.
- `docs/BENCHMARK_PROTOCOL.md`: two-phase architecture (corpus snapshot reuse + per-version retrieval test), frozen-parameters table, gate decision matrix, CI integration plan.
- `docs/SQL_REFERENCE.md`: every public SQL function (`version`, `ingest`, `recall_lessons`, `recall_lessons_pooled`, `recall_hybrid`, `traverse_causal_chain`, `traverse_temporal_window`), all GUCs, RLS behaviour, deprecation log.
- `docs/MIGRATION.md` Part B: in-place version-to-version upgrade paths v0.1.x → v0.3.0, per-version backfill requirements, generic dump+restore rollback policy.
- `benchmarks/METRICS_BY_VERSION.md`: single source of truth for “which version produced which number.” Per-(dataset × embedder × mode) tables, append-only at every release.
- `benchmarks/gate/`: release pre-push snapshot files (`v<tag>.json`) that consolidate every real-DB `metrics.json` for a release; CI uses these for the mechanical gate decision.
- `scripts/significance_test_extended.py`: per-category z-test with Holm-Bonferroni correction; exit codes 0/1/2/3 drive the release gate.
- `scripts/render_progression.py`: pure-SVG per-bench small-multiples line charts with CI95 bands.
- `scripts/render_full_history.py`: Tufte-style sparkline table with all metrics × all versions.
- `scripts/render_executive_scorecard.py`: single-page PASS/WATCH/FAIL scorecard for non-technical readers.
- `ROADMAP.md` v2: customer-driven per-version plan to v1.0. The old spec-driven roadmap is archived.
- CI bench gate: `.github/workflows/release.yml` blocks tag push when `benchmarks/gate/v<tag>.json` is missing or the significance test exits 2. A soft check in `ci.yml` warns on PRs that touch SQL but don’t update the gate.
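For readers unfamiliar with Holm-Bonferroni, the sketch below shows the correction applied to a family of per-category p-values, together with a two-proportion z-test of the kind that suits recall@k cells. It is not `scripts/significance_test_extended.py`: how the script tests MRR cells and where it draws the NEAR_THRESHOLD line are not described here, so both are left out. Holm sorts the p-values, multiplies the i-th smallest (0-based rank i) by m - i, and carries a running maximum so adjusted values never decrease.

```python
import math
from typing import List


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in hit rates (e.g. recall@k)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # both arms all-hit or all-miss: no evidence of a difference
    z = (p_b - p_a) / se
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)) for the standard normal CDF Phi
    return math.erfc(abs(z) / math.sqrt(2))


def holm_bonferroni(p_values: List[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


raw = [two_proportion_z_test(40, 100, 60, 100), 0.04, 0.03]
print(holm_bonferroni(raw))
```

In a gate like this, a cell would typically count as significant when its adjusted p-value falls below a chosen alpha such as 0.05.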
Fixed
- GitHub Issues #12 (release coherence), #13 (docs/API coherence), #14 (install/upgrade contract), #15 (migration guide), #16 (benchmark protocol) — all closed; see commits in the v0.3.1 cycle.
Upgrade
ALTER EXTENSION pgmnemo UPDATE TO '0.3.1';
Empty upgrade script (no SQL changes); the version bump tracks documentation/process improvements only.
[0.3.0] — 2026-05-10
Fixed
- P0: `edge_type` → `relation_type` in migration S3 backfill. The `UPDATE pgmnemo.mem_edge SET edge_kind = ...` backfill in the 0.2.1→0.3.0 migration script referenced a non-existent column `edge_type` instead of the correct `relation_type` column. On any database with existing `mem_edge` rows this caused `ERROR: column "edge_type" does not exist`, preventing the migration from completing. Fixed: all `edge_type` references in S3 replaced with `relation_type`.
- P0: `edge_type` → `relation_type` in `traverse_causal_chain()` (S8). The `traverse_causal_chain()` function recreated in S8 of the migration also referenced `me.edge_type` in the WHERE clause (both forward and backward BFS branches). Fixed: all `edge_type` references replaced with `relation_type` in S8.
Upgrade
ALTER EXTENSION pgmnemo UPDATE TO '0.3.0';
[0.2.2] — 2026-05-10 (candidate)
Added
- `pgmnemo.recall_hybrid()`: vector + BM25 weighted fusion (EXPERIMENTAL: opt-in only, NOT default). New function combining dense cosine retrieval with BM25-class sparse retrieval (`ts_rank_cd` on `lesson_tsv`). Formula: `0.4×cosine + 0.4×ts_rank_cd(lesson_tsv, q, 32)` (plus minor importance/recency/provenance components). Union retrieval: candidates are matched by either embedding cosine or BM25 text match. Returns an `rrf_score` diagnostic column (`1/(rrf_k+vec_rank) + 1/(rrf_k+bm25_rank)`). `recall_lessons()` is unchanged and remains the default. Bench results (simulation, 2026-05-10): LoCoMo recall@10 +12.7pp vs vector-only (all 5 question types positive, statistically significant, CIs disjoint); LongMemEval MRR +5.8pp (p=0.005, significant), recall@10 +1.5pp (p=0.308, not significant; within noise at the high 0.93 baseline). Try `recall_hybrid()` if your task is MRR-sensitive or your corpus has keyword-matchable queries alongside semantic ones. WG decision: `spec/v2/pgmnemo/HYBRID_DECISION_2026-05-10.md`.
- Migration script `extension/pgmnemo--0.2.1--0.2.2-hybrid.sql`: idempotent `CREATE OR REPLACE FUNCTION`, backward-compatible (existing `recall_lessons()` unchanged).
- Benchmark script `benchmarks/scripts/run_longmemeval_hybrid.py`: LongMemEval evaluation harness for `recall_hybrid()` with gap-analysis reporting vs. the vector-only and BM25 baselines.
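The fusion described in this entry can be modelled in a few lines of Python. This is a sketch, not the extension's SQL: it assumes the raw `ts_rank_cd` value is already computed, uses 1-based ranks, treats a retriever that missed a candidate as contributing nothing to `rrf_score`, and omits the minor importance/recency/provenance terms. The `32` normalization flag in PostgreSQL's `ts_rank_cd` divides the rank by itself plus one, which keeps the BM25-class term in [0, 1).

```python
from typing import Dict, List, Optional

RRF_K = 60  # default rrf_k (v0.4.0 entry)


def normalize_ts_rank_cd(raw_rank: float) -> float:
    """Mimic ts_rank_cd normalization flag 32: rank / (rank + 1)."""
    return raw_rank / (raw_rank + 1.0)


def hybrid_score(cosine: float, raw_bm25_rank: float,
                 vec_weight: float = 0.4, bm25_weight: float = 0.4) -> float:
    """Core weighted fusion: vec_weight*cosine + bm25_weight*normalized ts_rank_cd."""
    return vec_weight * cosine + bm25_weight * normalize_ts_rank_cd(raw_bm25_rank)


def rrf_score(vec_rank: Optional[int], bm25_rank: Optional[int],
              rrf_k: int = RRF_K) -> float:
    """Diagnostic RRF: 1/(rrf_k + vec_rank) + 1/(rrf_k + bm25_rank).

    Assumption: a retriever that did not return the candidate adds no term.
    """
    score = 0.0
    if vec_rank is not None:
        score += 1.0 / (rrf_k + vec_rank)
    if bm25_rank is not None:
        score += 1.0 / (rrf_k + bm25_rank)
    return score


def union_ranks(vec_hits: List[int],
                bm25_hits: List[int]) -> Dict[int, Dict[str, Optional[int]]]:
    """Union retrieval: every id found by either retriever, with 1-based ranks (None if absent)."""
    ids = list(dict.fromkeys(vec_hits + bm25_hits))  # preserve first-seen order
    return {
        i: {
            "vec_rank": vec_hits.index(i) + 1 if i in vec_hits else None,
            "bm25_rank": bm25_hits.index(i) + 1 if i in bm25_hits else None,
        }
        for i in ids
    }


for lesson_id, ranks in union_ranks([10, 20], [20, 30]).items():
    print(lesson_id, ranks, round(rrf_score(ranks["vec_rank"], ranks["bm25_rank"]), 5))
```

A lesson found by both retrievers (id 20 above) collects two RRF terms, which is why union candidates with agreement rise in the diagnostic ranking.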
Upgrade
\i extension/pgmnemo--0.2.1--0.2.2-hybrid.sql
[0.2.1] — 2026-05-09
Added
- `traverse_causal_chain(direction)` parameter (W2.2 / F5): adds a `direction TEXT DEFAULT 'forward'` parameter. `'forward'` follows source→target edges (existing behaviour, backward-compatible). `'backward'` follows target→source edges for reverse traversal. `'both'` traverses all edges. Input validation raises an `EXCEPTION` on invalid values. The path-array cycle guard applies to all directions.
- `pgmnemo.ef_search` GUC (F2): on entry, `recall_lessons()` applies `SET LOCAL` to pgvector's `hnsw.ef_search`, taking the value from the `pgmnemo.ef_search` GUC (default 100, clamped 10–500).
- Graph-proximity mixin in the standard upgrade path (F3): `pgmnemo--0.2.0-step4-recall-mixin.sql` content folded into the v0.2.0.1→0.2.1 upgrade script (was supplemental-only).
- Row-Level Security multi-tenant isolation (W2.3 / Q5): the `pgmnemo.tenant_id` GUC gates `agent_lesson` by `project_id` and `mem_edge` by endpoint ownership. Empty/unset = service-account bypass. Policies are idempotent (DROP IF EXISTS before CREATE).
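The `direction` semantics and the path-array cycle guard can be illustrated with an in-memory breadth-first walk. This Python sketch is a simplified model, not the SQL function: edges are plain `(source_id, target_id, relation_type)` tuples, validity (`only_active`) and weights are ignored, and `ValueError` stands in for the PL/pgSQL `EXCEPTION`. Like a recursive CTE, it emits one row per distinct path, so a lesson reachable along two paths appears twice.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]  # (source_id, target_id, relation_type)


def traverse(edges: List[Edge], start_id: int, max_depth: int = 5,
             relation_types: Tuple[str, ...] = ("CAUSED_BY",),
             direction: str = "forward") -> List[Tuple[int, int, List[int]]]:
    """Return (lesson_id, depth, path) rows reachable from start_id."""
    if direction not in ("forward", "backward", "both"):
        raise ValueError(f"invalid direction: {direction!r}")

    # Build an adjacency list honouring the requested direction.
    neighbours: Dict[int, List[int]] = {}
    for src, dst, rel in edges:
        if rel not in relation_types:
            continue
        if direction in ("forward", "both"):
            neighbours.setdefault(src, []).append(dst)   # source -> target
        if direction in ("backward", "both"):
            neighbours.setdefault(dst, []).append(src)   # target -> source

    results = []
    queue = deque([(start_id, 0, [start_id])])
    while queue:
        node, depth, path = queue.popleft()
        if depth >= max_depth:
            continue
        for nxt in neighbours.get(node, []):
            if nxt in path:  # cycle guard via the accumulated path array
                continue
            new_path = path + [nxt]
            results.append((nxt, depth + 1, new_path))
            queue.append((nxt, depth + 1, new_path))
    return results  # empty when start_id has no qualifying edges


cycle = [(1, 2, "CAUSED_BY"), (2, 3, "CAUSED_BY"), (3, 1, "CAUSED_BY")]
print(traverse(cycle, 1))                        # [(2, 1, [1, 2]), (3, 2, [1, 2, 3])]
print(traverse(cycle, 1, direction="backward"))  # [(3, 1, [1, 3]), (2, 2, [1, 3, 2])]
```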
Fixed
- `recall_lessons()` IN-param/RETURNS TABLE collision on `project_id` (INS-029 v2): the IN-param `project_id INT` collided with the `RETURNS TABLE` column of the same name. Fix: IN-param renamed to `project_id_filter`; all internal `recall_lessons.project_id` references updated accordingly. This applies the same pattern as the `role` → `role_filter` fix in v0.1.4.1/v0.2.0.1.
Changed
- `pgmnemo.recency_weight` default lowered (F1): from `0.20` to `0.08` (pending REC-1 ablation confirmation). Operators can override it via `ALTER SYSTEM SET pgmnemo.recency_weight = '<value>'; SELECT pg_reload_conf();`.
Upgrade
ALTER EXTENSION pgmnemo UPDATE TO '0.2.1';
[0.2.0.1] — 2026-05-04
Fixed
- `traverse_temporal_window()` numeric → double precision cast (INS-030): a comparison between a `NUMERIC` intermediate value and `DOUBLE PRECISION` caused a type-mismatch error at runtime on PostgreSQL 14/15. The cast is now explicit throughout the function body.
- `recall_lessons()` IN-param/RETURNS TABLE collision (INS-029): IN-param `role` renamed to `role_filter` (backport of the v0.1.4.1 fix; see below).
- Idempotent upgrade DDL (INS-031): `ADD COLUMN IF NOT EXISTS` and `CREATE INDEX IF NOT EXISTS` guards added across all `0.1.4→0.2.0` upgrade scripts (backport of the v0.1.4.1 fix).
- `recall_lessons_pooled()` post-collision smoke test (Action #7): confirmed that the pooled wrapper correctly delegates to the renamed `role_filter` parameter after the collision fix.
Upgrade
ALTER EXTENSION pgmnemo UPDATE TO '0.2.0.1';
[0.1.4.1] — 2026-05-04 (maintenance branch)
Fixed
- `recall_lessons()` IN-param/RETURNS TABLE collision (INS-029, P0): PL/pgSQL raised `ERROR: parameter name "role" used more than once` because the IN-param `role TEXT` collided with the `RETURNS TABLE` column of the same name. The flagship function never compiled on a fresh install. Fix: IN-param renamed to `role_filter`; all callers updated.
- Idempotent upgrade DDL (INS-031): `ADD COLUMN IF NOT EXISTS` and `CREATE INDEX IF NOT EXISTS` guards applied across all upgrade scripts, so re-running a patch on an already-upgraded database no longer raises duplicate-object errors.
Upgrade
ALTER EXTENSION pgmnemo UPDATE TO '0.1.4.1';
[0.2.0] — 2026-05-04
Added
- `pgmnemo.mem_edge` DDL (closes RFC §3): directed typed edge table between `agent_lesson` rows.
  - Columns: `source_id`, `target_id`, `relation_type` (CAUSED_BY, SUPERSEDES, CO_OCCURRED, DERIVED_FROM, or user-defined), `weight REAL` [0.0–1.0], bitemporality (`valid_from`/`valid_until`), `commit_sha`, `metadata JSONB`.
  - Three covering indexes: forward/reverse traversal on `(source_id, relation_type)` and `(target_id, relation_type)` with `WHERE valid_until IS NULL`; temporal range index on `(valid_from, valid_until)`.
  - `CONSTRAINT ck_no_self_loop`: prevents `source_id = target_id`.
- `pgmnemo.traverse_causal_chain(start_id, max_depth, relation_types, only_active)` (closes RFC §4): recursive CTE walk of the causal edge graph.
  - Returns `(lesson_id, depth, path BIGINT[], path_weight, role, topic, lesson_text, importance, created_at, commit_sha, verified_at)`.
  - Cycle-safe via an accumulated path array. Fail-safe: returns zero rows if `start_id` is missing.
  - `max_depth` default 5; `relation_types` default `ARRAY['CAUSED_BY']`; `only_active` default `TRUE`.
- `pgmnemo.traverse_temporal_window(start_id, window_interval, include_unlinked, role_filter, project_id_filter, k)` (closes RFC §5): co-temporal episode discovery.
  - Returns lessons whose `created_at` falls within `±window_interval` of the anchor lesson. The window is hard-capped at 30 days.
  - `linked=TRUE` when a `mem_edge` (either direction) exists between the row and `start_id`.
  - Ghost-lesson exclusion is controlled by the `pgmnemo.include_unverified` GUC (default off).
- Graph-proximity mixin for `recall_lessons()`: integrates BFS graph traversal into scoring.
  - Updated scoring formula: `0.4×cosine + 0.2×importance + γ×recency + 0.1×prov_strength + δ×graph_proximity`.
  - `graph_proximity = MAX(1 - depth/max_depth)` via BFS through `CAUSED_BY`, `CO_OCCURRED`, `DERIVED_FROM` edges from the top-5 cosine anchors (`max_depth=5`).
  - New GUC `pgmnemo.graph_proximity_weight` (default `0.2`, clamped to `[0.0, 0.5]`).
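A minimal Python sketch of the v0.2.0 scoring formula follows. Assumptions: γ is the `pgmnemo.recency_weight` GUC (0.20 by default in this release, lowered to 0.08 in v0.2.1) and δ is `pgmnemo.graph_proximity_weight`; the importance, recency and `prov_strength` inputs are taken as already normalised to [0, 1], since this entry does not say how they are scaled; `lesson_score` and `graph_proximity` are hypothetical helper names.

```python
from typing import Dict


def graph_proximity(depths_from_anchors: Dict[int, int], max_depth: int = 5) -> float:
    """MAX(1 - depth/max_depth) over the BFS depths at which a lesson was reached.

    depths_from_anchors maps anchor_id -> depth; an empty dict (unreachable) gives 0.0.
    """
    if not depths_from_anchors:
        return 0.0
    return max(1.0 - d / max_depth for d in depths_from_anchors.values())


def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def lesson_score(cosine: float, importance: float, recency: float,
                 prov_strength: float, proximity: float,
                 recency_weight: float = 0.20,
                 graph_proximity_weight: float = 0.2) -> float:
    """0.4*cosine + 0.2*importance + gamma*recency + 0.1*prov_strength + delta*proximity."""
    delta = clamp(graph_proximity_weight, 0.0, 0.5)  # GUC is clamped to [0.0, 0.5]
    return (0.4 * cosine + 0.2 * importance + recency_weight * recency
            + 0.1 * prov_strength + delta * proximity)


# A lesson reached at depth 1 from one anchor and depth 3 from another:
prox = graph_proximity({101: 1, 202: 3})  # max(0.8, 0.4) == 0.8
print(round(lesson_score(0.7, 0.5, 0.9, 1.0, prox), 3))
```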
Upgrade
ALTER EXTENSION pgmnemo UPDATE TO '0.2.0';
Or from a fresh install:
CREATE EXTENSION pgmnemo CASCADE; -- installs 0.2.0 directly
[0.1.4] — 2026-05-04
Added
- State machine for `agent_lesson` (closes #3)
  - New `state TEXT` column (default `'draft'`), constrained to 9 lifecycle values: `draft`, `candidate`, `validated`, `canonical`, `deprecated`, `superseded`, `archived`, `rejected`, `conflicted`.
  - `state_changed_at TIMESTAMPTZ`: auto-set on every state change.
  - `pgmnemo.agent_lesson_state_transition` table: explicit allowed-transition pairs.
  - `pgmnemo.transition_lesson(lesson_id BIGINT, new_state TEXT)`: enforces the DAG; raises on an invalid transition.
- Provenance FK columns (closes #4)
  - `source_run_id BIGINT NULL`: soft FK to the orchestrator `agent_run` row that produced this lesson.
  - `source_task_id BIGINT NULL`: soft FK to the orchestrator `tasks` row.
  - Partial indexes `ix_pgmnemo_lesson_source_run` and `ix_pgmnemo_lesson_source_task` (`WHERE NOT NULL`).
  - Columns are intentionally not hard `REFERENCES`-constrained, so the extension remains portable across host schemas.
- TTL / `expires_at` (closes #5)
  - `expires_at TIMESTAMPTZ NULL`: optional hard expiry; `NULL` = never expires.
  - `pgmnemo.evict_expired_lessons()`: deletes rows where `expires_at < NOW()`; returns the eviction count. Safe to call on a schedule.
  - Partial index `ix_pgmnemo_agent_lesson_expires` keeps eviction scans cheap.
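The transition check can be sketched in Python as a lookup against a set of allowed pairs. The nine states come from this entry; the `ALLOWED_TRANSITIONS` pairs are hypothetical placeholders (the real pairs live in `pgmnemo.agent_lesson_state_transition`), and `InvalidTransition` stands in for the error the SQL function raises. Returning a UTC timestamp mirrors the auto-set `state_changed_at` column.

```python
from datetime import datetime, timezone
from typing import FrozenSet, Tuple

LIFECYCLE_STATES = frozenset({
    "draft", "candidate", "validated", "canonical", "deprecated",
    "superseded", "archived", "rejected", "conflicted",
})

# HYPOTHETICAL allowed pairs, for illustration only.
ALLOWED_TRANSITIONS: FrozenSet[Tuple[str, str]] = frozenset({
    ("draft", "candidate"),
    ("candidate", "validated"),
    ("candidate", "rejected"),
    ("validated", "canonical"),
    ("canonical", "deprecated"),
})


class InvalidTransition(Exception):
    """Raised when a (current, new) state pair is not allowed."""


def transition(current: str, new_state: str,
               allowed: FrozenSet[Tuple[str, str]] = ALLOWED_TRANSITIONS
               ) -> Tuple[str, datetime]:
    """Validate a state change and return (new_state, state_changed_at)."""
    if new_state not in LIFECYCLE_STATES:
        raise InvalidTransition(f"unknown state {new_state!r}")
    if (current, new_state) not in allowed:
        raise InvalidTransition(f"{current} -> {new_state} is not an allowed transition")
    return new_state, datetime.now(timezone.utc)


state, changed_at = transition("draft", "candidate")
print(state)  # candidate
```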
Fixed
- `pgmnemo.version()` dynamic lookup (closes #1)
  - `version()` previously returned a hard-coded string baked in at build time, so after `ALTER EXTENSION pgmnemo UPDATE` the reported version was stale.
  - Now reads `extversion` from `pg_catalog.pg_extension` at call time, so it always matches the installed version.
Upgrade
ALTER EXTENSION pgmnemo UPDATE TO '0.1.4';
Or from a fresh install:
CREATE EXTENSION pgmnemo CASCADE; -- installs 0.1.4 directly
[0.1.3] — 2026-04-29
Added
- `verifier_role TEXT` column on `agent_lesson`: records which agent role validated the lesson.
[0.1.2] — 2026-04-28
Added
- Tri-state `prov_strength` (hard/soft/none) on `agent_lesson`.
- `recall_lessons_pooled()` wrapper: cross-project recall for shared-context queries.
[0.1.1] — 2026-04-27
Added
- `recency_weight` GUC: tunes the time-decay component of the hybrid recall score without restarting the server.
[0.1.0] — 2026-04-26
Added
- HNSW vector index via `pgvector`: fast approximate nearest-neighbour recall.
- `pgmnemo.ingest()`: provenance-gated write API; requires `commit_sha` or `artifact_hash`.
- `pgmnemo.recall_lessons()`: hybrid scoring (cosine similarity + BM25 full-text + recency decay).
- Role + `project_id` composite scoping.
- `recall_lessons_pooled()` (cross-project variant).
[0.0.1] — 2026-04-20
Initial schema: pgmnemo.agent_lesson table + basic HNSW index.