Contents
Node.js cluster client (napi-rs) — build plan
A native Node.js addon, written in Rust with napi-rs, that gives JS/TS an efficient connection pool over the N-node pg_replica cluster and follows the primary automatically. The host-racing, primary detection, pooling, and failover recovery live in Rust — JS never loops over connections.
Why this shape
- Not WASM — WASM has no raw TCP; the PG wire protocol needs a socket. A native addon
loads into Node and uses real sockets, so
tokio-postgresworks unmodified. - Not JS probe-loop — pure-JS
pgis single-host (#1470), and the libpq path can’t be reached cleanly throughpg’s parser. We push that logic down into Rust instead of hand-rolling it in JS (see CLIENT.md). - Reuse what’s verified —
tokio-postgresalready does multi-host +target_session_attrs=read-write(proven inpackages/failover-probe,scripts/test-m6-routing.sh). The addon reuses that exactConfig.
Core idea (why this is small)
A tokio-postgres Config with the host list + target_session_attrs=read-write lands
each connection on the current primary by itself. Wrap that Config in an async pool
(bb8-postgres) and you get an N-node, primary-following pool with
almost no custom routing code: every pooled connection resolves the primary at connect
time, and broken connections (after a failover) are replaced by fresh ones that re-resolve
to the new primary. The pool crate also spawns each Connection future for us.
The one non-trivial case to handle explicitly: a primary that is fenced read-only
without dropping its sessions (pg_replica sets default_transaction_read_only=on).
An already-open pooled connection to that node stays connected but can no longer write.
So the pool needs a checkout validation that confirms the connection is still on a
writable primary (SHOW transaction_read_only → off) and evicts it otherwise.
Public API
import { createPool } from 'hyperiondb-client'
const pool = createPool({
hosts: ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
port: 5432,
user: 'app',
password: '…',
database: 'weido',
mode: 'read-write',
poolSize: 10,
connectTimeoutMs: 2000,
})
const rows = await pool.query<{ id: number }>('select id from t where x = $1', [42])
await pool.transaction(async (tx) => {
await tx.query('insert into t (x) values ($1)', [1])
await tx.query('update t set x = x + 1 where x = $1', [1])
})
await pool.end()
A separate mode: 'read-only' (or 'prefer-standby' on PG 14+) pool routes read-heavy
work (ParadeDB search) to standbys.
Milestones
C1 — Core pool
- [x] Reuse the
failover-probeConfig(multi-host,target_session_attrs=read-write,connect_timeout). - [x]
createPool(opts)→#[napi]class holding adeadpool/bb8pool. - [x]
pool.query(sql, params)async (#[napi]→ JS Promise) returning rows as JS objects keyed by column name. - [x]
pool.end()graceful drain.
C2 — Type marshalling
- [x] Results: PG
Row→ JS. Mappings:bool→boolean,int2/4→number,int8→BigInt,oid→number,float4/8→number,numeric→string,text/varchar/bpchar/name→ string,uuid→string,bytea→Buffer,json/jsonb→parsed value,timestamptz/timestamp/date/time→ISO 8601 string, and arrays of all the above → JS arrays. Custom binarynumeric→string decoder (arbitrary precision, no float). - [x] Params: JS values →
tokio_postgres::types::ToSql(null, boolean, number, BigInt, string, Buffer→bytea, Date→timestamptz, array→pg array when the column is an array type else jsonb, object→jsonb). Integer/float targets coerced by column type. - [x] Decision:
int8/bigint→ BigInt (lossless 64-bit);numeric→ string (arbitrary precision).timestamptz→ ISO 8601 string;bytea→ Buffer.
C3 — Primary affinity & failover
- [x] Checkout validation: a
deadpoolpre_recyclehook runsSHOW transaction_read_onlyand evicts the connection when it ison(read-only-fence-without-disconnect window). Write pool only; fresh connections are already validated bytarget_session_attrs. - [x] Recycle on connection error; new connections re-resolve the primary (the hook doubles
as a liveness check — a failed
SHOWevicts the dead connection). - [x] Retry/backoff (50ms→500ms, capped) bounded by
acquireTimeoutMs(default 5000), surfacingno writable primary available after <ms>msas a typed error. - [x]
mode: 'read-write' | 'read-only' | 'prefer-standby' | 'any'.read-only→target_session_attrs=read-only(lands on standbys);prefer-standby/any→target_session_attrs=any+ random host load-balancing. (tokio-postgres 0.7.x has no server-side standby preference, soprefer-standbyspreads across all reachable nodes.)
C4 — Ergonomics
- [x] Transactions: native
pool.begin() -> Transaction {query, commit, rollback}(one dedicated connection held in anArc<Mutex<Option<Client>>>), plus apool.transaction(cb)helper (autoBEGIN/COMMIT,ROLLBACKon throw) in the JS wrapper. - [x] Prepared statements: query paths use deadpool
prepare_cached, so repeated SQL reuses a server-side named statement per connection. (Pipelining is inherent to tokio-postgres for concurrent queries on a connection.) - [x] Query cancellation:
query(sql, params, { timeoutMs, signal }). Both a timeout and anAbortSignaltrip the connection’stokio_postgrescancel_token(server-side cancel). The!SendRc-basedAbortSignalis bridged on the JS thread (on_abort→ aSendNotify) so the async query canselect!on it. - [x] Error mapping: PG errors carry the 5-char
SQLSTATEon JSerr.code(native formats[SQLSTATE xxxxx] msg; the JS wrapper parses it onto.codeand cleans the message). - [x] Hand-checked
client.d.ts(precisePoolOptions/Param/Row/QueryOptions/Pool/Transactiontypes) is the publishedtypes; the napi-generatedindex.d.tsstays as the internal native binding. Architecture is now native core (index.js/.node) + a thin JS ergonomic layer (client.js) that adds.code,transaction(cb), and option passing.
C5 — Observability & resilience
- [x] Pool metrics:
pool.status()→{ maxSize, size, available, inUse, waiting }(from deadpool’sStatus;inUse = size − available). - [x]
statement_timeout:statementTimeoutMspool option sets it server-side on every connection (options=-c statement_timeout=…). Per-query cancellation is the C4query(…, { timeoutMs, signal })path (client-sidecancel_token). - [x] Optional logging hook:
logger(event)pool option, called once per query with{ sql, durationMs, rowCount? , error? }; thrown logger errors are swallowed.
C6 — Packaging & release
- [x]
@napi-rs/cliprebuilds:win32-x64-msvc,darwinx64/arm64,linuxx64/arm64 (gnu+musl) — 7 targets (napi.targets). Linux builds cross-compile withcargo-zigbuild(--cross-compile); win/mac build natively. - [x] GitHub Actions matrix (
.github/workflows/release.yml) →napi prepublishpublishes per-platformhyperiondb-client-<triple>packages and wires them as the main package’soptionalDependencies. Triggered by[cd]in the commit message onmain. - [x] npm publish; the workflow bumps
npm version patchand syncsCargo.tomlto match, commits the bump back (no[cd], so it doesn’t re-trigger), then publishes. (Needs anNPM_TOKENsecret.) - [x] README: install, connect, failover behavior, read-scaling, type mapping, testing.
Testing
node-addon/test/ (node:test). npm test runs the type + fence tests against a running
cluster (primary on the first host); npm run test:chaos runs the failover test, which needs
to stop a node (test/cluster.js, defaults to the local pgrx cluster via WSL pg_ctl; set
HYPERION_CTL/env for docker). Connection + topology come from HYPERION_* env vars.
- [x] Unit: type round-trips (params ↔ rows) for every supported PG type —
test/types.test.js(scalars,int8→BigInt,numericprecision, NULLs, date/time/void, arrays inclint8[], Date/Buffer/array/jsonb params). - [x] Integration: query load through the primary-following pool, stop the primary, assert
reconnection to the new primary and zero acked-write loss —
test/chaos.test.js(JS port ofpackages/chaos-writer; requires synchronous replication for true zero-loss). - [x] Read-only-fence eviction (C3) explicitly —
test/fence.test.js(fences the primary viaALTER SYSTEM, asserts the typed error + recovery; always resets the fence in teardown).
Open decisions
- [x] Pool crate:
deadpool-postgres(simpler, recycling built in) - [x] Package + crate name (brand: HyperionDb).
- [x]
bigint/numericJS representation (C2).
Not planned
- [ ] TLS backend (rustls vs native-tls) and default per environment.