Node.js cluster client (napi-rs) — build plan

A native Node.js addon, written in Rust with napi-rs, that gives JS/TS an efficient connection pool over the N-node pg_replica cluster and follows the primary automatically. The host-racing, primary detection, pooling, and failover recovery live in Rust — JS never loops over connections.

Why this shape

  • Not WASM — WASM has no raw TCP; the PG wire protocol needs a socket. A native addon loads into Node and uses real sockets, so tokio-postgres works unmodified.
  • Not JS probe-loop — pure-JS pg is single-host (#1470), and the libpq path can’t be reached cleanly through pg’s parser. We push that logic down into Rust instead of hand-rolling it in JS (see CLIENT.md).
  • Reuse what’s verifiedtokio-postgres already does multi-host + target_session_attrs=read-write (proven in packages/failover-probe, scripts/test-m6-routing.sh). The addon reuses that exact Config.

Core idea (why this is small)

A tokio-postgres Config with the host list + target_session_attrs=read-write lands each connection on the current primary by itself. Wrap that Config in an async pool (bb8-postgres) and you get an N-node, primary-following pool with almost no custom routing code: every pooled connection resolves the primary at connect time, and broken connections (after a failover) are replaced by fresh ones that re-resolve to the new primary. The pool crate also spawns each Connection future for us.

The one non-trivial case to handle explicitly: a primary that is fenced read-only without dropping its sessions (pg_replica sets default_transaction_read_only=on). An already-open pooled connection to that node stays connected but can no longer write. So the pool needs a checkout validation that confirms the connection is still on a writable primary (SHOW transaction_read_onlyoff) and evicts it otherwise.

Public API

import { createPool } from 'hyperiondb-client'

const pool = createPool({
  hosts: ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
  port: 5432,
  user: 'app',
  password: '…',
  database: 'weido',
  mode: 'read-write',
  poolSize: 10,
  connectTimeoutMs: 2000,
})

const rows = await pool.query<{ id: number }>('select id from t where x = $1', [42])

await pool.transaction(async (tx) => {
  await tx.query('insert into t (x) values ($1)', [1])
  await tx.query('update t set x = x + 1 where x = $1', [1])
})

await pool.end()

A separate mode: 'read-only' (or 'prefer-standby' on PG 14+) pool routes read-heavy work (ParadeDB search) to standbys.

Milestones

C1 — Core pool

  • [x] Reuse the failover-probe Config (multi-host, target_session_attrs=read-write, connect_timeout).
  • [x] createPool(opts)#[napi] class holding a deadpool/bb8 pool.
  • [x] pool.query(sql, params) async (#[napi] → JS Promise) returning rows as JS objects keyed by column name.
  • [x] pool.end() graceful drain.

C2 — Type marshalling

  • [x] Results: PG Row → JS. Mappings: bool→boolean, int2/4→number, int8BigInt, oid→number, float4/8→number, numericstring, text/varchar/bpchar/name→ string, uuid→string, byteaBuffer, json/jsonb→parsed value, timestamptz/timestamp/date/timeISO 8601 string, and arrays of all the above → JS arrays. Custom binary numeric→string decoder (arbitrary precision, no float).
  • [x] Params: JS values → tokio_postgres::types::ToSql (null, boolean, number, BigInt, string, Buffer→bytea, Date→timestamptz, array→pg array when the column is an array type else jsonb, object→jsonb). Integer/float targets coerced by column type.
  • [x] Decision: int8/bigintBigInt (lossless 64-bit); numericstring (arbitrary precision). timestamptz → ISO 8601 string; bytea → Buffer.

C3 — Primary affinity & failover

  • [x] Checkout validation: a deadpool pre_recycle hook runs SHOW transaction_read_only and evicts the connection when it is on (read-only-fence-without-disconnect window). Write pool only; fresh connections are already validated by target_session_attrs.
  • [x] Recycle on connection error; new connections re-resolve the primary (the hook doubles as a liveness check — a failed SHOW evicts the dead connection).
  • [x] Retry/backoff (50ms→500ms, capped) bounded by acquireTimeoutMs (default 5000), surfacing no writable primary available after <ms>ms as a typed error.
  • [x] mode: 'read-write' | 'read-only' | 'prefer-standby' | 'any'. read-onlytarget_session_attrs=read-only (lands on standbys); prefer-standby/anytarget_session_attrs=any + random host load-balancing. (tokio-postgres 0.7.x has no server-side standby preference, so prefer-standby spreads across all reachable nodes.)

C4 — Ergonomics

  • [x] Transactions: native pool.begin() -> Transaction {query, commit, rollback} (one dedicated connection held in an Arc<Mutex<Option<Client>>>), plus a pool.transaction(cb) helper (auto BEGIN/COMMIT, ROLLBACK on throw) in the JS wrapper.
  • [x] Prepared statements: query paths use deadpool prepare_cached, so repeated SQL reuses a server-side named statement per connection. (Pipelining is inherent to tokio-postgres for concurrent queries on a connection.)
  • [x] Query cancellation: query(sql, params, { timeoutMs, signal }). Both a timeout and an AbortSignal trip the connection’s tokio_postgres cancel_token (server-side cancel). The !Send Rc-based AbortSignal is bridged on the JS thread (on_abort → a Send Notify) so the async query can select! on it.
  • [x] Error mapping: PG errors carry the 5-char SQLSTATE on JS err.code (native formats [SQLSTATE xxxxx] msg; the JS wrapper parses it onto .code and cleans the message).
  • [x] Hand-checked client.d.ts (precise PoolOptions/Param/Row/QueryOptions/Pool/ Transaction types) is the published types; the napi-generated index.d.ts stays as the internal native binding. Architecture is now native core (index.js/.node) + a thin JS ergonomic layer (client.js) that adds .code, transaction(cb), and option passing.

C5 — Observability & resilience

  • [x] Pool metrics: pool.status(){ maxSize, size, available, inUse, waiting } (from deadpool’s Status; inUse = size − available).
  • [x] statement_timeout: statementTimeoutMs pool option sets it server-side on every connection (options=-c statement_timeout=…). Per-query cancellation is the C4 query(…, { timeoutMs, signal }) path (client-side cancel_token).
  • [x] Optional logging hook: logger(event) pool option, called once per query with { sql, durationMs, rowCount? , error? }; thrown logger errors are swallowed.

C6 — Packaging & release

  • [x] @napi-rs/cli prebuilds: win32-x64-msvc, darwin x64/arm64, linux x64/arm64 (gnu + musl) — 7 targets (napi.targets). Linux builds cross-compile with cargo-zigbuild (--cross-compile); win/mac build natively.
  • [x] GitHub Actions matrix (.github/workflows/release.yml) → napi prepublish publishes per-platform hyperiondb-client-<triple> packages and wires them as the main package’s optionalDependencies. Triggered by [cd] in the commit message on main.
  • [x] npm publish; the workflow bumps npm version patch and syncs Cargo.toml to match, commits the bump back (no [cd], so it doesn’t re-trigger), then publishes. (Needs an NPM_TOKEN secret.)
  • [x] README: install, connect, failover behavior, read-scaling, type mapping, testing.

Testing

node-addon/test/ (node:test). npm test runs the type + fence tests against a running cluster (primary on the first host); npm run test:chaos runs the failover test, which needs to stop a node (test/cluster.js, defaults to the local pgrx cluster via WSL pg_ctl; set HYPERION_CTL/env for docker). Connection + topology come from HYPERION_* env vars.

  • [x] Unit: type round-trips (params ↔ rows) for every supported PG type — test/types.test.js (scalars, int8→BigInt, numeric precision, NULLs, date/time/void, arrays incl int8[], Date/Buffer/array/jsonb params).
  • [x] Integration: query load through the primary-following pool, stop the primary, assert reconnection to the new primary and zero acked-write losstest/chaos.test.js (JS port of packages/chaos-writer; requires synchronous replication for true zero-loss).
  • [x] Read-only-fence eviction (C3) explicitly — test/fence.test.js (fences the primary via ALTER SYSTEM, asserts the typed error + recovery; always resets the fence in teardown).

Open decisions

  • [x] Pool crate: deadpool-postgres (simpler, recycling built in)
  • [x] Package + crate name (brand: HyperionDb).
  • [x] bigint/numeric JS representation (C2).

Not planned

  • [ ] TLS backend (rustls vs native-tls) and default per environment.