pgproto 0.5.0

This Release
pgproto 0.5.0
Date
Status
Stable
Other Releases
Abstract
Native Protobuf support for PostgreSQL
Released By
apaezmx
License
PostgreSQL
Resources
Special Files
Tags

Extensions

pgproto 0.5.0
Native Protobuf support for PostgreSQL

Documentation

README
src/ Directory Overview

README

pgproto

Native Protocol Buffers (proto3) support for PostgreSQL.

Store your protobuf binary data with the rest of your data. Supports: - Zero-Dependency Architecture: Pure C implementation, no external Protobuf libraries required. - Schema-aware field extraction without JSONB conversions. - Custom operators for nested field navigation (->, #> for integers, and #>> for text). - Substantial storage savings over standard JSONB. - GIN and standard indexing for fast retrieval. - Automatic Compaction: Mutations like pb_set and pb_delete automatically remove stale tags, preventing binary bloat.

Coverage Status

📊 Performance Results

In benchmarks comparing 100,000 serialized example.Order messages against equivalent JSONB structures and normalized native relational schemas (using benchmark.sh with static fixtures):

Metric Protobuf (pgproto) JSONB (jsonb) Native Relational (Normalized 1:N) Win
Storage Size 16 MB 48 MB 21 MB 📊 ~25% smaller than Native, ~66% smaller than JSONB!
Single-Row Lookup Latency (Indexed) 3.6 ms 8.0 ms 2.7 ms 📈 ~2x faster than JSONB for indexed lookups!
Full Document Retrieval Latency 3.6 ms 8.0 ms 31.1 ms 📈 ~8x faster than Native JOINs for full object fetch!

📈 Large Payload Aggregation Benchmark (1KB)

In separate benchmarks querying 100,000 rows with large 1KB payloads (comparing extraction vs JSONB): * Field at Beginning (Tag 1): pgproto is ~35% faster than jsonb. * Field at End (Tag 3, requires skipping 1KB): pgproto is ~30% faster than jsonb.

📊 Concurrent Load Benchmarks

To simulate production load, we ran queries in parallel to measure average latency: * 10 Parallel Workers: pgproto average latency was 3.72 ms vs jsonb 6.59 ms (~42% faster). * 100 Parallel Workers: pgproto average latency was 5.11 ms vs jsonb 10.49 ms (~50% faster).

[!NOTE] pgproto combines the storage efficiency of binary compaction with the query flexibility of JSONB, without the overhead of heavy JOINs or text parsing!

Benchmarks ran using optimized release binaries (-O2) in an isolated Docker environment.


🛠️ Installation

Linux and Mac

Compile and install the extension (requires standard build-essential and postgresql-server-dev-*).

git clone https://github.com/Apaezmx/pgproto
cd pgproto
make
make install # may need sudo

See the docker-compose.yml if you want to deploy a pre-configured local sandbox.


🏁 Getting Started

Enable the extension (do this once in each database where you want to use it):

CREATE EXTENSION pgproto;

1. Register Your Protobuf Schemas

To understand what fields are in your binary blob, pgproto requires runtime schemas. You can load FileDescriptorSet binary blobs into the registry:

INSERT INTO pb_schemas (name, data) VALUES ('MySchema', '\x...');

2. Create a Protobuf Table

Create a table with the custom protobuf type:

CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    data protobuf
);

-- Optional: Add implicit cast to bytea for utility functions like length()
CREATE CAST (protobuf AS bytea) WITHOUT FUNCTION;

3. Insert & Query Values

Insert your serialized binary protobuf blobs:

INSERT INTO items (data) VALUES ('\x0a02082a');

Extract nested standard fields using operators:

-- Extract field id 1 (integer) from nested structure
SELECT data #> '{Outer, inner, id}'::text[] FROM items;

-- Extract string field (text) using the text accessor
SELECT data #>> '{Outer, tags, mykey}'::text[] FROM items;

🔍 Querying & Extraction

Extract values using standard PostgreSQL operators:

Nested Field Access

Navigate nested structures using standard text-array paths:

  • #> (Integer Accessor): Returns int4. Ideal for numeric IDs and enums.
  • #>> (Text Accessor): Returns text. Ideal for strings and map values.
-- Access a nested integer field
SELECT data #> '{Outer, inner, id}'::text[] FROM items;

-- Access a nested string field
SELECT data #>> '{Outer, description}'::text[] FROM items;

Map / Repeated Field Lookups

Navigating complex arrays and maps (using text-arrays for keys and indices):

-- Access map keys inside a nested structure
SELECT data #>> '{Outer, tags, mykey}'::text[] FROM items;

-- Access array index
SELECT data #> '{Outer, scores, 0}'::text[] FROM items;

✏️ Modification & CRUD Operations

pgproto allows you to update, insert, and delete parts of a Protobuf document without overwriting the whole column, similar to jsonb.

[!IMPORTANT] Functions like pb_set, pb_insert, and pb_delete are pure functions. They return a new modified protobuf value with automatic compaction (stale tags are removed to prevent bloat). To persist changes, you must use them in an UPDATE statement.

Update Fields (pb_set)

Update a field at a specific path. Supports Int32, Int64, Bool, and String.

-- To persist the change:
UPDATE items SET data = pb_set(data, ARRAY['Outer', 'a'], '42');

Insert into Arrays/Maps (pb_insert)

Insert an element into an array or map.

-- Persist insertion into a repeated field (array)
UPDATE items SET data = pb_insert(data, ARRAY['Outer', 'scores', '0'], '100');

-- Persist insertion into a map
UPDATE items SET data = pb_insert(data, ARRAY['Outer', 'tags', 'key1'], 'value1');

Delete Fields/Elements (pb_delete)

Remove a field or specific element from an array or map.

-- Persist deletion of a field
UPDATE items SET data = pb_delete(data, ARRAY['Outer', 'a']);

🗃️ Indexing

B-Tree expression indexing

You can use standard B-Tree indexing on field extraction results for fast lookups:

CREATE INDEX idx_pb_id ON items ((data #> '{Outer, inner, id}'::text[]));

-- Query will use Index Scan instead of sequential scan
EXPLAIN ANALYZE SELECT * FROM items WHERE (data #> '{Outer, inner, id}'::text[]) = 42;

📚 Advanced Usage & Schema Evolution

Complex Types: Enums and oneof

Protobuf enums and oneof fields map naturally to standard extraction functions: - Enums: Encoded as standard varints on the wire. - Oneof: Queried normally. Accessing a field that is not currently set in the oneof returns NULL.

Schema Evolution Handling

Protobuf’s biggest strength is seamless forward/backward compatibility: - Adding Fields: Old messages will return NULL for the new field. - Deprecating Fields: Engine safely skips unknown fields during traversal.


🧪 Testing

🟢 Regression Tests (PostgreSQL pg_regress)

Run integration tests for type I/O, operators, and GIN indexing: bash make installcheck

🔬 Isolated C Unit Tests

Test core C logic (Varints, Traversal, Registry) in absolute isolation without a PostgreSQL server: bash make -f tests/Makefile clean make -f tests/Makefile

🐳 Running Coverage & Leaks in Docker

lcov and valgrind are pre-installed in the Docker image.

🧠 Memory Safety

The entire extension is verified as 100% memory safe under Valgrind: ```bash

Run isolated unit tests under Valgrind

docker-compose exec -u postgres db valgrind –leak-check=full ./tests/navigation_test ```

🧪 Consolidated Coverage

Expected consolidated coverage (Unit + Integration) is >90%: bash docker-compose exec -u postgres db make -f tests/Makefile coverage


🏗️ Technical Details

For technical design plans and architectural discussion, see src/README.md and DESIGN.md.


Disclaimer

This is a personal project. The views, code, and opinions expressed here are my own and do not represent those of my current or past employers.