pgproto 0.3.0

This Release
pgproto 0.3.0
Date
Status
Stable
Other Releases
Abstract
Native Protobuf support for PostgreSQL
Released By
apaezmx
License
PostgreSQL
Resources
Special Files
Tags

Extensions

pgproto 0.3.0
Native Protobuf support for PostgreSQL

Documentation

README
Test runner
README
Usage
failure_list_objc
failure_list_objc
README
Legacy generated code compatibility smoke test
README
README
README
Protocol Buffers - Prototiller design documents
edition-zero-json-handling
Edition Zero: JSON Handling
cmake_protobuf_generate
How to use protobuf_generate
failure_list_php_c
failure_list_php_c
README
Protocol Buffers Python
README
README
text_encoding
text_encoding.rst
protobuf
protobuf.rst
wrapping-upb
Building a protobuf library on upb
README
Publish pre-compiled protoc artifacts
symbol_database
symbol_database.rst
README
Protobuf packaging
compare_cpp_codegen_multiple
compare_cpp_codegen_multiple
UTF-8-demo
UTF-8-demo
installed_lib_static_golden
installed_lib_static_golden
json_format
json_format.rst
compare_cpp_codegen_failure
compare_cpp_codegen_failure
service
service.rst
descriptor
descriptor.rst
README
README
failure_list_rust_upb
failure_list_rust_upb
options
Protobuf Global Extension Registry
bug_report
bug_report
text_format
text_format.rst
message_factory
message_factory.rst
failure_list_cpp
failure_list_cpp
README
README
README
hpb Generator
README
Protocol Buffers - Code Example
README
upb Lua bindings
failure_list_python-post26
failure_list_python-post26
edition-zero-features
Edition Zero Features
utf8_corpus_kuhn
utf8_corpus_kuhn
reflection
reflection.rst
cpp-apis-for-edition-zero
C++ APIs for Edition Zero
text_format_failure_list_java_lite
text_format_failure_list_java_lite
README
Bazel CI test
expected_prefixes
expected_prefixes
arena_fusion
μpb Arena Fusion and References
index
index.rst
duration_pb2
duration_pb2.rst
text_format_unittest_extensions_data_pointy
text_format_unittest_extensions_data_pointy
README
README
CODE_OF_CONDUCT
CODE_OF_CONDUCT
utf8_corpus_durst
utf8_corpus_durst
text_format_failure_list_java
text_format_failure_list_java
failure_list_dart_upb
failure_list_dart_upb
CONTRIBUTORS
CONTRIBUTORS
README
Python setuptools extension
README
Rust upb bindings
what-are-protobuf-editions
What are Protobuf Editions?
README
README
CMakeLists
CMakeLists
minimum-required-edition
Minimum Required Edition
installed_include_golden
installed_include_golden
map_test_data
map_test_data
SECURITY
SECURITY
editions-life-of-a-featureset
Editions: Life of a FeatureSet
README
Protocol Buffers - Protobuf Editions design documents
edition-zero-converged-semantics
Edition Zero: Converged Semantics
proto2
Generated code
conformance_upb_failures_performance
conformance_upb_failures_performance
text_format_failure_list_python
text_format_failure_list_python
text_format_failure_list_rust_upb
text_format_failure_list_rust_upb
CMakeLists
CMakeLists
README
Protocol Buffers - Google’s data interchange format
README
README
failure_list_objc_performance
failure_list_objc_performance
failure_list_ruby
failure_list_ruby
requirements
requirements
README
README
README
Clang wrappers
proto_builder
proto_builder.rst
README
Protocol Buffers - Google’s data interchange format
README
Protocol Buffers - Google’s data interchange format
descriptor_pb2
descriptor_pb2.rst
design
upb Design
failure_list_rust_cc
failure_list_rust_cc
failure_list_python_upb
failure_list_python_upb
README
README
text_format_failure_list_dart_upb
text_format_failure_list_dart_upb
README
README
requirements
requirements
edition-evolution
Edition Evolution
README
Protocol Buffers - Google’s data interchange format
text_format_unittest_data_pointy_oneof
text_format_unittest_data_pointy_oneof
vs-cpp-protos
upb vs. C++ Protobuf Design
java-lite-for-editions
Java Lite For Editions
README
Kotlin DSL Generator
protobuf-design-options-attributes
Protobuf Design: Options Attributes
text_format_failure_list_rust_cc
text_format_failure_list_rust_cc
edition-naming
Edition Naming
failure_list_csharp_performance
failure_list_csharp_performance
README
How to get a compatible version of protoc
conformance_upb_failures
conformance_upb_failures
protobuf-editions-for-schema-producers
protobuf-editions-for-schema-producers
field_mask_pb2
field_mask_pb2.rst
text_format_failure_list_cpp
text_format_failure_list_cpp
CMakeLists
CMakeLists
editions-feature-extension-layout
Editions: Feature Extension Layout
README
V4 ownership and implementation change
text_format_map_unittest_data
text_format_map_unittest_data
README
μpb: small, fast C protos
cpp_build_systems
How Protobuf supports multiple C++ build systems
failure_list_java_lite
failure_list_java_lite
descriptor_database
descriptor_database.rst
wrappers_pb2
wrappers_pb2.rst
implementing_proto3_presence
How To Implement Field Presence for Proto3
empty_pb2
empty_pb2.rst
prototiller-reqs-for-edition-zero
Prototiller Requirements for Edition Zero
README
Fast UTF-8 validation with Range algorithm (NEON+SSE4+AVX2)
README
Getting Sources
failure_list_jruby
failure_list_jruby
README
README
golden-ruby_ffi_bindings
golden-ruby_ffi_bindings
editions-feature-visibility
Editions Feature Visibility
style-guide
C style guide
failure_list_java
failure_list_java
debug_string_test
debug_string_test
text_format_extensions_unittest_data
text_format_extensions_unittest_data
lite
Protocol Buffers - Google’s data interchange format
struct_pb2
struct_pb2.rst
unknown_fields
unknown_fields.rst
containers
containers.rst
compare_cpp_codegen_success
compare_cpp_codegen_success
edition-zero-feature-enum-field-closedness
Edition Zero Feature: Enum Field Closedness
text_format_unittest_data_oneof_implemented
text_format_unittest_data_oneof_implemented
README
Protocol Buffers - Google’s data interchange format
edition-lifetimes
Edition Lifetimes
README
How to get a compatible version of protoc
installed_bin_golden
installed_bin_golden
text_format_unittest_data
text_format_unittest_data
any_pb2
any_pb2.rst
field_presence
Application note: Field presence
README
Bazel Central Registry
life-of-an-edition
Life of an Edition
text_format_unittest_extensions_data
text_format_unittest_extensions_data
timestamp_pb2
timestamp_pb2.rst
CHANGES
CHANGES
protobuf-editions-design-features
Protobuf Editions Design: Features
failure_list_csharp
failure_list_csharp
text_format_unittest_data
text_format_unittest_data
text_format_failure_list_php
text_format_failure_list_php
jvm_aot
Ahead Of Time (AOT) compilation for the Java Virtual Machine (JVM)
README
Bazel-specific Build Rules
failure_list_python_cpp
failure_list_python_cpp
feature_request
feature_request
prototiller-reqs-for-editions
Prototiller Requirements for Editions
README
Python wrappers protobuf/util
README
README
editions-tooling
Editions Tooling
group-migration-issues
Editions: Group Migration Issues
stricter-schemas-with-editions
Stricter Schemas with Editions
installed_lib_shared_golden
installed_lib_shared_golden
legacy-syntax-editions
Legacy Syntax Editions
service_reflection
service_reflection.rst
README
upb util library
failure_list_php
failure_list_php
descriptor_pool
descriptor_pool.rst
type_pb2
type_pb2.rst
third_party
Third-Party Add-ons for Protocol Buffers
REFCOUNTING
Refcounting Tips
text_format_unittest_data_pointy
text_format_unittest_data_pointy
message
message.rst
failure_list_jruby_ffi
failure_list_jruby_ffi
CONTRIBUTING
Contributing to Protocol Buffers
README
src/ Directory Overview
failure_list_python
failure_list_python

README

pgproto

Native Protocol Buffers (proto3) support for PostgreSQL.

Store your protobuf binary data with the rest of your data. Supports: - Schema-aware field extraction without JSONB conversions. - Custom operators for nested field navigation (-> and #>). - Substantial storage savings over standard JSONB. - GIN and standard indexing for fast retrieval.

Coverage Status

📊 Performance Results

In benchmarks comparing 100,000 serialized example.Order messages against equivalent JSONB structures and normalized native relational schemas (using benchmark.sh with static fixtures):

Metric Protobuf (pgproto) JSONB (jsonb) Native Relational (Normalized 1:N) Win
Storage Size 16 MB 46 MB 25 MB 📊 ~35% smaller than Native, ~65% smaller than JSONB!
Single-Row Lookup Latency (Indexed) 4.2 ms 12.4 ms 3.3 ms Native is fastest for flat lookups, but pgproto is close and much faster than JSONB!
Full Document Retrieval Latency 4.2 ms 12.4 ms 32.1 ms 📈 ~7x faster than Native JOINs for full object fetch!

📈 Large Payload Aggregation Benchmark (1KB)

In separate benchmarks querying 100,000 rows with large 1KB payloads (comparing extraction vs JSONB): * Field at Beginning (Tag 1): pgproto is ~35% faster than jsonb (17.6 ms vs 27.2 ms). * Field at End (Tag 3, requires skipping 1KB): pgproto is ~30% faster than jsonb (17.1 ms vs 24.3 ms).

[!NOTE] pgproto combines the storage efficiency of binary compaction with the query flexibility of JSONB, without the overhead of heavy JOINs or text parsing!

Benchmarks ran using un-optimized debug binaries on standard Linux environments.


🛠️ Installation

Linux and Mac

Compile and install the extension (requires standard build-essential and postgresql-server-dev-*).

git clone --recursive https://github.com/protocolbuffers/protobuf/upb pgproto
cd pgproto
make
make install # may need sudo

See the docker-compose.yml if you want to deploy a pre-configured local sandbox.


🏁 Getting Started

Enable the extension (do this once in each database where you want to use it):

CREATE EXTENSION pgproto;

1. Register Your Protobuf Schemas

To understand what fields are in your binary blob, pgproto requires runtime schemas. You can load FileDescriptorSet binary blobs into the registry:

INSERT INTO pb_schemas (name, data) VALUES ('MySchema', '\x...');

2. Create a Protobuf Table

Create a table with the custom protobuf type:

CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    data protobuf
);

3. Insert & Query Values

Insert your serialized binary protobuf blobs:

INSERT INTO items (data) VALUES ('\x0a02082a');

Extract nested standard fields using operators:

-- Extract field id 1 (integer) from nested structure
SELECT data #> '{Outer, inner, id}'::text[] FROM items;

🔍 Querying & Extraction

Extract values using standard PostgreSQL operators:

Nested Field Access

Navigate nested structures using standard text-array paths:

-- Access a nested field deep in protobuf hierarchy
SELECT data #> '{Outer, inner, id}'::text[] FROM items;

Map / Repeated Field Lookups

Navigating complex arrays and maps (using text-arrays for keys and indices):

-- Access map keys inside a nested structure
SELECT data #> '{Outer, tags, mykey}'::text[] FROM items;

✏️ Modification & CRUD Operations

pgproto allows you to update, insert, and delete parts of a Protobuf document without overwriting the whole column, similar to jsonb.

[!IMPORTANT] Functions like pb_set, pb_insert, and pb_delete are pure functions. They do not modify the database in place. They return a new modified protobuf value. To persist changes, you must use them in an UPDATE statement and assign the return value back to the column. The pb_to_json function seen in some examples is not necessary for the operation itself; it is only used to display the binary result in a human-readable format.

Update Fields (pb_set)

Update a field at a specific path. Currently supports singular primitive types (Int32, Float, Bool, String).

-- To persist the change, use it in an UPDATE statement:
UPDATE items SET data = pb_set(data, ARRAY['Outer', 'a'], '42');

-- To view the result without persisting (returns JSON for display):
SELECT pb_to_json(pb_set(data, ARRAY['Outer', 'a'], '42'), 'Outer') FROM items;

Insert into Arrays/Maps (pb_insert)

Insert an element into an array or map.

-- Persist insertion into a repeated field (array)
UPDATE items SET data = pb_insert(data, ARRAY['Outer', 'scores', '0'], '100');

-- Persist insertion into a map
UPDATE items SET data = pb_insert(data, ARRAY['Outer', 'tags', 'key1'], 'value1');

Delete Fields/Elements (pb_delete)

Remove a field or specific element from an array or map.

-- Persist deletion of a field
UPDATE items SET data = pb_delete(data, ARRAY['Outer', 'a']);

-- Persist deletion from an array
UPDATE items SET data = pb_delete(data, ARRAY['Outer', 'scores', '0']);

Merge Messages (|| Operator)

Merge two protobuf messages of the same type. Concatenation of wire format results in standard Protobuf merge (scalars overwrite, arrays append).

-- Persist merge result
UPDATE items SET data = data || other_data;

🗃️ Indexing

B-Tree expression indexing

You can use standard B-Tree indexing on field extraction results for fast lookups:

CREATE INDEX idx_pb_id ON items ((data #> '{Outer, inner, id}'::text[]));

-- Query will use Index Scan instead of sequential scan
EXPLAIN ANALYZE SELECT * FROM items WHERE (data #> '{Outer, inner, id}'::text[]) = 42;

📚 Advanced Usage & Schema Evolution

Complex Types: Enums and oneof

Protobuf enums and oneof fields map naturally to standard extraction functions: - Enums: Encoded as standard varints on the wire. Extract them using pb_get_int32 or the shorthand -> operators. - Oneof: Since oneof fields are just regular fields with a semantic constraint, you can query their values normally.

Schema Evolution Handling

Protobuf’s biggest strength is seamless forward/backward compatibility: - Adding Fields: You can safely add new fields to your .proto definition. Old messages in the database without the field will return NULL or default values when read using the new schema. - Deprecating Fields: Deprecated fields can still be read if they exist in the binary data. If you remove a field from the schema, the engine will safely skip it during traversal.

To update a schema in the registry without breaking existing data: sql -- Update using ON CONFLICT (re-registering is safe!) INSERT INTO pb_schemas (name, data) VALUES ('MySchema', '\x...') ON CONFLICT (name) DO UPDATE SET data = EXCLUDED.data;


🧪 Testing

🟢 Regression Tests (PostgreSQL pg_regress)

Run the standard PostgreSQL regression tests to verify type I/O, operators, and GIN indexing:

make installcheck

🛒 eCommerce Testbench Sandbox (Docker)

We provide an isolated, ready-to-use testing sandbox with a pre-configured schema (order.proto) and sample records. This environment demonstrates advanced features like Maps, Nested Navigation, and Human-Readable JSON conversion.

To spin it up and run queries: ```bash

1. Build and start the container

docker-compose -f example/docker-compose.yml up -d –build

2. Run showcase queries

docker-compose -f example/docker-compose.yml exec db psql -U postgres -d pgproto_showcase -f /workspace/example/queries.sql ```

See example/README.md for more details.


🐳 Running Coverage & Leaks in Docker (Recommended)

You can run both coverage capture and memory leak analysis directly inside your running Docker workspace.

1. 🏗️ Prerequisites

lcov and valgrind are now pre-installed in the Docker image! You can skip manual installation and proceed directly to running tests.

2. 🧪 Coverage Run

Recompile the extension with profiling flags and capture data: ```bash

Recompile inside container

docker-compose -f example/docker-compose.yml exec -u postgres db make clean docker-compose -f example/docker-compose.yml exec -u postgres db make COPT=“-O0 -fprofile-arcs -ftest-coverage” docker-compose -f example/docker-compose.yml exec -u root db make install

Run tests to generate trace data

docker-compose -f example/docker-compose.yml exec -u postgres db make installcheck

Capture output (ignores negative hit counter overflows and version mismatches)

docker-compose -f example/docker-compose.yml exec -u postgres db lcov –capture –directory . –output-file coverage.info –ignore-errors negative,inconsistent,version,gcov

Filter out third_party dependencies to see actual extension coverage

docker-compose -f example/docker-compose.yml exec -u postgres db lcov –remove coverage.info ‘/workspace/third_party/*’ –output-file coverage_filtered.info

View summary

docker-compose -f example/docker-compose.yml exec -u postgres db lcov –summary coverage_filtered.info ```

[!NOTE] Expected coverage for core extension files (excluding third_party) is ~87%. If you encounter permission issues when writing coverage.info or regression.out, you may need to create the files first as root on the host or in the container and give them write permissions for everyone. For example: touch coverage.info && chmod a+rw coverage.info

3. 🧠 Memory Leak Analysis

Run showcase queries through valgrind to verify memory safety: bash docker-compose -f example/docker-compose.yml exec -u postgres db valgrind --leak-check=full --log-file=/workspace/valgrind.log psql -U postgres -d pgproto_showcase -f /workspace/example/valgrind_full.sql

[!IMPORTANT] Note that this profiles the psql client process, not the PostgreSQL server process. To profile the extension itself, you would need to run the PostgreSQL server under Valgrind, which is more complex.


🏗️ Technical Details

For historical design plans, caching mechanisms, and deeper architectural discussion, see DESIGN.md.


Disclaimer

This is a personal project. The views, code, and opinions expressed here are my own and do not represent those of my current or past employers.