Contents
pg_biscuit
A PostgreSQL Index Access Method (IAM) for high-performance pattern matching on text columns. Biscuit indexes are specifically designed to accelerate LIKE queries with arbitrary wildcards.
Overview
pg_biscuit implements a novel indexing approach that maintains character position information using compressed bitmaps (Roaring bitmaps). Unlike traditional B-tree or trigram (pg_trgm) indexes, Biscuit can efficiently handle complex pattern queries including prefix, suffix, substring, and multi-part patterns.
Key Features
- Pattern-optimized indexing: Accelerates
LIKEqueries with%and_wildcards - Full CRUD support: Efficient insert, update, and delete operations with O(1) lazy deletion
- Memory-resident architecture: Index lives in shared memory for fast access
- Compressed bitmaps: Uses Roaring bitmaps for space-efficient storage
- Automatic cleanup: Tombstone-based deletion with batch cleanup on threshold
- PostgreSQL 15+ compatible: Implements standard IAM interface
Installation
Prerequisites
- PostgreSQL 16 or later
- C compiler (gcc or clang)
- Optional: Roaring bitmap library for better compression
Build from Source
# Clone or download the extension
cd pg_biscuit
# Build and install
make
sudo make install
# Enable in your database
psql -d your_database -c "CREATE EXTENSION pg_biscuit;"
Usage
Creating a Biscuit Index
-- Basic index on a text column
CREATE INDEX idx_username ON users USING biscuit(username);
-- Index on varchar or other text types
CREATE INDEX idx_email ON emails USING biscuit(email_address);
-- Partial index (only index active records)
CREATE INDEX idx_active_usernames ON users USING biscuit(username)
WHERE status = 'active';
-- Case-insensitive index (use LOWER function)
CREATE INDEX idx_username_lower ON users USING biscuit(LOWER(username));
Query Examples
Biscuit indexes automatically accelerate these query patterns:
-- Prefix match: 'john%'
SELECT * FROM users WHERE username LIKE 'john%';
-- Suffix match: '%@gmail.com'
SELECT * FROM users WHERE email LIKE '%@gmail.com';
-- Substring match: '%admin%'
SELECT * FROM users WHERE username LIKE '%admin%';
-- Complex pattern: '%a%b%c%'
SELECT * FROM logs WHERE message LIKE '%error%database%';
-- Exact match: 'johndoe'
SELECT * FROM users WHERE username LIKE 'johndoe';
-- Case-insensitive (requires lowercase index)
SELECT * FROM users WHERE LOWER(username) LIKE '%admin%';
Index Maintenance
-- View all Biscuit indexes in database
SELECT * FROM biscuit_indexes;
-- Get detailed statistics for an index
SELECT biscuit_index_stats('idx_username'::regclass::oid);
-- Rebuild index if needed
REINDEX INDEX idx_username;
-- Clean up deleted records
VACUUM ANALYZE users;
Diagnostic Output Example
SELECT biscuit_index_stats('idx_username'::regclass::oid);
Biscuit Index Statistics (FULLY OPTIMIZED)
==========================================
Index: idx_username
Active records: 995432
Total slots: 1000000
Free slots: 156
Tombstones: 0
Max length: 64
------------------------
CRUD Statistics:
Inserts: 1000000
Updates: 0
Deletes: 4568
------------------------
Active Optimizations:
✓ 1. Skip wildcard intersections
✓ 2. Early termination on empty
✓ 3. Avoid redundant copies
✓ 4. Optimized single-part patterns
✓ 5. Skip unnecessary length ops
✓ 6. TID sorting for sequential I/O
✓ 7. Batch TID insertion
✓ 8. Direct bitmap iteration
✓ 9. Parallel bitmap scan support
✓ 10. Batch cleanup on threshold
Performance Characteristics
Benchmark Results (1M UUID records)
The included benchmark script (tests/benchmark.sql) tests various pattern types against Sequential Scan, B-Tree, pg_trgm (GIN), and Biscuit indexes:
Overall Performance (Geometric Mean):
- Biscuit: 4.33 ms
- pg_trgm: 5.62 ms
- Sequential Scan: 5.50 ms
- B-Tree: 2.34 ms (prefix-only queries)
Category Winners:
- Exact Match: B-Tree (0.22 ms) - Best for single lookups
- Prefix Match: pg_trgm/Biscuit/Sequential tied (~2.1-2.3 ms)
- Suffix Match: pg_trgm (3.02 ms) - Slight edge over Biscuit (3.30 ms)
- Contains Match: Biscuit (3.60 ms) - Outperforms pg_trgm (3.79 ms)
- Complex Patterns: Biscuit (202.4 ms) - Better than pg_trgm (211.3 ms)
- Case Sensitivity: Biscuit (0.70 ms) - Fastest for case-specific queries
Key Findings:
- Biscuit excels at: Contains queries (
%text%), complex multi-part patterns, and case-sensitive searches - pg_trgm excels at: Suffix queries and some prefix scenarios
- B-Tree excels at: Exact matches and simple prefix queries
- Performance is workload-dependent: No single index type dominates all scenarios
When to Use Biscuit
Good Use Cases:
- Frequent substring searches (
%keyword%) - Complex pattern queries with multiple parts
- Mixed prefix/suffix/contains patterns
- Case-sensitive pattern matching
- Moderate to high cardinality text columns
Not Recommended For:
- Simple equality checks (use B-Tree)
- Primarily prefix-only searches (B-Tree may be sufficient)
- Very low selectivity queries (>50% of rows)
- Full-text search (use PostgreSQL’s FTS instead)
Index Size
From benchmarks on 1M records:
- Biscuit: 0 bytes (memory-resident, not persisted)
- pg_trgm (GIN): 132 MB
- B-Tree: 56 MB
Note: Biscuit is currently a memory-resident index. It rebuilds from the heap on database restart. This trades persistence for query performance.
Architecture
Memory-Resident Design
Biscuit indexes are stored entirely in PostgreSQL’s shared memory:
- Index Build: Scans heap once during
CREATE INDEX - Storage: Lives in index relation’s
rd_amcache - Persistence: Not written to disk; rebuilds on restart
- Updates: Maintained incrementally via INSERT/UPDATE/DELETE hooks
Data Structures
- Position Index: Character → Position → Bitmap of record IDs
- Negative Index: Character → Negative offset → Bitmap (for suffix queries)
- Length Bitmaps: Precomputed bitmaps for length-based filtering
- Tombstones: Lazy deletion with bitmap tracking
- Roaring Bitmaps: Compressed bitmap representation
Query Optimization
The engine includes several optimizations:
- Wildcard Skipping: Only intersects concrete characters, skips
_ - Early Termination: Stops on empty intersection
- Single-Part Fast Path: Avoids recursion for simple patterns
- TID Sorting: Orders results for sequential heap access
- Batch Operations: Bulk bitmap operations for better performance
Limitations
- Memory-Resident: Index rebuilds on database restart (not persisted to disk)
- Single Column: Only supports one indexed column
- Max String Length: Limited to 256 characters (configurable via
MAX_POSITIONS) - Case Sensitivity: Case-insensitive searches require function index with
LOWER() - No Full-Text Search: Not a replacement for PostgreSQL’s text search features
Configuration
No configuration is required. The extension automatically:
- Allocates memory in the index context
- Performs cleanup when tombstones reach 1000 (configurable via
TOMBSTONE_CLEANUP_THRESHOLD) - Rebuilds length bitmaps as needed
Development
Running Benchmarks
# Load test data and run comprehensive benchmarks
psql -d test_db -f benchmark.sql
# Results include:
# - 7 test categories (exact, prefix, suffix, contains, complex, selectivity, case)
# - 112+ individual test runs
# - Statistical analysis and performance comparisons
Code Structure
pg_biscuit.c: Main IAM implementationpg_biscuit--1.0.sql: SQL installation scriptbenchmark.sql: Comprehensive benchmark suite
Contributing
This is an academic/research project demonstrating a novel indexing approach. Contributions are welcome:
- Bug reports and fixes
- Performance improvements
- Documentation enhancements
- Benchmark additions
License
PostgreSQL License (similar to BSD/MIT)
Acknowledgments
- Uses Roaring bitmap library (optional dependency)
- Implements PostgreSQL Index Access Method interface
Disclaimer
This is experimental software. While functional, it is not recommended for production use without thorough testing in your specific environment. The memory-resident architecture means indexes must rebuild on restart, which may not be suitable for all workloads.
Contributors
BISCUIT is developed and maintained by Sivaprasad Murali .
Support and Contact
Issues: https://github.com/crystallinecore/pg_biscuit/issues
Discussions: https://github.com/crystallinecore/pg_biscuit/discussions
When pg_trgm feels half-baked, grab a pg_BISCUIT 🍪