Contents
pg_biscuit
A PostgreSQL Index Access Method (IAM) for high-performance pattern matching on text columns. Biscuit indexes are specifically designed to accelerate LIKE queries with arbitrary wildcards.
Overview
pg_biscuit implements a novel indexing approach that maintains character position information using compressed bitmaps (Roaring bitmaps). Unlike traditional B-tree or trigram (pg_trgm) indexes, Biscuit can efficiently handle complex pattern queries including prefix, suffix, substring, and multi-part patterns.
Key Features
- Pattern-optimized indexing: Accelerates
LIKEqueries with%and_wildcards - Full CRUD support: Efficient insert, update, and delete operations with lazy deletion
- Memory-resident architecture: Index lives in shared memory for fast access
- Compressed bitmaps: Uses Roaring bitmaps for space-efficient storage
- Automatic cleanup: Tombstone-based deletion with batch cleanup on threshold
- PostgreSQL 16+ compatible: Implements standard IAM interface
Installation
Prerequisites
- PostgreSQL 16 or later
- C compiler (gcc or clang)
- Optional: Roaring bitmap library for better compression
Build from Source
# Clone or download the extension
cd pg_biscuit
# Build and install
make
sudo make install
# Enable in your database
psql -d your_database -c "CREATE EXTENSION pg_biscuit;"
Usage
Creating a Biscuit Index
-- Basic index on a text column
CREATE INDEX idx_username ON users USING biscuit(username);
-- Index on varchar or other text types
CREATE INDEX idx_email ON emails USING biscuit(email_address);
-- Partial index (only index active records)
CREATE INDEX idx_active_usernames ON users USING biscuit(username)
WHERE status = 'active';
-- Case-insensitive index (use LOWER function)
CREATE INDEX idx_username_lower ON users USING biscuit(LOWER(username));
-- Multi-columnar indexing
CREATE INDEX idx_user ON users USING biscuit(username, email_address);
Query Examples
Biscuit indexes automatically accelerate these query patterns:
-- Prefix match: 'john%'
SELECT * FROM users WHERE username LIKE 'john%';
-- Suffix match: '%@gmail.com'
SELECT * FROM users WHERE email LIKE '%@gmail.com';
-- Substring match: '%admin%'
SELECT * FROM users WHERE username LIKE '%admin%';
-- Complex pattern: '%a%b%c%'
SELECT * FROM logs WHERE message LIKE '%error%database%';
-- Exact match: 'johndoe'
SELECT * FROM users WHERE username LIKE 'johndoe';
-- Case-insensitive (requires lowercase index)
SELECT * FROM users WHERE LOWER(username) LIKE '%admin%';
-- Multiple filter based querying
SELECT * FROM users WHERE username LIKE 'john%' and email_address like '%gmail%';
Index Maintenance
-- View all Biscuit indexes in database
SELECT * FROM biscuit_indexes;
-- Get detailed statistics for an index
SELECT biscuit_index_stats('idx_username'::regclass::oid);
-- Rebuild index if needed
REINDEX INDEX idx_username;
-- Clean up deleted records
VACUUM ANALYZE users;
Architecture
Memory-Resident Design
Biscuit indexes are stored entirely in PostgreSQL’s shared memory:
- Index Build: Scans heap once during
CREATE INDEX - Storage: Lives in index relation’s
rd_amcache - Persistence: Not written to disk; rebuilds on restart (amortized) using full heap scan
- Updates: Maintained incrementally via INSERT/UPDATE/DELETE hooks
Data Structures
- Position Index: Character → Position → Bitmap of record IDs
- Negative Index: Character → Negative offset → Bitmap (for suffix queries)
- Length Bitmaps: Precomputed bitmaps for length-based filtering
- Tombstones: Lazy deletion with bitmap tracking
- Roaring Bitmaps: Compressed bitmap representation
Query Optimization
The engine includes several optimizations:
- Query planning: In case of multiple queries, it intelligently determines the optimal order of execution
- Early Termination: Stops on empty intersection
- Single-Part Fast Path: Avoids recursion for simple patterns
- TID Sorting: Orders results for sequential heap access
- Batch Operations: Bulk bitmap operations for better performance
Limitations
- Memory-Resident: Index rebuilds on database restart (not persisted to disk)
- Max String Length: Limited to 256 characters (configurable via
MAX_POSITIONS) - Case Sensitivity: Case-insensitive searches require function index with
LOWER() - No Full-Text Search: Not a replacement for PostgreSQL’s text search features
Configuration
No configuration is required. The extension automatically:
- Allocates memory in the index context
- Performs cleanup when tombstones reach 1000 (configurable via
TOMBSTONE_CLEANUP_THRESHOLD) - Rebuilds length bitmaps as needed
Development
Code Structure
pg_biscuit.c: Main IAM implementationpg_biscuit--1.0.sql: SQL installation script
Contributing
This is an academic/research project demonstrating a novel indexing approach. Contributions are welcome:
- Bug reports and fixes
- Performance improvements
- Documentation enhancements
- Benchmark additions
License
PostgreSQL License (similar to BSD/MIT)
Acknowledgments
- Uses Roaring bitmap library (optional dependency)
- Implements PostgreSQL Index Access Method interface
Contributors
BISCUIT is developed and maintained by Sivaprasad Murali .
Support and Contact
Issues: https://github.com/crystallinecore/pg_biscuit/issues
Discussions: https://github.com/crystallinecore/pg_biscuit/discussions
When pg_trgm feels half-baked, grab a pg_BISCUIT 🍪