pg_biscuit

A PostgreSQL Index Access Method (IAM) for high-performance pattern matching on text columns. Biscuit indexes are specifically designed to accelerate LIKE queries with arbitrary wildcards.

Overview

pg_biscuit implements a novel indexing approach that maintains character position information using compressed bitmaps (Roaring bitmaps). Unlike traditional B-tree or trigram (pg_trgm) indexes, Biscuit can efficiently handle complex pattern queries including prefix, suffix, substring, and multi-part patterns.

Key Features

  • Pattern-optimized indexing: Accelerates LIKE queries with % and _ wildcards
  • Full CRUD support: Efficient insert, update, and delete operations with lazy deletion
  • Memory-resident architecture: Index lives in shared memory for fast access
  • Compressed bitmaps: Uses Roaring bitmaps for space-efficient storage
  • Automatic cleanup: Tombstone-based deletion with batch cleanup on threshold
  • PostgreSQL 16+ compatible: Implements standard IAM interface

Installation

Prerequisites

  • PostgreSQL 16 or later
  • C compiler (gcc or clang)
  • Optional: Roaring bitmap library for better compression

Build from Source

# Clone or download the extension
cd pg_biscuit

# Build and install
make
sudo make install

# Enable in your database
psql -d your_database -c "CREATE EXTENSION pg_biscuit;"

Usage

Creating a Biscuit Index

-- Basic index on a text column
CREATE INDEX idx_username ON users USING biscuit(username);

-- Index on varchar or other text types
CREATE INDEX idx_email ON emails USING biscuit(email_address);

-- Partial index (only index active records)
CREATE INDEX idx_active_usernames ON users USING biscuit(username)
WHERE status = 'active';

-- Case-insensitive index (use LOWER function)
CREATE INDEX idx_username_lower ON users USING biscuit(LOWER(username));

-- Multi-columnar indexing
CREATE INDEX idx_user ON users USING biscuit(username, email_address);

Query Examples

Biscuit indexes automatically accelerate these query patterns:

-- Prefix match: 'john%'
SELECT * FROM users WHERE username LIKE 'john%';

-- Suffix match: '%@gmail.com'
SELECT * FROM users WHERE email LIKE '%@gmail.com';

-- Substring match: '%admin%'
SELECT * FROM users WHERE username LIKE '%admin%';

-- Complex pattern: '%a%b%c%'
SELECT * FROM logs WHERE message LIKE '%error%database%';

-- Exact match: 'johndoe'
SELECT * FROM users WHERE username LIKE 'johndoe';

-- Case-insensitive (requires lowercase index)
SELECT * FROM users WHERE LOWER(username) LIKE '%admin%';

-- Multiple filter based querying
SELECT * FROM users WHERE username LIKE 'john%' and email_address like '%gmail%';

Index Maintenance

-- View all Biscuit indexes in database
SELECT * FROM biscuit_indexes;

-- Get detailed statistics for an index
SELECT biscuit_index_stats('idx_username'::regclass::oid);

-- Rebuild index if needed
REINDEX INDEX idx_username;

-- Clean up deleted records
VACUUM ANALYZE users;

Architecture

Memory-Resident Design

Biscuit indexes are stored entirely in PostgreSQL’s shared memory:

  1. Index Build: Scans heap once during CREATE INDEX
  2. Storage: Lives in index relation’s rd_amcache
  3. Persistence: Not written to disk; rebuilds on restart (amortized) using full heap scan
  4. Updates: Maintained incrementally via INSERT/UPDATE/DELETE hooks

Data Structures

  • Position Index: Character → Position → Bitmap of record IDs
  • Negative Index: Character → Negative offset → Bitmap (for suffix queries)
  • Length Bitmaps: Precomputed bitmaps for length-based filtering
  • Tombstones: Lazy deletion with bitmap tracking
  • Roaring Bitmaps: Compressed bitmap representation

Query Optimization

The engine includes several optimizations:

  1. Query planning: In case of multiple queries, it intelligently determines the optimal order of execution
  2. Early Termination: Stops on empty intersection
  3. Single-Part Fast Path: Avoids recursion for simple patterns
  4. TID Sorting: Orders results for sequential heap access
  5. Batch Operations: Bulk bitmap operations for better performance

Limitations

  1. Memory-Resident: Index rebuilds on database restart (not persisted to disk)
  2. Max String Length: Limited to 256 characters (configurable via MAX_POSITIONS)
  3. Case Sensitivity: Case-insensitive searches require function index with LOWER()
  4. No Full-Text Search: Not a replacement for PostgreSQL’s text search features

Configuration

No configuration is required. The extension automatically:

  • Allocates memory in the index context
  • Performs cleanup when tombstones reach 1000 (configurable via TOMBSTONE_CLEANUP_THRESHOLD)
  • Rebuilds length bitmaps as needed

Development

Code Structure

  • pg_biscuit.c: Main IAM implementation
  • pg_biscuit--1.0.sql: SQL installation script

Contributing

This is an academic/research project demonstrating a novel indexing approach. Contributions are welcome:

  • Bug reports and fixes
  • Performance improvements
  • Documentation enhancements
  • Benchmark additions

License

PostgreSQL License (similar to BSD/MIT)

Acknowledgments

  • Uses Roaring bitmap library (optional dependency)
  • Implements PostgreSQL Index Access Method interface

Contributors

BISCUIT is developed and maintained by Sivaprasad Murali .

Support and Contact

Issues: https://github.com/crystallinecore/pg_biscuit/issues

Discussions: https://github.com/crystallinecore/pg_biscuit/discussions

When pg_trgm feels half-baked, grab a pg_BISCUIT 🍪