BISCUIT: Bitmap Indexed Searching with Combinatorial Union and Intersection Techniques
PostgreSQL Extension for High-Performance Wildcard Pattern Matching
Version 1.0.0
Table of Contents
- Executive Summary
- Technical Overview
- Installation and Requirements
- Configuration and Setup
- Query Interface
- Pattern Matching Specification
- CRUD Operations and Index Maintenance
- Architecture and Implementation
- Advanced Features
- API Reference
- Monitoring and Diagnostics
- Optimization Guidelines
- Limitations and Constraints
- Troubleshooting
- Examples and Use Cases
Executive Summary
BISCUIT is a PostgreSQL C extension that provides specialized bitmap indexing for wildcard pattern matching operations. The extension addresses a fundamental limitation in traditional database indexes: efficient handling of leading wildcard queries (e.g., LIKE '%pattern'), which typically require full table scans.
Key Capabilities
- High-Performance Wildcard Matching: O(1) to O(log n) query performance for simple patterns, regardless of wildcard placement
- Automatic Index Maintenance: Transparent CRUD synchronization via PostgreSQL triggers
- Memory-Efficient Storage: Compressed bitmap structures using Roaring Bitmap algorithm
- Lazy Deletion Strategy: Deferred cleanup with intelligent batching for optimal throughput
- Incremental Update Optimization: Minimized reindexing overhead for similar string modifications
- Universal Primary Key Support: Compatible with any PostgreSQL data type
- Production-Ready Design: Comprehensive error handling, statistics tracking, and diagnostic capabilities
Performance Characteristics
| Metric | Value |
|--------|-------|
| Query Latency (Simple Pattern) | O(1) to O(log n) |
| Query Latency (Complex Pattern) | O(k × m), k=parts, m=window |
| Insert Operation | O(L), L=string length |
| Update Operation (Incremental) | O(d), d=differences |
| Delete Operation | O(1) (lazy tombstone) |
| Memory Overhead | 10-30% of indexed data size |
| Supported String Length | 256 characters (indexed portion) |
Technical Overview
Problem Statement
Traditional database indexes (B-tree, GiST, GIN) exhibit suboptimal performance for wildcard pattern matching queries, particularly when wildcards appear at the beginning of patterns. A query such as SELECT * FROM table WHERE column LIKE '%pattern%' typically degrades to a sequential scan with O(n) complexity, where n represents the total number of rows.
Solution Architecture
BISCUIT implements a multi-dimensional bitmap index structure that decomposes strings into positional character mappings. Each character at each position is tracked using compressed bitmaps (Roaring Bitmaps), enabling constant-time intersection operations for pattern matching.
Core Index Structures
- Positional Character Index: Maps each character to a sorted array of position-bitmap pairs
  - Forward positions: {0, 1, 2, ..., n} from string start
  - Reverse positions: {-1, -2, -3, ..., -n} from string end
- Character Existence Cache: Union of all positional bitmaps per character for O(1) containment queries
- Length Index: Bitmap array indexed by string length for length-constrained queries
- Primary Key Hash Table: O(1) lookup structure mapping primary keys to internal indices
- Lazy Deletion Layer: Tombstone bitmap tracking deleted records with deferred cleanup
Algorithmic Approach
Pattern Parsing: Input patterns are decomposed into literal segments separated by wildcards (%). Each segment’s position constraints (prefix, suffix, or floating) are analyzed.
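For example, the pattern user_%@%.com used later in this document decomposes into three literal segments (an illustration of the parse, not the internal storage format):
-- Illustrative decomposition of 'user_%@%.com' into literal segments:
--   'user_'  -> prefix segment, anchored at position 0 ('_' matches any one character)
--   '@'      -> floating segment, located within a bounded position window
--   '.com'   -> suffix segment, anchored at reverse positions -4 .. -1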
Query Optimization: Pattern structure determines execution strategy:
- Single-character queries: Direct bitmap lookup
- Prefix patterns (abc%): Position-0 intersection with length filter
- Suffix patterns (%xyz): Negative-position intersection with length filter
- Containment patterns (%abc%): Character cache intersection
- Multi-part patterns: Windowed position matching with recursive refinement
Bitmap Operations: Compressed bitmaps enable efficient set operations:
- Intersection: AND operation for constraint combination
- Union: OR operation for alternative matches
- Difference: ANDNOT operation for tombstone filtering
Data Flow
Query Pattern → Parse → Optimize → Bitmap Ops → Filter Tombstones → Result Set
↓
Insert/Update/Delete → Hash Lookup → Bitmap Update → Tombstone Management
Installation and Requirements
System Requirements
- PostgreSQL Version: 11.0 or higher
- Operating System: Linux, macOS, Windows (with MinGW)
- Architecture: x86_64, ARM64
- Memory: Minimum 512MB available RAM per index
- Compiler: GCC 4.8+, Clang 3.9+, or MSVC 2017+
Optional Dependencies
- CRoaring Library: For optimized Roaring Bitmap operations (recommended)
  - If unavailable, the extension falls back to a built-in bitmap implementation
  - Installation: apt-get install libroaring-dev (Debian/Ubuntu)
Installation Procedure
Method 1: PostgreSQL Extension System
CREATE EXTENSION biscuit;
This command must be executed by a superuser or a user with CREATE EXTENSION privileges.
Method 2: Manual Compilation
# Clone repository
git clone https://github.com/crystallinecore/biscuit.git
cd biscuit
# Compile
make
# Install (requires superuser privileges)
sudo make install
# Load extension
psql -d your_database -c "CREATE EXTENSION biscuit;"
Verification
SELECT biscuit_version();
-- Expected output: 1.0.0-Biscuit
Configuration and Setup
Initialization Options
Option 1: Automated Setup (Recommended)
The biscuit_setup() function provides a complete initialization workflow:
SELECT biscuit_setup(
p_table_name TEXT,
p_column_name TEXT,
p_pk_column_name TEXT DEFAULT 'id'
);
Parameters:
- p_table_name: Target table name (required)
- p_column_name: Column to index (required, must be text-compatible)
- p_pk_column_name: Primary key column name (default: 'id')
Operations Performed:
- Validates table and column existence
- Constructs bitmap index structure
- Generates strongly-typed wrapper functions
- Installs AFTER triggers for INSERT, UPDATE, DELETE operations
- Returns detailed confirmation message
Example:
SELECT biscuit_setup('customer_records', 'email_address', 'customer_id');
Output:
Biscuit index built successfully.
Created biscuit_match() and biscuit_match_rows() functions for table: customer_records
Columns: customer_id integer, email_address text
Successfully created trigger on table: customer_records
The index will now automatically update on INSERT, UPDATE, and DELETE operations.
Option 2: Manual Configuration (Advanced)
For fine-grained control, configure components individually:
Step 1: Index Construction
SELECT biscuit_build_index(
table_name TEXT,
column_name TEXT,
pk_column_name TEXT DEFAULT 'id'
);
Step 2: Function Generation
SELECT biscuit_create_match_function();
This creates two query functions:
- biscuit_match(pattern TEXT): Returns SETOF table_type
- biscuit_match_rows(pattern TEXT): Returns TABLE(pk type, value TEXT)
Step 3: Trigger Activation
SELECT biscuit_enable_triggers();
Installs trigger named biscuit_auto_update on the indexed table.
Primary Key Considerations
BISCUIT supports heterogeneous primary key types through automatic type conversion:
| PostgreSQL Type | Internal Representation | Notes |
|-----------------|-------------------------|-------|
| INT2, INT4, INT8 | String (optimized) | Direct conversion |
| TEXT, VARCHAR, CHAR | String (native) | No conversion overhead |
| UUID | Canonical string format | Standardized representation |
| Other types | Type output function | Uses PostgreSQL type system |
Performance Note: Integer primary keys receive optimized conversion paths. UUID and text types have minimal overhead. Custom types utilize PostgreSQL’s type output functions.
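As a sketch of a non-integer key, assuming a hypothetical documents table keyed by UUID (gen_random_uuid() is built in since PostgreSQL 13; earlier versions can use the pgcrypto extension):
-- Hypothetical UUID-keyed table; BISCUIT stores the key in canonical string form
CREATE TABLE documents (
    doc_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title  TEXT
);
SELECT biscuit_setup('documents', 'title', 'doc_id');
-- Queries return doc_id in its native UUID type
SELECT doc_id, title FROM biscuit_match('%report%');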
Index Scope and Limitations
- Single Column Per Index: Each index covers one text column
- Multiple Indexes Supported: Create separate indexes for different columns
- Table Scope: One active BISCUIT index per table
- Character Limit: Positions beyond 256 characters are not indexed (but stored)
Example: Multiple Column Indexing
-- Index email column
SELECT biscuit_setup('users', 'email', 'id');
-- To index an additional column, combine the columns into a single
-- generated column (see Limitations: Single Column Per Index);
-- the current implementation maintains one index per table
Query Interface
Function Signatures and Return Types
Primary Query Functions
1. Full Tuple Retrieval (Strongly-Typed)
biscuit_match(pattern TEXT) RETURNS SETOF table_type
Returns complete rows from the indexed table matching the specified pattern. Return type matches the exact schema of the indexed table.
Characteristics:
- Type-safe: Returns actual table type
- No column specification required
- Supports standard SQL operations (WHERE, ORDER BY, JOIN)
- Optimal for production queries
Example:
SELECT * FROM biscuit_match('%@example.com%');
SELECT id, email FROM biscuit_match('user%') WHERE created_at > '2024-01-01';
SELECT COUNT(*) FROM biscuit_match('%gmail%');
2. Generic Tuple Retrieval
biscuit_match_rows(pattern TEXT) RETURNS TABLE(pk primary_key_type, value TEXT)
Returns primary key and indexed column value pairs.
Use Cases:
- When only key-value pairs are needed
- Intermediate processing before joining
- Reduced data transfer overhead
Example:
SELECT pk, value FROM biscuit_match_rows('%pattern%');
-- Join back to original table if needed
SELECT t.*
FROM original_table t
JOIN biscuit_match_rows('%pattern%') b ON t.id = b.pk;
3. Count Aggregation
biscuit_match_count(pattern TEXT) RETURNS INTEGER
Returns the count of matching records without materializing result set.
Performance: This is the fastest query method, performing only bitmap operations without tuple retrieval.
Example:
SELECT biscuit_match_count('%search_term%');
4. Key-Value Extraction
biscuit_match_keys(pattern TEXT) RETURNS TABLE(pk TEXT, value TEXT)
Low-level function returning primary keys and indexed values as text. Used internally by higher-level functions.
Note: Primary keys are returned as text regardless of original type.
Query Patterns and SQL Integration
Standard SQL Operations
All biscuit_match() results support complete SQL operations:
-- Filtering
SELECT * FROM biscuit_match('%pattern%')
WHERE status = 'active' AND created_at > NOW() - INTERVAL '30 days';
-- Ordering
SELECT * FROM biscuit_match('A%')
ORDER BY name DESC, created_at ASC
LIMIT 100;
-- Aggregation
SELECT category, COUNT(*), AVG(price)
FROM biscuit_match('%widget%')
GROUP BY category
HAVING COUNT(*) > 10;
-- Joins
SELECT p.*, c.category_name
FROM biscuit_match('%special%') p
JOIN categories c ON p.category_id = c.id;
-- Subqueries
SELECT *
FROM products
WHERE id IN (
SELECT id FROM biscuit_match('%premium%')
);
-- Common Table Expressions
WITH matched_products AS (
SELECT * FROM biscuit_match('%electronics%')
)
SELECT m.*, s.stock_level
FROM matched_products m
JOIN stock s ON m.id = s.product_id;
Performance Considerations
Optimal Patterns:
- Use bitmap index for pattern matching
- Apply additional filters after pattern match
- Leverage PostgreSQL’s query planner for joins
Query Plan Example:
EXPLAIN ANALYZE
SELECT * FROM biscuit_match('%search%')
WHERE status = 'active';
-- Result: Bitmap operations complete in milliseconds
-- Additional filter applied to result set
Pattern Matching Specification
Wildcard Operators
BISCUIT implements SQL LIKE-style pattern matching with two wildcard operators:
| Operator | Semantics | Character Match | Examples |
|----------|-----------|-----------------|----------|
| % | Zero or more characters | [0, ∞) | 'a%' → "a", "ab", "abcdef" |
| _ | Exactly one character | 1 | 'a_c' → "abc", "axc" |
Pattern Categories and Optimization Paths
1. Exact Match Patterns
Pattern Structure: No wildcards
Example: 'exact_string'
Optimization: Length bitmap intersection with position-0 character match
Complexity: O(1)
SELECT biscuit_match_count('hello');
-- Matches only "hello" exactly
2. Prefix Patterns
Pattern Structure: Literal prefix followed by %
Example: 'prefix%'
Optimization: Position-0 character sequence with length≥prefix_length filter
Complexity: O(log n)
SELECT * FROM biscuit_match('user_%');
-- Matches: "user_admin", "user_john", etc.
-- (note: the _ is itself a single-character wildcard)
3. Suffix Patterns
Pattern Structure: % followed by literal suffix
Example: '%suffix'
Optimization: Negative-position character sequence with length≥suffix_length filter
Complexity: O(log n)
SELECT * FROM biscuit_match('%@gmail.com');
-- Matches: "john@gmail.com", "admin@gmail.com"
4. Containment Patterns
Pattern Structure: %literal%
Example: '%search%'
Optimization: Character cache intersection
Complexity: O(1) to O(m), m=pattern_length
SELECT * FROM biscuit_match('%error%');
-- Matches any string containing "error"
5. Complex Multi-Part Patterns
Pattern Structure: Multiple literal segments with wildcards
Examples: 'start%middle%end', '%part1%part2%', 'a%b%c%d'
Optimization: Windowed position matching with recursive refinement
Complexity: O(k × m), k=parts, m=search_window
SELECT * FROM biscuit_match('user_%@%.com');
-- Matches: "user_john@example.com", "user_admin@test.com"
6. Single-Character Wildcard Patterns
Pattern Structure: Underscores with optional literals
Examples: '___' (exactly 3 characters), 'a_c' ('a', any char, 'c'), '___%' (at least 3 characters)
Optimization: Length bitmap or position bitmap intersection
Complexity: O(1) to O(log n)
SELECT * FROM biscuit_match('___');
-- Matches: "abc", "xyz", "123" (any 3-character string)
SELECT * FROM biscuit_match('test_');
-- Matches: "test1", "testA", "test_" (test + any char)
Pattern Syntax Reference
| Pattern | Description | Matches | Does Not Match |
|---------|-------------|---------|----------------|
| 'abc' | Exact string | "abc" | "abcd", "xabc", "ABC" |
| 'a%' | Starts with 'a' | "a", "abc", "a123" | "ba", "ABC" |
| '%z' | Ends with 'z' | "z", "xyz", "123z" | "za", "Z" |
| '%abc%' | Contains 'abc' | "abc", "xabcx", "123abc" | "ab", "ABC" |
| 'a%z' | Starts 'a', ends 'z' | "az", "abcz", "a123xyz" | "za", "abc" |
| '_' | Single character | "a", "1", "x" | "", "ab" |
| '___' | Exactly 3 chars | "abc", "123", "xyz" | "ab", "abcd" |
| 'a_c' | 'a' + any + 'c' | "abc", "axc", "a1c" | "ac", "abbc" |
| '%a%b%' | 'a' before 'b' | "ab", "axb", "xaby" | "ba", "bxa" |
| '%' | Any string | All non-NULL | NULL |
| '' | Empty string | "" | Any non-empty |
Special Cases
Empty Pattern
SELECT biscuit_match_count('');
-- Returns count of empty strings in indexed column
Match All
SELECT biscuit_match_count('%');
-- Returns count of all non-deleted records
Case Sensitivity
BISCUIT patterns are case-sensitive by default. For case-insensitive matching:
-- Option 1: Index lowercase values
CREATE TABLE users_normalized (
id SERIAL PRIMARY KEY,
email TEXT,
email_lower TEXT GENERATED ALWAYS AS (LOWER(email)) STORED
);
SELECT biscuit_setup('users_normalized', 'email_lower', 'id');
SELECT * FROM biscuit_match(LOWER('%SEARCH%'));
-- Option 2: Use expression index (future enhancement)
NULL Handling
- NULL values are never indexed; rows with NULL in the indexed column are skipped
- Rows with a NULL indexed value never match any pattern, including '%'
- Pattern matching never returns NULL values
-- NULL values are skipped
INSERT INTO table (id, text_col) VALUES (1, NULL);
SELECT biscuit_match_count('%');
-- Does not include row 1
CRUD Operations and Index Maintenance
Automatic Synchronization
When triggers are enabled via biscuit_setup() or biscuit_enable_triggers(), all DML operations automatically maintain index consistency with zero application code changes.
INSERT Operations
Behavior: New records are immediately indexed and available for queries.
Algorithm:
- Extract primary key and indexed column value from new tuple
- Convert primary key to canonical string representation
- Allocate index slot (reuse free slot if available, otherwise expand)
- Generate positional bitmaps for each character
- Update length index and character cache
- Register primary key in hash table
Complexity: O(L), where L = string length (up to 256 characters)
Example:
INSERT INTO products (id, name, description)
VALUES (10001, 'Premium Widget', 'High-quality widget');
-- Index automatically updated
SELECT * FROM biscuit_match('%Widget%');
-- Returns the newly inserted row immediately
Bulk Insert Performance:
-- For large bulk inserts, consider disabling triggers
SELECT biscuit_disable_triggers();
INSERT INTO products (id, name)
SELECT generate_series(1, 1000000), 'Product ' || generate_series(1, 1000000);
-- Rebuild index after bulk operation
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
UPDATE Operations
Behavior: Updates are intelligently optimized based on string similarity.
Incremental Update Path
Criteria for Incremental Update:
- Same string length (before and after)
- Minimum length ≥ 3 characters
- Character differences < 20% of string length
- Maximum 3 changed characters
Algorithm:
- Compute character-level diff between old and new values
- Remove old characters from affected position bitmaps
- Add new characters to affected position bitmaps
- Update character cache for changed characters only
Complexity: O(d), where d = number of different characters
Example (Incremental):
-- Original: "john.doe@example.com"
UPDATE users SET email = 'john.doe@example.org' WHERE id = 123;
-- Only 3 characters changed (com→org), uses incremental update
Full Reindex Path
Criteria:
- Different string lengths
- Changes exceed incremental update threshold
- String < 3 characters
Algorithm:
- Remove all position bitmaps for old value
- Remove from length index and character cache
- Re-index completely with new value
- Update length index and character cache
Complexity: O(L), where L = new string length
Example (Full Reindex):
-- Original: "short"
UPDATE users SET email = 'much.longer.email@example.com' WHERE id = 123;
-- Different lengths, requires full reindex
Statistics Tracking:
SELECT biscuit_index_status();
-- Shows ratio of incremental vs full updates
-- Example: "Updates: 1000 (incr: 750, 75.0%)"
DELETE Operations
Behavior: Implements lazy deletion with tombstone bitmaps for optimal performance.
Algorithm (O(1) Lazy Deletion):
- Hash table lookup to find internal index (O(1))
- Add index to tombstone bitmap (O(1))
- Store index in deletion queue
- Remove primary key from hash table
- Add slot to free list for future reuse
- Defer removal from position/length bitmaps
Complexity: O(1) - actual cleanup deferred
Example:
DELETE FROM products WHERE id = 123;
-- Returns immediately, cleanup happens later
Tombstone Cleanup Process
Automatic Cleanup Triggers:
- Tombstone count ≥ threshold (default: 1000)
- Manual invocation: SELECT biscuit_cleanup();
Cleanup Algorithm:
- Batch remove tombstoned indices from all bitmaps
- Free associated string memory
- Clear tombstone bitmap
- Reset deletion queue
Cleanup Statistics:
SELECT biscuit_index_status();
-- Pending tombstones: 856 (85.6% of deletes)
-- Total cleanups: 12
-- Items cleaned: 11442
Performance Characteristics:
| Scenario | Delete Latency | Query Overhead | Memory Overhead |
|----------|----------------|----------------|-----------------|
| No tombstones | O(1) | None | Minimal |
| < 1000 tombstones | O(1) | O(t) filter | ~t × 32 bytes |
| ≥ 1000 tombstones | O(1) + cleanup | Minimal | Minimal |
Note: Query operations automatically filter tombstones, so deleted records never appear in results despite deferred cleanup.
Upsert Operations (INSERT … ON CONFLICT)
Behavior: Automatically handled through INSERT and UPDATE trigger paths.
Example:
INSERT INTO products (id, name)
VALUES (123, 'New Product')
ON CONFLICT (id)
DO UPDATE SET name = EXCLUDED.name;
-- If row 123 exists: triggers UPDATE path
-- If row 123 doesn't exist: triggers INSERT path
Transaction Semantics
ACID Compliance:
- Atomicity: Index updates commit/rollback with transaction
- Consistency: Index always reflects committed table state
- Isolation: Follows PostgreSQL’s MVCC isolation levels
- Durability: Table data is durable; the in-memory index must be rebuilt after a server restart
Example:
BEGIN;
INSERT INTO products (id, name) VALUES (1, 'Product A');
SELECT * FROM biscuit_match('%Product A%'); -- Visible in transaction
ROLLBACK;
SELECT * FROM biscuit_match('%Product A%'); -- Not visible, index reverted
Current Limitation: Index resides in shared memory and must be rebuilt after server restart.
Trigger Management
Enable Triggers
SELECT biscuit_enable_triggers();
Creates trigger: biscuit_auto_update as AFTER INSERT OR UPDATE OR DELETE FOR EACH ROW.
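This is roughly equivalent to the following DDL sketch, with products standing in for the indexed table:
-- Equivalent trigger DDL (sketch); 'products' stands in for the indexed table
CREATE TRIGGER biscuit_auto_update
AFTER INSERT OR UPDATE OR DELETE ON products
FOR EACH ROW EXECUTE FUNCTION biscuit_trigger();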
Disable Triggers
SELECT biscuit_disable_triggers();
Removes triggers but preserves index structure. Use for:
- Bulk data operations
- Table truncation
- Data migration
- Testing scenarios
Check Trigger Status
SELECT biscuit_index_status();
-- Shows whether triggers are active
Architecture and Implementation
System Components
┌─────────────────────────────────────────────────────────────┐
│ PostgreSQL Backend │
├─────────────────────────────────────────────────────────────┤
│ Query Interface │ Trigger System │ Memory Management │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ BISCUIT Extension Layer │
├──────────────────────┬──────────────────┬───────────────────┤
│ Pattern Parser │ Query Executor │ CRUD Handler │
│ - Wildcard analysis │ - Bitmap ops │ - Insert/Update │
│ - Optimization │ - Result filter │ - Delete/Cleanup │
└──────────────────────┴──────────────────┴───────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Index Data Structures │
├──────────────┬──────────────┬──────────────┬────────────────┤
│ Positional │ Character │ Length │ PK Hash │
│ Bitmaps │ Cache │ Index │ Table │
│ (256 pos) │ (256 chars) │ (L bitmaps) │ (O(1)) │
└──────────────┴──────────────┴──────────────┴────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Roaring Bitmap Implementation │
├──────────────────────────────────────────────────────────────┤
│ - Compressed bitmap storage │
│ - Fast set operations (AND, OR, ANDNOT) │
│ - Memory-efficient sparse representation │
└──────────────────────────────────────────────────────────────┘
Data Structure Details
1. Positional Character Index
Purpose: Track character occurrences at specific positions
Structure:
typedef struct {
int pos; // Position in string
RoaringBitmap *bitmap; // Record IDs containing char at pos
} PosEntry;
typedef struct {
PosEntry *entries; // Sorted array of position entries
int count; // Number of positions tracked
int capacity; // Allocated capacity
} CharIndex;
CharIndex pos_idx[256]; // Forward positions (0, 1, 2, ...)
CharIndex neg_idx[256]; // Reverse positions (-1, -2, -3, ...)
Example: For strings [“cat”, “car”, “dog”]:
pos_idx['c'][0] = {0, 1} // 'c' at position 0 in records 0, 1
pos_idx['a'][1] = {0, 1} // 'a' at position 1 in records 0, 1
pos_idx['t'][2] = {0} // 't' at position 2 in record 0
neg_idx['t'][-1] = {0} // 't' at last position in record 0
Access Complexity: Binary search over positions: O(log P), P ≤ 256
2. Character Existence Cache
Purpose: O(1) lookup for “contains character” queries
Structure:
RoaringBitmap *char_cache[256]; // Union of all position bitmaps per char
Construction:
char_cache['a'] = pos_idx['a'][0] ∪ pos_idx['a'][1] ∪ ... ∪ pos_idx['a'][n]
Use Case: Pattern %a% directly queries char_cache['a']
3. Length Index
Purpose: Filter records by string length
Structure:
typedef struct {
RoaringBitmap **length_bitmaps; // Array indexed by length
int max_length; // Maximum observed length
} LengthIndex;
Example:
length_bitmaps[3] = {0, 1, 2} // Records 0, 1, 2 have length 3
length_bitmaps[5] = {3, 4} // Records 3, 4 have length 5
Use Case: Pattern ___ queries length_bitmaps[3] directly
4. Primary Key Hash Table
Purpose: O(1) mapping from primary key to internal index
Implementation:
typedef struct {
char pk_str[NAMEDATALEN]; // Fixed-size key (max 64 bytes)
uint32_t idx; // Internal record index
} PKMapEntry;
HTAB *pk_to_index; // PostgreSQL hash table
Operations:
- Insert: hash_search(HASH_ENTER), O(1)
- Lookup: hash_search(HASH_FIND), O(1)
- Delete: hash_search(HASH_REMOVE), O(1)
Memory: ~96 bytes per entry (key + value + overhead)
5. Lazy Deletion System
Purpose: Defer expensive cleanup operations
Structure:
typedef struct {
RoaringBitmap *tombstones; // Bitmap of deleted indices
uint32_t *deleted_indices; // Queue for cleanup
int tombstone_count; // Current tombstone count
int tombstone_threshold; // Cleanup trigger threshold
int64 total_cleanups; // Lifetime cleanup count
int64 items_cleaned; // Total items cleaned
} LazyDeletion;
Workflow:
- Delete: Mark bit in tombstone bitmap (O(1))
- Query: Filter results using ANDNOT tombstones (O(log n))
- Cleanup: Batch remove when threshold reached (O(n × 256))
Adaptive Threshold: Adjusts based on query/delete ratio
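The workflow can be observed from SQL using the diagnostic functions documented in the API Reference (sketch; products stands in for the indexed table):
-- Observing lazy deletion (sketch)
DELETE FROM products WHERE id = 123;      -- O(1): tombstone only
SELECT biscuit_get_tombstone_count();     -- pending tombstones
SELECT * FROM biscuit_match('%widget%');  -- results filtered via ANDNOT
SELECT biscuit_cleanup();                 -- force batch cleanup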
Memory Management
Allocation Strategy
Memory Context: Dedicated AllocSetContext for index structures
- Lifetime: Until explicit index rebuild or server restart
- Isolation: Independent of query contexts
- Cleanup: Automatic on context deletion
Size Estimation:
Total Memory = Base + Positional + Cache + Length + Hash + Strings
Base = ~1 MB (structs and metadata)
Positional = ~256 chars × P positions × 64 bytes/bitmap ≈ 16P KB
Cache = ~256 chars × 64 bytes/bitmap ≈ 16 KB
Length = L × 64 bytes/bitmap (L = max length)
Hash = N × 96 bytes (N = record count)
Strings = N × avg_length × 2 (data + PK)
Example (1M records, avg 50 chars, max 100 positions):
= 1 MB + 1.6 MB + 16 KB + 6.4 KB + 96 MB + 100 MB ≈ 200 MB
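A rough pre-build estimate can be derived from table statistics by applying the formula above; a sketch, with products and name standing in for your table and column:
-- Rough memory estimate using the formula above (sketch)
SELECT
    COUNT(*)                 AS records,
    ROUND(AVG(LENGTH(name))) AS avg_len,
    ROUND((
        1.0                                            -- base (MB)
      + LEAST(MAX(LENGTH(name)), 256) * 0.016          -- positional: ~16 KB per position
      + 0.016                                          -- character cache
      + MAX(LENGTH(name)) * 64 / 1048576.0             -- length index
      + COUNT(*) * 96 / 1048576.0                      -- PK hash table
      + COUNT(*) * AVG(LENGTH(name)) * 2 / 1048576.0   -- strings (data + PK)
    )::NUMERIC, 1) AS est_memory_mb
FROM products;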
Compression Benefits: Roaring Bitmaps achieve 10-50× compression for sparse data
Garbage Collection
Automatic:
- Tombstone cleanup when threshold reached
- Free list management for slot reuse
Manual:
SELECT biscuit_cleanup(); -- Force tombstone cleanup
Memory Leak Prevention:
- All allocations tracked in index context
- Single deallocation point on rebuild
- No external references retained
Concurrency Model
Index Access: Read-optimized, single-writer model
- Queries: Concurrent read access (no locking)
- Modifications: Serialized through PostgreSQL trigger system
- Consistency: PostgreSQL’s MVCC handles transaction isolation
Limitation: High-concurrency write workloads may experience contention at trigger level (PostgreSQL limitation, not BISCUIT-specific)
Scalability: Read queries scale linearly across cores
Advanced Features
Incremental Updates
For same-length strings with <20% character changes, BISCUIT uses incremental updates:
-- Original: "hello world"
-- Updated: "hello wurld"
-- Only updates changed positions (saves ~80% of work)
Lazy Deletion with Smart Cleanup
Deletes are instant (O(1)) using tombstone bitmaps. Cleanup triggers automatically when:
- Tombstone count reaches threshold (default: 1000)
- Query/Delete ratio suggests overhead
- Manual trigger: SELECT biscuit_cleanup();
Pattern Optimization
BISCUIT automatically detects and optimizes:
- Wildcard-only patterns: %%%, ___
- Single-character positioned: %a%, a%, %a
- Prefix patterns: abc%
- Suffix patterns: %xyz
- Exact length: ___ (3 chars)
API Reference
Index Management Functions
biscuit_setup()
Complete initialization function combining index build, function creation, and trigger setup.
biscuit_setup(
p_table_name TEXT,
p_column_name TEXT,
p_pk_column_name TEXT DEFAULT 'id'
) RETURNS TEXT
Parameters:
- p_table_name: Name of table to index (required)
- p_column_name: Column containing text data to index (required)
- p_pk_column_name: Primary key column name (default: 'id')
Returns: Status message with setup details
Example:
SELECT biscuit_setup('customers', 'email', 'customer_id');
Output:
Biscuit index built successfully.
Created biscuit_match() and biscuit_match_rows() functions for table: customers
Columns: customer_id integer, email text
Successfully created trigger on table: customers
The index will now automatically update on INSERT, UPDATE, and DELETE operations.
Errors:
- Primary key column not found
- Indexed column not found
- Insufficient permissions
- Table does not exist
biscuit_build_index()
Constructs bitmap index structure from table data.
biscuit_build_index(
table_name TEXT,
column_name TEXT,
pk_column_name TEXT DEFAULT 'id'
) RETURNS BOOLEAN
Parameters:
- table_name: Target table name
- column_name: Column to index
- pk_column_name: Primary key column (default: 'id')
Returns: TRUE on success
Behavior:
- Drops existing index if present
- Scans entire table sequentially
- Builds all bitmap structures
- Allocates shared memory context
Performance: O(n × L), n=rows, L=avg string length
Example:
SELECT biscuit_build_index('products', 'name', 'id');
biscuit_create_match_function()
Generates strongly-typed wrapper functions for queries.
biscuit_create_match_function() RETURNS TEXT
Returns: Confirmation message with function signatures
Generated Functions:
- biscuit_match(pattern TEXT) RETURNS SETOF table_type
- biscuit_match_rows(pattern TEXT) RETURNS TABLE(pk type, value TEXT)
Prerequisite: Index must be built first
Example:
SELECT biscuit_create_match_function();
Output:
Created biscuit_match() and biscuit_match_rows() functions for table: products
Columns: id integer, name text, category text, price numeric
biscuit_enable_triggers()
Activates automatic index maintenance triggers.
biscuit_enable_triggers() RETURNS TEXT
Returns: Status message
Behavior:
- Creates AFTER INSERT OR UPDATE OR DELETE trigger
- Trigger name: biscuit_auto_update
- Fires FOR EACH ROW
- Calls the biscuit_trigger() function
Example:
SELECT biscuit_enable_triggers();
biscuit_disable_triggers()
Deactivates automatic index maintenance.
biscuit_disable_triggers() RETURNS TEXT
Returns: Status message
Use Cases:
- Bulk data operations
- Table maintenance
- Performance testing
- Data migration
Example:
SELECT biscuit_disable_triggers();
-- Perform bulk operations
SELECT biscuit_enable_triggers();
Query Functions
biscuit_match()
Primary query interface returning complete table rows.
biscuit_match(pattern TEXT) RETURNS SETOF table_type
Parameters:
pattern: Wildcard pattern (SQL LIKE syntax)
Returns: Complete rows matching pattern
Characteristics:
- Strongly-typed return
- Full SQL compatibility
- Automatic tombstone filtering
- Optimal for production use
Example:
SELECT * FROM biscuit_match('%@example.com');
SELECT id, name FROM biscuit_match('Product%') WHERE price > 100;
biscuit_match_rows()
Alternative query interface returning key-value pairs.
biscuit_match_rows(pattern TEXT) RETURNS TABLE(pk primary_key_type, value TEXT)
Parameters:
pattern: Wildcard pattern
Returns: Table of (primary_key, indexed_value) pairs
Example:
SELECT * FROM biscuit_match_rows('%error%');
biscuit_match_count()
Efficient count-only query.
biscuit_match_count(pattern TEXT) RETURNS INTEGER
Parameters:
pattern: Wildcard pattern
Returns: Count of matching records
Performance: Fastest query method (no tuple materialization)
Example:
SELECT biscuit_match_count('%gmail.com%');
biscuit_match_keys()
Low-level interface returning primary keys and values as text.
biscuit_match_keys(pattern TEXT) RETURNS TABLE(pk TEXT, value TEXT)
Parameters:
pattern: Wildcard pattern
Returns: Table of (pk_text, value_text) pairs
Note: Used internally by higher-level functions
Example:
SELECT pk::INTEGER, value FROM biscuit_match_keys('%pattern%');
Status and Diagnostic Functions
biscuit_index_status()
Comprehensive index status report.
biscuit_index_status() RETURNS TEXT
Returns: Formatted status report including:
- Table and column information
- Record counts (active, total, free, tombstoned)
- Memory usage
- CRUD statistics
- Lazy deletion metrics
- Optimization indicators
Example:
SELECT biscuit_index_status();
Sample Output:
========================================
Biscuit Index v3 - FIXED
========================================
Table: products
Column: name
Primary Key: id
Active Records: 998567
Total Slots: 1000000
Free Slots: 1433
Tombstoned Slots: 856
Max length: 128
Memory: 18.3 MB
----------------------------------------
CRUD Statistics:
Inserts: 1000000
Deletes: 1433
Updates: 125678 (incr: 89234, 71.0%)
Queries: 45892
----------------------------------------
Lazy Deletion Status:
Pending tombstones: 856 (59.7% of deletes)
Cleanup threshold: 1000
Total cleanups: 1
Items cleaned: 577
Query/Delete ratio: 32.02
----------------------------------------
biscuit_get_active_count()
Returns active record count.
biscuit_get_active_count() RETURNS INTEGER
Returns: Count of non-deleted records
Example:
SELECT biscuit_get_active_count();
-- Output: 998567
biscuit_get_free_slots()
Returns available slot count for reuse.
biscuit_get_free_slots() RETURNS INTEGER
Returns: Number of slots available for new inserts
Example:
SELECT biscuit_get_free_slots();
-- Output: 1433
biscuit_get_tombstone_count()
Returns pending tombstone count.
biscuit_get_tombstone_count() RETURNS INTEGER
Returns: Number of deleted records awaiting cleanup
Use Case: Monitor cleanup necessity
Example:
SELECT biscuit_get_tombstone_count();
-- Output: 856
biscuit_version()
Returns extension version.
biscuit_version() RETURNS TEXT
Returns: Version string
Example:
SELECT biscuit_version();
-- Output: 1.0.0-Biscuit
Maintenance Functions
biscuit_cleanup()
Manual tombstone cleanup.
biscuit_cleanup() RETURNS TEXT
Returns: Cleanup summary
Behavior:
- Removes tombstoned indices from all bitmaps
- Frees associated memory
- Resets tombstone counter
- Updates cleanup statistics
Performance: O(n × 256), n=tombstone_count
Example:
SELECT biscuit_cleanup();
Output:
Tombstone cleanup complete:
Cleaned: 856 tombstones
Remaining: 0
Total cleanups: 2
When to Use:
- Before performance-critical operations
- After bulk deletions
- When tombstone count exceeds threshold
- During maintenance windows
Internal Functions (Advanced)
biscuit_trigger()
Internal trigger function (do not call directly).
biscuit_trigger() RETURNS TRIGGER
Purpose: Maintains index consistency on DML operations
Called By: PostgreSQL trigger system
Handles:
- INSERT: Adds new records to index
- UPDATE: Applies incremental or full updates
- DELETE: Marks tombstones
biscuit_match_tuples()
Generic tuple retrieval (legacy).
biscuit_match_tuples(pattern TEXT) RETURNS SETOF RECORD
Note: Use biscuit_match() instead for type-safe queries
Function Error Handling
All functions implement comprehensive error handling:
Common Errors:
- ERROR: Index not built: call biscuit_build_index() first
- ERROR: NULL primary key: primary key cannot be NULL
- ERROR: Column not found: verify table/column names
- ERROR: Insufficient permissions: requires table owner or superuser
Example Error:
SELECT * FROM biscuit_match('%pattern%');
-- ERROR: Index not built. Call biscuit_build_index() first.
Return Type Specifications
| Function | Return Type | Nullable | Notes |
|----------|-------------|----------|-------|
| biscuit_match() | SETOF table | No | Type matches indexed table |
| biscuit_match_rows() | TABLE(pk, TEXT) | No | PK type preserved |
| biscuit_match_count() | INTEGER | No | Always returns count ≥ 0 |
| biscuit_match_keys() | TABLE(TEXT, TEXT) | No | Both fields as text |
| biscuit_index_status() | TEXT | No | Always returns report |
| biscuit_get_*_count() | INTEGER | No | Returns 0 if no index |
| biscuit_version() | TEXT | No | Version string |
Monitoring and Diagnostics
Key Performance Indicators
1. Index Health Metrics
Active Record Count
SELECT biscuit_get_active_count();
Interpretation:
- Should match table row count (excluding deleted)
- Discrepancy indicates synchronization issues
- Monitor trend for capacity planning
Free Slot Count
SELECT biscuit_get_free_slots();
Interpretation:
- Available slots for insert reuse
- High count after bulk deletes is normal
- Prevents capacity expansion overhead
Tombstone Count
SELECT biscuit_get_tombstone_count();
Interpretation:
- Pending deleted records
- Critical Threshold: >1000 (default cleanup trigger)
- Warning Threshold: >5000 (performance degradation)
- Action Required: >10000 (manual cleanup recommended)
2. Operational Statistics
Comprehensive Status
SELECT biscuit_index_status();
Key Sections:
A. Capacity Metrics
Active Records: 998567 ← Currently queryable records
Total Slots: 1000000 ← Allocated capacity
Free Slots: 1433 ← Available for reuse
Tombstoned Slots: 856 ← Pending cleanup
B. CRUD Performance
Inserts: 1000000
Deletes: 1433
Updates: 125678 (incr: 89234, 71.0%) ← Incremental update rate
Queries: 45892
Incremental Update Rate:
- >70%: Excellent (similar updates)
- 50-70%: Good (moderate similarity)
- <50%: Normal (diverse updates)
- <20%: Review update patterns
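The rate can also be extracted programmatically from the status report, in the same regexp_matches() style used elsewhere in this section (a sketch tied to the report format shown above):
-- Extract the incremental update rate from the status report (sketch)
SELECT (regexp_matches(biscuit_index_status(),
        'incr: [0-9]+, ([0-9.]+)%'))[1]::NUMERIC AS incremental_pct;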
C. Lazy Deletion Health
Pending tombstones: 856 (59.7% of deletes)
Cleanup threshold: 1000
Total cleanups: 1
Items cleaned: 577
Query/Delete ratio: 32.02
Query/Delete Ratio:
- >20: Healthy (reads dominate, lazy deletion beneficial)
- 10-20: Balanced (cleanup overhead acceptable)
- <10: Write-heavy (consider more aggressive cleanup)
3. Memory Utilization
SELECT
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC AS memory_mb;
Guidelines:
Memory Growth = f(record_count, avg_length, char_diversity)
Expected Ranges:
100K records: 2-5 MB
1M records: 15-25 MB
10M records: 150-250 MB
100M records: 1.5-2.5 GB
Anomalies:
- >30% above expected: check for memory leaks or fragmentation
- <50% of expected: excellent compression (sparse data)
Monitoring Queries
Real-Time Health Check
-- Comprehensive health dashboard
WITH metrics AS (
SELECT
biscuit_get_active_count() AS active,
biscuit_get_free_slots() AS free,
biscuit_get_tombstone_count() AS tombstones,
(regexp_matches(biscuit_index_status(), 'Total Slots: ([0-9]+)'))[1]::INTEGER AS capacity,
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC AS memory_mb
)
SELECT
active,
free,
tombstones,
capacity,
memory_mb,
ROUND(100.0 * active / capacity, 2) AS utilization_pct,
ROUND(100.0 * tombstones / NULLIF(capacity - active, 0), 2) AS tombstone_ratio,
CASE
WHEN tombstones > 10000 THEN 'CRITICAL - Cleanup Required'
WHEN tombstones > 5000 THEN 'WARNING - Monitor Closely'
WHEN tombstones > 1000 THEN 'NOTICE - Cleanup Soon'
ELSE 'HEALTHY'
END AS health_status
FROM metrics;
Sample Output:
active | free | tombstones | capacity | memory_mb | utilization_pct | tombstone_ratio | health_status
---------+------+------------+----------+-----------+-----------------+-----------------+---------------
998567 | 1433 | 856 | 1000000 | 18.30 | 99.86 | 59.70 | HEALTHY
Performance Degradation Detection
-- Detect query performance issues
WITH baseline AS (
SELECT 2.1 AS expected_ms -- Baseline query time
),
current AS (
SELECT
biscuit_get_tombstone_count() AS tombstones,
-- Simulate query time (in production, measure actual queries)
2.1 + (biscuit_get_tombstone_count()::NUMERIC / 1000) * 0.2 AS estimated_ms
)
SELECT
tombstones,
estimated_ms,
ROUND(((estimated_ms - expected_ms) / expected_ms) * 100, 2) AS degradation_pct,
CASE
WHEN estimated_ms > expected_ms * 2 THEN 'ACTION REQUIRED'
WHEN estimated_ms > expected_ms * 1.5 THEN 'DEGRADED'
WHEN estimated_ms > expected_ms * 1.2 THEN 'ELEVATED'
ELSE 'NORMAL'
END AS performance_status
FROM current, baseline;
Capacity Planning
-- Project capacity needs
WITH stats AS (
SELECT
biscuit_get_active_count() AS current_active,
(regexp_matches(biscuit_index_status(), 'Total Slots: ([0-9]+)'))[1]::INTEGER AS capacity,
(regexp_matches(biscuit_index_status(), 'Inserts: ([0-9]+)'))[1]::BIGINT AS total_inserts,
(regexp_matches(biscuit_index_status(), 'Deletes: ([0-9]+)'))[1]::BIGINT AS total_deletes
),
growth AS (
SELECT
current_active,
capacity,
total_inserts - total_deletes AS net_growth,
CASE
WHEN total_deletes > 0 THEN total_inserts::NUMERIC / total_deletes
ELSE NULL
END AS insert_delete_ratio
FROM stats
)
SELECT
current_active,
capacity,
capacity - current_active AS available,
ROUND(100.0 * current_active / capacity, 2) AS utilization_pct,
net_growth,
insert_delete_ratio,
CASE
WHEN current_active::NUMERIC / capacity > 0.9 THEN 'EXPAND SOON'
WHEN current_active::NUMERIC / capacity > 0.8 THEN 'MONITOR'
ELSE 'ADEQUATE'
END AS capacity_status
FROM growth;
Alerting Thresholds
Critical Alerts
-- Immediate action required
DO $$
DECLARE
    tombstone_count INTEGER;
    active_count INTEGER;
    capacity INTEGER;
BEGIN
    tombstone_count := biscuit_get_tombstone_count();
    active_count := biscuit_get_active_count();
    capacity := (regexp_matches(biscuit_index_status(), 'Total Slots: ([0-9]+)'))[1]::INTEGER;
    IF tombstone_count > 10000 THEN
        RAISE WARNING 'CRITICAL: Tombstone count % exceeds 10,000. Run biscuit_cleanup() immediately.', tombstone_count;
    END IF;
    IF active_count::NUMERIC / capacity > 0.95 THEN
        RAISE WARNING 'CRITICAL: Index utilization % exceeds 95%%. Capacity expansion imminent.',
            ROUND(100.0 * active_count / capacity, 2);
    END IF;
END $$;
Warning Alerts
-- Monitor closely, plan maintenance
DO $$
DECLARE
    tombstone_count INTEGER;
    memory_mb NUMERIC;
BEGIN
    tombstone_count := biscuit_get_tombstone_count();
    memory_mb := (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC;
    IF tombstone_count BETWEEN 5000 AND 10000 THEN
        RAISE NOTICE 'WARNING: Tombstone count % in elevated range. Schedule cleanup.', tombstone_count;
    END IF;
    -- Assuming 1M records should use ~20 MB
    IF memory_mb > 30 THEN
        RAISE NOTICE 'WARNING: Memory usage % MB is higher than expected. Check for leaks.', memory_mb;
    END IF;
END $$;
Diagnostic Procedures
Issue: Slow Query Performance
Diagnosis:
-- Check tombstone overhead
SELECT biscuit_get_tombstone_count();
-- Check memory usage
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC;
-- Review pattern complexity
EXPLAIN ANALYZE SELECT * FROM biscuit_match('%complex%pattern%');
Resolution:
-- If tombstones > 5000
SELECT biscuit_cleanup();
-- If memory excessive
SELECT biscuit_build_index('table', 'column', 'pk'); -- Rebuild
Issue: Index Out of Sync
Diagnosis:
-- Compare counts
SELECT
COUNT(*) AS table_count,
biscuit_get_active_count() AS index_count
FROM your_table;
-- Check trigger status
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgname = 'biscuit_auto_update';
Resolution:
-- Rebuild index
SELECT biscuit_build_index('table', 'column', 'pk');
SELECT biscuit_enable_triggers();
Issue: High Memory Usage
Diagnosis:
-- Detailed memory breakdown
SELECT biscuit_index_status();
-- Check for long strings
SELECT MAX(LENGTH(column)), AVG(LENGTH(column)) FROM table;
Resolution:
-- Rebuild to compact
SELECT biscuit_build_index('table', 'column', 'pk');
-- Consider limiting indexed length (future enhancement)
Logging and Audit Trail
Enable PostgreSQL Logging
-- Log slow queries
SET log_min_duration_statement = 100; -- Log queries >100ms
-- Log BISCUIT operations
SET client_min_messages = NOTICE;
-- View logs
SELECT * FROM pg_stat_statements
WHERE query LIKE '%biscuit%'
ORDER BY mean_exec_time DESC;
Custom Monitoring Table
-- Create monitoring log
CREATE TABLE biscuit_metrics (
logged_at TIMESTAMPTZ DEFAULT NOW(),
active_count INTEGER,
free_slots INTEGER,
tombstone_count INTEGER,
memory_mb NUMERIC,
query_count BIGINT,
insert_count BIGINT,
update_count BIGINT,
delete_count BIGINT
);
-- Periodic logging (via cron or pg_cron)
INSERT INTO biscuit_metrics
SELECT
NOW(),
biscuit_get_active_count(),
biscuit_get_free_slots(),
biscuit_get_tombstone_count(),
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC,
(regexp_matches(biscuit_index_status(), 'Queries: ([0-9]+)'))[1]::BIGINT,
(regexp_matches(biscuit_index_status(), 'Inserts: ([0-9]+)'))[1]::BIGINT,
(regexp_matches(biscuit_index_status(), 'Updates: ([0-9]+)'))[1]::BIGINT,
(regexp_matches(biscuit_index_status(), 'Deletes: ([0-9]+)'))[1]::BIGINT;
-- Analyze trends
SELECT
DATE_TRUNC('hour', logged_at) AS hour,
AVG(tombstone_count) AS avg_tombstones,
MAX(tombstone_count) AS max_tombstones,
AVG(memory_mb) AS avg_memory
FROM biscuit_metrics
WHERE logged_at > NOW() - INTERVAL '7 days'
GROUP BY 1
ORDER BY 1 DESC;
Optimization Guidelines
Query Optimization
1. Pattern Design Best Practices
Optimal Patterns:
-- ✓ GOOD: Specific prefix
SELECT * FROM biscuit_match('user_admin%');
-- ✓ GOOD: Specific suffix
SELECT * FROM biscuit_match('%@company.com');
-- ✓ GOOD: Single containment
SELECT * FROM biscuit_match('%error%');
-- ⚠ ACCEPTABLE: Two-part pattern
SELECT * FROM biscuit_match('%user%admin%');
-- ⚠ SLOWER: Three-part pattern
SELECT * FROM biscuit_match('%a%b%c%');
-- ✗ AVOID: Excessive parts (4+)
SELECT * FROM biscuit_match('%a%b%c%d%e%');
Pattern Complexity Impact:
Parts Query Time Recommendation
----- ---------- --------------
1 ~2ms Optimal
2 ~5ms Excellent
3 ~12ms Good
4 ~25ms Acceptable
5+ ~50ms+ Avoid if possible
2. Query Structure Optimization
Use Count for Existence Checks:
-- ✗ AVOID: Materializing entire result set
SELECT COUNT(*) FROM biscuit_match('%pattern%');
-- ✓ BETTER: Direct count
SELECT biscuit_match_count('%pattern%');
Apply Filters After Pattern Match:
-- ✓ OPTIMAL: Pattern match first, then filter
SELECT * FROM biscuit_match('%@gmail.com%')
WHERE created_at > '2024-01-01'
AND status = 'active';
-- ✗ SUBOPTIMAL: Complex joins before pattern match
SELECT *
FROM complex_view v
WHERE v.email LIKE '%@gmail.com%'; -- Falls back to sequential scan
Leverage Result Reuse:
-- ✓ OPTIMAL: Use CTE for multiple operations on same pattern
WITH matches AS (
SELECT * FROM biscuit_match('%search%')
)
SELECT
(SELECT COUNT(*) FROM matches) AS total,
(SELECT COUNT(*) FROM matches WHERE category = 'A') AS category_a,
(SELECT AVG(price) FROM matches) AS avg_price;
3. Index Selectivity Considerations
High Selectivity (Few Matches):
-- Excellent: ~0.1% selectivity
SELECT * FROM biscuit_match('unique_prefix_%');
Low Selectivity (Many Matches):
-- Consider adding filters or reformulating query
SELECT * FROM biscuit_match('%') -- Returns everything
WHERE additional_criteria;
Selectivity Guidelines:
Match % Query Performance Recommendation
------- ----------------- --------------
<1% Excellent Ideal use case
1-10% Very Good Well-suited
10-30% Good Acceptable
30-50% Fair Add filters
>50% Poor Reconsider approach
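Selectivity can be checked cheaply before materializing results by comparing biscuit_match_count() against the active record count (sketch):
-- Estimate pattern selectivity before running the full query (sketch)
SELECT
    biscuit_match_count('%pattern%') AS matches,
    biscuit_get_active_count()       AS total,
    ROUND(100.0 * biscuit_match_count('%pattern%')
               / NULLIF(biscuit_get_active_count(), 0), 2) AS match_pct;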
CRUD Optimization
1. Bulk Insert Strategy
Small Batches (<1000 rows):
-- Keep triggers enabled
INSERT INTO products (id, name) VALUES
(1, 'Product 1'),
(2, 'Product 2'),
...
(1000, 'Product 1000');
-- Index updates happen automatically
Large Batches (>10,000 rows):
-- Disable triggers, bulk insert, rebuild
SELECT biscuit_disable_triggers();
COPY products FROM '/data/products.csv' CSV;
-- or
INSERT INTO products SELECT * FROM staging_table;
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
Performance Comparison:
10,000 records:
With triggers: ~2 seconds
Disable/rebuild: ~1 second (50% faster)
100,000 records:
With triggers: ~20 seconds
Disable/rebuild: ~8 seconds (60% faster)
1,000,000 records:
With triggers: ~220 seconds
Disable/rebuild: ~85 seconds (61% faster)
2. Update Optimization
Maximize Incremental Updates:
-- ✓ GOOD: Same-length updates (incremental)
UPDATE users SET email = 'newemail@example.com' -- 21 chars
WHERE email = 'oldemail@example.com'; -- 21 chars
-- ⚠ ACCEPTABLE: Different-length (full reindex)
UPDATE users SET email = 'verylongemail@example.com' -- 27 chars
WHERE email = 'short@ex.co'; -- 11 chars
Batch Updates:
-- For large update batches
SELECT biscuit_disable_triggers();
UPDATE products SET name = UPPER(name);
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
3. Delete Strategy
Individual Deletes:
-- Lazy deletion is O(1), no optimization needed
DELETE FROM products WHERE id = 123;
Bulk Deletes:
-- Option 1: Let lazy deletion handle (recommended for <10K deletes)
DELETE FROM products WHERE category = 'obsolete';
-- Option 2: Disable/rebuild (for >100K deletes)
SELECT biscuit_disable_triggers();
DELETE FROM products WHERE created_at < '2020-01-01';
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
Cleanup Scheduling:
-- Schedule periodic cleanup during low-traffic hours
-- Example: Every night at 2 AM via pg_cron
SELECT cron.schedule('biscuit-cleanup', '0 2 * * *',
'SELECT biscuit_cleanup()');
Memory Optimization
1. Monitor Memory Growth
-- Track memory over time
SELECT
NOW() AS checked_at,
biscuit_get_active_count() AS records,
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC AS mb,
ROUND((regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC
/ NULLIF(biscuit_get_active_count(), 0) * 1024, 2) AS kb_per_record;
2. Memory Reduction Strategies
Rebuild Compaction:
-- Periodic rebuild to compact fragmentation
SELECT biscuit_build_index('table', 'column', 'pk');
Character Diversity Reduction (Application Level):
-- Normalize data before indexing
ALTER TABLE users ADD COLUMN email_normalized TEXT;
UPDATE users SET email_normalized = LOWER(email);
SELECT biscuit_setup('users', 'email_normalized', 'id');
Maintenance Optimization
1. Tombstone Management
Adaptive Threshold Tuning:
-- Current threshold: 1000 (default)
-- Adjust based on workload:
-- Read-heavy (Q/D ratio > 20): Increase threshold
-- Write-heavy (Q/D ratio < 10): Decrease threshold
-- Balanced: Keep default
-- Manual threshold adjustment (via C code recompilation)
#define TOMBSTONE_CLEANUP_THRESHOLD 5000 // Adjust value
Proactive Cleanup:
-- Before performance-critical operations
SELECT biscuit_cleanup();
-- Before maintenance windows
SELECT biscuit_cleanup();
2. Index Rebuild Strategy
When to Rebuild:
- After bulk operations (>10% of table size)
- Memory usage exceeds expectations by >50%
- After schema changes
- Periodic maintenance (monthly/quarterly)
Rebuild Procedure:
BEGIN;
SELECT biscuit_disable_triggers();
-- Perform any table maintenance
SELECT biscuit_build_index('table', 'column', 'pk');
SELECT biscuit_enable_triggers();
COMMIT;
Concurrency Optimization
1. Read Concurrency
Queries are lock-free:
-- Multiple concurrent queries: No contention
SELECT * FROM biscuit_match('%pattern1%'); -- Session 1
SELECT * FROM biscuit_match('%pattern2%'); -- Session 2
SELECT * FROM biscuit_match('%pattern3%'); -- Session 3
-- All execute concurrently without blocking
2. Write Concurrency
Minimize trigger overhead:
-- ✓ OPTIMAL: Batch small inserts
INSERT INTO products SELECT * FROM new_products_batch;
-- ⚠ SUBOPTIMAL: Individual inserts in loop
-- Each insert triggers index update
Transaction Batching:
-- Group related operations
BEGIN;
INSERT INTO products VALUES (...);
UPDATE products SET name = ... WHERE id = ...;
DELETE FROM products WHERE id = ...;
COMMIT;
-- All index updates happen at commit
Performance Testing
Benchmark Query Performance
-- Timing template
\timing on
SELECT * FROM biscuit_match('%pattern%');
\timing off
-- With EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM biscuit_match('%pattern%');
Compare with Traditional Methods
-- BISCUIT
\timing on
SELECT COUNT(*) FROM biscuit_match('%search%');
\timing off
-- Traditional LIKE
\timing on
SELECT COUNT(*) FROM table WHERE column LIKE '%search%';
\timing off
-- GIN Index
\timing on
SELECT COUNT(*) FROM table WHERE column ILIKE '%search%';
\timing off
Limitations and Constraints
Pattern Language Constraints
BISCUIT supports only LIKE-style wildcards (% and _); regular expressions and alternation are not supported:
-- ✗ NOT SUPPORTED (regex syntax is treated as literal characters)
SELECT * FROM biscuit_match('[0-9]+');
SELECT * FROM biscuit_match('(abc|xyz)');
-- ✓ EQUIVALENT WORKAROUNDS
SELECT * FROM biscuit_match('abc%')
UNION
SELECT * FROM biscuit_match('xyz%');
Architectural Constraints
1. Memory Residency
Index Location: Shared memory (non-persistent)
Implications:
- Index lost on server restart
- Must rebuild after restart
- No automatic persistence
Startup Procedure:
-- Add to database initialization script
\c your_database
SELECT biscuit_setup('table1', 'column1', 'pk1');
SELECT biscuit_setup('table2', 'column2', 'pk2');
2. Single Index Instance
Limitation: Only one BISCUIT index per PostgreSQL instance
Implications:
-- ✗ Cannot maintain multiple indexes simultaneously
SELECT biscuit_setup('table1', 'col1', 'id');
SELECT biscuit_setup('table2', 'col2', 'id'); -- Replaces table1 index
-- ✓ Workaround: Separate databases
CREATE DATABASE db1;
\c db1
SELECT biscuit_setup('table', 'column', 'id');
CREATE DATABASE db2;
\c db2
SELECT biscuit_setup('table', 'column', 'id');
3. Transaction Isolation
Current Behavior: Index updates commit with transaction
Limitation: No snapshot isolation for queries
BEGIN;
INSERT INTO products VALUES (1, 'New Product');
-- Index updated in current transaction
SELECT * FROM biscuit_match('%New%'); -- Sees new product
ROLLBACK;
-- Index reverted (product removed)
SELECT * FROM biscuit_match('%New%'); -- No longer sees product
Note: This is expected behavior and consistent with PostgreSQL’s MVCC model at the trigger level.
4. Single Column Per Index
Current Design: One index per table, one column per index
-- ✗ Cannot index multiple columns simultaneously
-- Must choose one column
-- ✓ Workaround: Concatenated column
ALTER TABLE products ADD COLUMN search_text TEXT
GENERATED ALWAYS AS (name || ' ' || description || ' ' || sku) STORED;
SELECT biscuit_setup('products', 'search_text', 'id');
5. NULL Handling
Behavior: NULL values are skipped during indexing and never match any pattern (see Pattern Matching Specification)
INSERT INTO products (id, name) VALUES (1, NULL);
-- NULL is not indexed
SELECT * FROM biscuit_match('%'); -- Does not include row 1
SELECT biscuit_match_count(''); -- Does not count row 1
Performance Limitations
1. Pattern Complexity Scaling
Complexity: O(k × m), k = pattern parts, m = search window
-- Query time roughly doubles with each additional pattern part
SELECT * FROM biscuit_match('%a%');         -- ~2ms
SELECT * FROM biscuit_match('%a%b%');       -- ~5ms
SELECT * FROM biscuit_match('%a%b%c%');     -- ~12ms
SELECT * FROM biscuit_match('%a%b%c%d%');   -- ~25ms
SELECT * FROM biscuit_match('%a%b%c%d%e%'); -- ~52ms
Recommendation: Limit patterns to 3-4 parts for optimal performance
2. Write Concurrency
Limitation: Trigger-based updates serialize at row level
Impact: high-concurrency writes (>10K/sec) may experience contention, and trigger overhead accumulates.
Solutions:
- Batch writes
- Disable triggers for bulk operations
- Consider an application-level queue
3. Memory Growth
Behavior: Memory grows with:
- Record count
- String length diversity
- Character diversity
- Tombstone accumulation
Memory bounds:
- Minimum: 10% of indexed data size
- Maximum: 50% of indexed data size
- Typical: 20-25% of indexed data size
Mitigation:
-- Periodic rebuild for compaction
SELECT biscuit_build_index('table', 'column', 'pk');
-- Cleanup tombstones
SELECT biscuit_cleanup();
Compatibility Constraints
1. PostgreSQL Version
Minimum: PostgreSQL 11.0
Tested Versions:
- PostgreSQL 11.x: ✓ Supported
- PostgreSQL 12.x: ✓ Supported
- PostgreSQL 13.x: ✓ Supported
- PostgreSQL 14.x: ✓ Supported
- PostgreSQL 15.x: ✓ Supported
- PostgreSQL 16.x: ✓ Supported (expected)
2. Data Type Support
Primary Key Types:
- Fully Supported: INT2, INT4, INT8, TEXT, VARCHAR, CHAR, UUID
- Generic Support: All types with output functions
- Not Supported: Composite types, arrays (without custom casting)
Indexed Column Types:
- Required: TEXT, VARCHAR, CHAR, or castable to TEXT
- Not Supported: Binary types (BYTEA), JSON, arrays without casting
3. Extension Dependencies
Required:
- PostgreSQL server-side C API
- Standard PostgreSQL libraries
Optional:
- CRoaring library (for optimized bitmaps)
- Falls back to built-in implementation if unavailable
Security Considerations
1. Permissions
Required Privileges:
-- Index creation requires
GRANT CREATE ON DATABASE TO user; -- For extension
GRANT ALL ON TABLE target_table TO user; -- For triggers
-- Query execution requires
GRANT SELECT ON TABLE target_table TO user;
2. SQL Injection
Safe Query Patterns:
-- ✓ SAFE: Parameterized queries
PREPARE search_stmt AS
SELECT * FROM biscuit_match($1);
EXECUTE search_stmt('%user_input%');
-- ✓ SAFER: Application-level sanitization (Python pseudocode;
--   the sanitizer must escape quotes and wildcard characters)
pattern = sanitize_input(user_input)
query = f"SELECT * FROM biscuit_match('{pattern}')"
Unsafe Patterns:
-- ✗ UNSAFE: Direct string concatenation
SELECT * FROM biscuit_match('%' || user_input || '%');
3. Resource Exhaustion
Potential Vectors:
- Complex patterns (>5 parts): High CPU usage
- Very long patterns: Memory allocation
- Rapid index rebuilds: Memory fragmentation
Mitigation:
-- Rate limit index rebuilds
-- Validate pattern complexity before execution
-- Set statement timeout
SET statement_timeout = '5s';
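Complexity validation can live in a small wrapper function; a sketch (biscuit_safe_match_count is a hypothetical helper, not part of the extension) that rejects patterns with more than four %-separated parts:
-- Hypothetical wrapper rejecting overly complex patterns (sketch)
CREATE OR REPLACE FUNCTION biscuit_safe_match_count(p TEXT)
RETURNS INTEGER AS $$
BEGIN
    -- More than five '%' wildcards implies more than four literal parts
    IF LENGTH(p) - LENGTH(REPLACE(p, '%', '')) > 5 THEN
        RAISE EXCEPTION 'Pattern too complex: %', p;
    END IF;
    RETURN biscuit_match_count(p);
END;
$$ LANGUAGE plpgsql;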
Known Issues
1. Index Persistence
Issue: Index not persistent across restarts
Status: Design limitation (in-memory structure)
Workaround: Rebuild on startup via initialization script
2. Multiple Index Support
Issue: Only one index per PostgreSQL instance
Status: Architectural limitation
Workaround: Use multiple databases or implement queuing
3. Incremental Update Threshold
Issue: Updates with >20% character changes trigger full reindex
Status: Configurable threshold (requires C code modification)
Workaround: Design application to minimize large updates
Future Enhancements (Roadmap)
Planned:
- Persistent index storage
- Multiple concurrent indexes
- Case-insensitive mode option
- Configurable position limit (>256 chars)
- Pattern query optimizer
- Index statistics collection
Under Consideration:
- Fuzzy matching support
- Phonetic matching
- Unicode normalization
- Multi-column composite indexes
- Parallel query execution
- Incremental checkpoint/restore
Troubleshooting
Common Issues and Solutions
Issue 1: “Index not built” Error
Symptom:
SELECT * FROM biscuit_match('%pattern%');
ERROR: Index not built. Call biscuit_build_index() first.
Cause: Index not initialized or server restarted
Solution:
-- Option 1: Quick setup
SELECT biscuit_setup('table_name', 'column_name', 'pk_column');
-- Option 2: Manual build
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
SELECT biscuit_create_match_function();
SELECT biscuit_enable_triggers();
Prevention:
-- Add to database startup script
-- File: /etc/postgresql/14/main/init_biscuit.sql
\c your_database
SELECT biscuit_setup('table1', 'column1', 'pk1');
Issue 2: Query Returns No Results
Symptom:
SELECT biscuit_match_count('%known_value%');
-- Returns: 0 (expected >0)
Diagnosis:
-- Check if value exists in table
SELECT COUNT(*) FROM table_name WHERE column_name LIKE '%known_value%';
-- Check index status
SELECT biscuit_index_status();
-- Check trigger status
SELECT tgname, tgenabled FROM pg_trigger WHERE tgname = 'biscuit_auto_update';
Possible Causes and Solutions:
A. Index Out of Sync
-- Rebuild index
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
B. Case Sensitivity
-- Pattern is case-sensitive
SELECT * FROM biscuit_match('%VALUE%'); -- Won't match "value"
-- Solution: Normalize
SELECT * FROM biscuit_match(LOWER('%VALUE%'));
C. Pattern Beyond 256 Characters
-- Match occurs after position 256
SELECT LENGTH(column_name) FROM table_name WHERE id = 123;
-- Returns: 280
-- Solution: Pattern must be within first 256 chars
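A diagnostic sketch for this case: POSITION() reports where the literal first occurs, so rows whose only match starts beyond the indexed window can be listed directly (the literal and names are placeholders; the exact cutoff also depends on pattern length):
-- Rows LIKE finds but the index cannot: first occurrence starts past position 256
SELECT id, POSITION('known_value' IN column_name) AS first_occurrence
FROM table_name
WHERE column_name LIKE '%known_value%'
AND POSITION('known_value' IN column_name) > 256;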
Issue 3: Slow Query Performance
Symptom:
\timing on
SELECT * FROM biscuit_match('%pattern%');
Time: 850.234 ms
-- Expected: <10ms
Diagnosis:
-- Check tombstone count
SELECT biscuit_get_tombstone_count();
-- If >5000: Cleanup needed
-- Check pattern complexity: count % symbols
-- (regexp_count() requires PostgreSQL 15+; this form works on all supported versions)
SELECT LENGTH('%a%b%c%d%') - LENGTH(REPLACE('%a%b%c%d%', '%', ''));
-- If >4: Pattern too complex
-- Check memory usage
SELECT biscuit_index_status();
Solutions:
A. High Tombstone Count
SELECT biscuit_cleanup();
-- Should restore normal performance
B. Complex Pattern
-- Simplify pattern
-- From: SELECT * FROM biscuit_match('%a%b%c%d%e%');
-- To: SELECT * FROM biscuit_match('%abc%de%');
C. Large Result Set
-- Check selectivity
SELECT biscuit_match_count('%pattern%');
-- If >50% of table: Not ideal for index
-- Solution: Add additional filters
SELECT * FROM biscuit_match('%pattern%')
WHERE created_at > '2024-01-01';
Issue 4: High Memory Usage
Symptom:
SELECT biscuit_index_status();
-- Memory: 500 MB (expected ~20 MB for dataset)
Diagnosis:
-- Check record count
SELECT biscuit_get_active_count();
-- Check string characteristics
SELECT
COUNT(*) AS records,
AVG(LENGTH(column_name)) AS avg_length,
MAX(LENGTH(column_name)) AS max_length,
COUNT(DISTINCT column_name) AS unique_values
FROM table_name;
-- Check tombstone accumulation
SELECT biscuit_get_tombstone_count();
Solutions:
A. Tombstone Accumulation
SELECT biscuit_cleanup();
-- Then check memory again
B. Memory Fragmentation
-- Rebuild to compact
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
C. Unexpected Data Characteristics
-- Very long strings or high diversity
-- Consider normalizing data
UPDATE table_name SET column_name = LEFT(column_name, 200);
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
Issue 5: Trigger Not Firing
Symptom:
INSERT INTO table_name (id, column_name) VALUES (999, 'test');
SELECT * FROM biscuit_match('%test%');
-- Returns: no rows (expected 1 row)
Diagnosis:
-- Check trigger existence
SELECT tgname, tgenabled, tgisinternal
FROM pg_trigger
WHERE tgrelid = 'table_name'::regclass
AND tgname = 'biscuit_auto_update';
-- Check for errors in logs
SHOW log_destination;
-- Review PostgreSQL logs for trigger errors
Solutions:
A. Trigger Disabled
SELECT biscuit_enable_triggers();
B. Trigger Missing
SELECT biscuit_setup('table_name', 'column_name', 'pk_column');
-- Or manually
SELECT biscuit_enable_triggers();
C. Trigger Error
-- Check PostgreSQL logs
-- Common causes:
-- - NULL primary key
-- - Type conversion error
-- - Memory allocation failure
-- Rebuild to reset
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
SELECT biscuit_enable_triggers();
Issue 6: “NULL primary key” Error
Symptom:
INSERT INTO table_name (column_name) VALUES ('test');
ERROR: NULL primary key in INSERT
Cause: Primary key column contains NULL
Solution:
A. Use Serial/Identity Column
ALTER TABLE table_name ALTER COLUMN id ADD GENERATED ALWAYS AS IDENTITY;
B. Set Default Value
ALTER TABLE table_name ALTER COLUMN id SET DEFAULT nextval('table_name_id_seq');
C. Ensure Non-NULL Constraint
ALTER TABLE table_name ALTER COLUMN id SET NOT NULL;
Issue 7: Out of Memory
Symptom:
SELECT biscuit_build_index('large_table', 'text_column', 'id');
ERROR: out of memory
Diagnosis:
-- Check table size
SELECT
pg_size_pretty(pg_total_relation_size('large_table')) AS table_size,
COUNT(*) AS row_count,
AVG(LENGTH(text_column)) AS avg_length
FROM large_table;
-- Check available memory
SHOW shared_buffers;
SHOW work_mem;
Solutions:
A. Increase Memory Allocation
# In postgresql.conf
shared_buffers = 2GB        # increase from default
work_mem = 256MB            # increase from default
# Then restart PostgreSQL
sudo systemctl restart postgresql
B. Partition Table
-- Create partitioned table
CREATE TABLE large_table_partitioned (
id SERIAL PRIMARY KEY,
text_column TEXT,
created_at DATE
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE large_table_2023 PARTITION OF large_table_partitioned
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
-- Index each partition separately
SELECT biscuit_setup('large_table_2023', 'text_column', 'id');
C. Reduce String Diversity
-- Normalize before indexing
UPDATE large_table SET text_column = LOWER(text_column);
SELECT biscuit_build_index('large_table', 'text_column', 'id');
Issue 8: Index-Table Mismatch
Symptom:
SELECT COUNT(*) FROM table_name;
-- Returns: 100000
SELECT biscuit_get_active_count();
-- Returns: 95000 (mismatch!)
Diagnosis:
-- Check trigger status
SELECT tgenabled FROM pg_trigger WHERE tgname = 'biscuit_auto_update';
-- Check for errors
SELECT biscuit_index_status();
-- Manual comparison
WITH table_data AS (
SELECT COUNT(*) AS cnt FROM table_name
),
index_data AS (
SELECT biscuit_get_active_count() AS cnt
)
SELECT
t.cnt AS table_count,
i.cnt AS index_count,
t.cnt - i.cnt AS discrepancy
FROM table_data t, index_data i;
Solutions:
A. Rebuild Index
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
SELECT biscuit_enable_triggers();
B. Check for Concurrent Modifications
-- Ensure exclusive access during rebuild
BEGIN;
LOCK TABLE table_name IN EXCLUSIVE MODE;
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
COMMIT;
Diagnostic Queries
Comprehensive Health Check
DO $$
DECLARE
table_count BIGINT;
index_count INTEGER;
tombstone_count INTEGER;
free_slots INTEGER;
memory_mb NUMERIC;
BEGIN
EXECUTE 'SELECT COUNT(*) FROM your_table' INTO table_count;
index_count := biscuit_get_active_count();
tombstone_count := biscuit_get_tombstone_count();
free_slots := biscuit_get_free_slots();
memory_mb := (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC;
RAISE NOTICE 'BISCUIT Health Check:';
RAISE NOTICE ' Table rows: %', table_count;
RAISE NOTICE ' Index active: %', index_count;
RAISE NOTICE ' Discrepancy: %', table_count - index_count;
RAISE NOTICE ' Tombstones: %', tombstone_count;
RAISE NOTICE ' Free slots: %', free_slots;
RAISE NOTICE ' Memory: % MB', memory_mb;
IF table_count != index_count THEN
RAISE WARNING 'Index out of sync! Rebuild recommended.';
END IF;
IF tombstone_count > 5000 THEN
RAISE WARNING 'High tombstone count! Cleanup recommended.';
END IF;
IF memory_mb > (index_count::NUMERIC / 1000000 * 30) THEN
RAISE WARNING 'High memory usage! Check for issues.';
END IF;
END $$;
Performance Baseline
-- Establish performance baseline
\timing on
SELECT biscuit_match_count('%'); -- Baseline: all records
SELECT biscuit_match_count('%common_char%'); -- Baseline: common pattern
SELECT biscuit_match_count('prefix%'); -- Baseline: prefix
SELECT biscuit_match_count('%suffix'); -- Baseline: suffix
\timing off
-- Save baselines and compare periodically
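One way to make the save-and-compare step concrete, assuming a small bookkeeping table is acceptable (the table name is illustrative):
-- Persist baseline match counts for later comparison
CREATE TABLE IF NOT EXISTS biscuit_baseline (
captured_at TIMESTAMPTZ DEFAULT NOW(),
pattern TEXT,
match_count BIGINT
);
INSERT INTO biscuit_baseline (pattern, match_count)
SELECT p, biscuit_match_count(p)
FROM (VALUES ('%'), ('%common_char%'), ('prefix%'), ('%suffix')) AS v(p);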
Memory Leak Detection
-- Monitor memory over operations
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC AS mb_before;
-- Perform operations
INSERT INTO table_name SELECT generate_series(1, 10000), 'test' || generate_series(1, 10000);
DELETE FROM table_name WHERE id > 50000;
SELECT biscuit_cleanup();
-- Check memory after
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC AS mb_after;
-- Memory should be stable or decrease after cleanup
Error Code Reference
| Error Message | Cause | Solution |
|---------------|-------|----------|
| Index not built | No index initialized | Run biscuit_setup() |
| NULL primary key | PK value is NULL | Ensure PK constraint |
| Column not found | Invalid column name | Verify table schema |
| Insufficient permissions | User lacks privileges | Grant table permissions |
| Out of memory | Insufficient RAM | Increase shared_buffers |
| Table not found | Invalid table name | Check table existence |
| Index already exists | Duplicate index | Drop and rebuild |
| Trigger already exists | Duplicate trigger | Use biscuit_disable_triggers() first |
Getting Help
Before Seeking Support:
- Run comprehensive health check (above)
- Review PostgreSQL logs for errors
- Check extension version: SELECT biscuit_version()
- Verify PostgreSQL version: SELECT version()
- Document reproduction steps
Support Channels:
- GitHub Issues: [Project Repository]
- Mailing List: [Contact Information]
- Documentation: [Online Docs URL]
Information to Provide:
- Extension version
- PostgreSQL version
- Operating system
- Table schema
- Reproduction steps
- Error messages
- Output of SELECT biscuit_index_status()
Examples and Use Cases
Example 1: E-Commerce Product Search
Scenario: Product catalog with 5M products, need fast SKU and name searches
Schema:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku VARCHAR(50) NOT NULL,
name TEXT NOT NULL,
description TEXT,
category VARCHAR(100),
price NUMERIC(10, 2),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Sample data
INSERT INTO products (sku, name, category, price) VALUES
('ELEC-LAP-001', 'Dell XPS 15 Laptop', 'Electronics', 1299.99),
('ELEC-LAP-002', 'MacBook Pro 16"', 'Electronics', 2499.99),
('HOME-FURN-001', 'Modern Office Chair', 'Furniture', 349.99),
('HOME-FURN-002', 'Standing Desk', 'Furniture', 599.99);
BISCUIT Setup:
-- Index product names
SELECT biscuit_setup('products', 'name', 'id');
Query Examples:
-- Search by product name
SELECT * FROM biscuit_match('%Laptop%');
-- Case-insensitive search (with normalized column)
ALTER TABLE products ADD COLUMN name_lower TEXT
GENERATED ALWAYS AS (LOWER(name)) STORED;
SELECT biscuit_setup('products', 'name_lower', 'id');
SELECT * FROM biscuit_match('%laptop%');
-- Complex search
SELECT * FROM biscuit_match('%office%chair%')
WHERE price BETWEEN 200 AND 500;
-- Count matches
SELECT biscuit_match_count('%macbook%');
-- Top sellers matching pattern
SELECT p.*, s.sales_count
FROM biscuit_match('%laptop%') p
JOIN sales_summary s ON p.id = s.product_id
ORDER BY s.sales_count DESC
LIMIT 10;
Performance Comparison:
-- Traditional LIKE (sequential scan)
\timing on
SELECT COUNT(*) FROM products WHERE name LIKE '%Laptop%';
Time: 1850 ms
-- BISCUIT
SELECT biscuit_match_count('%Laptop%');
Time: 3.2 ms
-- Speedup: 578×
Example 2: Customer Email Domain Analysis
Scenario: Analyze customer email patterns, identify providers, find similar domains
Schema:
CREATE TABLE customers (
customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT NOT NULL,
first_name TEXT,
last_name TEXT,
signup_date DATE DEFAULT CURRENT_DATE,
status VARCHAR(20) DEFAULT 'active'
);
-- Sample data
INSERT INTO customers (email, first_name, last_name) VALUES
('john.doe@gmail.com', 'John', 'Doe'),
('jane.smith@yahoo.com', 'Jane', 'Smith'),
('bob.wilson@company.com', 'Bob', 'Wilson'),
('alice.brown@gmail.com', 'Alice', 'Brown');
BISCUIT Setup:
-- Note: UUID primary key fully supported
SELECT biscuit_setup('customers', 'email', 'customer_id');
Query Examples:
-- Find all Gmail users
SELECT * FROM biscuit_match('%@gmail.com');
-- Find all corporate emails (non-freemail)
SELECT *
FROM biscuit_match('%@%.com')
WHERE email NOT LIKE '%@gmail.com'
AND email NOT LIKE '%@yahoo.com'
AND email NOT LIKE '%@hotmail.com';
-- Domain distribution analysis
WITH email_domains AS (
SELECT
customer_id,
email,
SUBSTRING(email FROM '@(.*)$') AS domain
FROM biscuit_match('%')
)
SELECT
domain,
COUNT(*) AS customer_count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS percentage
FROM email_domains
GROUP BY domain
ORDER BY customer_count DESC
LIMIT 10;
-- Find emails with specific patterns
SELECT * FROM biscuit_match('%.doe@%'); -- All .doe emails
SELECT * FROM biscuit_match('admin%@%'); -- Starts with admin
SELECT * FROM biscuit_match('%+%@%'); -- Contains plus addressing
Business Intelligence Query:
-- Signup trends by email provider
WITH gmail_users AS (
SELECT signup_date
FROM biscuit_match('%@gmail.com')
),
yahoo_users AS (
SELECT signup_date
FROM biscuit_match('%@yahoo.com')
)
SELECT
DATE_TRUNC('month', signup_date) AS month,
'Gmail' AS provider,
COUNT(*) AS signups
FROM gmail_users
GROUP BY 1
UNION ALL
SELECT
DATE_TRUNC('month', signup_date) AS month,
'Yahoo' AS provider,
COUNT(*) AS signups
FROM yahoo_users
GROUP BY 1
ORDER BY month DESC, provider;
Example 3: Log Analysis and Error Detection
Scenario: 50M log entries, need to quickly find error patterns and anomalies
Schema:
CREATE TABLE application_logs (
log_id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ DEFAULT NOW(),
level VARCHAR(10),
message TEXT,
service VARCHAR(50),
user_id INTEGER
);
-- Sample data
INSERT INTO application_logs (level, message, service, user_id) VALUES
('INFO', 'User login successful', 'auth', 12345),
('ERROR', 'Database connection timeout', 'api', 67890),
('WARN', 'Rate limit approaching', 'api', 12345),
('ERROR', 'Failed to connect to payment gateway', 'payments', 11111);
BISCUIT Setup:
SELECT biscuit_setup('application_logs', 'message', 'log_id');
Query Examples:
-- Find all connection errors
SELECT * FROM biscuit_match('%connection%')
WHERE level = 'ERROR'
ORDER BY timestamp DESC
LIMIT 100;
-- Find timeout issues across all services
SELECT
service,
COUNT(*) AS timeout_count,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen
FROM biscuit_match('%timeout%')
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY service
ORDER BY timeout_count DESC;
-- Complex error pattern
SELECT * FROM biscuit_match('%failed%connect%')
WHERE level IN ('ERROR', 'CRITICAL')
AND timestamp > NOW() - INTERVAL '1 hour'
ORDER BY timestamp DESC;
-- Error correlation
WITH error_patterns AS (
SELECT
log_id,
message,
timestamp,
CASE
WHEN message LIKE '%timeout%' THEN 'timeout'
WHEN message LIKE '%connection%' THEN 'connection'
WHEN message LIKE '%memory%' THEN 'memory'
ELSE 'other'
END AS error_category
FROM biscuit_match('%error%')
WHERE timestamp > NOW() - INTERVAL '24 hours'
)
SELECT
error_category,
COUNT(*) AS occurrences,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS percentage
FROM error_patterns
GROUP BY error_category
ORDER BY occurrences DESC;
Real-Time Monitoring:
-- Create monitoring view
CREATE OR REPLACE VIEW recent_errors AS
SELECT
log_id,
timestamp,
level,
message,
service,
user_id
FROM biscuit_match('%error%')
WHERE timestamp > NOW() - INTERVAL '5 minutes'
ORDER BY timestamp DESC;
-- Query view
SELECT * FROM recent_errors;
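The view can also feed a simple alerting check; a sketch with an arbitrary threshold:
-- Flag an error burst: more than 100 errors in the last 5 minutes
SELECT COUNT(*) AS recent_error_count,
COUNT(*) > 100 AS alert
FROM recent_errors;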
Example 4: Genetic Sequence Matching
Scenario: Bioinformatics database with DNA sequences, need pattern matching
Schema:
CREATE TABLE dna_sequences (
sequence_id VARCHAR(50) PRIMARY KEY,
organism TEXT,
sequence TEXT, -- ATCG sequence
length INTEGER,
discovered_date DATE
);
-- Sample data
INSERT INTO dna_sequences VALUES
('SEQ001', 'E. coli', 'ATCGATCGATCG', 12, '2024-01-15'),
('SEQ002', 'Human', 'GCTAGCTAGCTA', 12, '2024-01-16'),
('SEQ003', 'E. coli', 'ATCGGGCCCATCG', 13, '2024-01-17');
BISCUIT Setup:
SELECT biscuit_setup('dna_sequences', 'sequence', 'sequence_id');
Query Examples:
-- Find sequences with specific motif
SELECT * FROM biscuit_match('%ATCG%');
-- Find palindromic sites (conceptual)
SELECT * FROM biscuit_match('%GATC%'); -- GATC is its own reverse complement
-- Complex pattern matching
SELECT * FROM biscuit_match('%ATC%GGG%')
WHERE organism = 'E. coli';
-- Count sequences by pattern
SELECT
'Contains ATCG' AS pattern,
biscuit_match_count('%ATCG%') AS count
UNION ALL
SELECT
'Starts with GC',
biscuit_match_count('GC%')
UNION ALL
SELECT
'Ends with TA',
biscuit_match_count('%TA');
Example 5: Social Media Username Search
Scenario: Social platform with 100M users, need fast username search
Schema:
CREATE TABLE users (
user_id BIGSERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
display_name TEXT,
bio TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
follower_count INTEGER DEFAULT 0
);
-- Sample data
INSERT INTO users (username, display_name, bio, follower_count) VALUES
('john_doe_123', 'John Doe', 'Software engineer', 1250),
('jane_smith', 'Jane Smith', 'Data scientist', 3400),
('tech_guru_99', 'Tech Guru', 'Tech enthusiast', 15000);
BISCUIT Setup:
-- Index lowercase usernames for case-insensitive search
ALTER TABLE users ADD COLUMN username_lower TEXT
GENERATED ALWAYS AS (LOWER(username)) STORED;
SELECT biscuit_setup('users', 'username_lower', 'user_id');
Query Examples:
-- Search usernames
SELECT user_id, username, display_name, follower_count
FROM biscuit_match('%tech%')
ORDER BY follower_count DESC;
-- Find available similar usernames
WITH desired_username AS (
SELECT 'john_doe' AS base
),
existing_matches AS (
SELECT username
FROM biscuit_match('john_doe%')
)
SELECT base || '_' || n AS suggestion
FROM desired_username
CROSS JOIN generate_series(1, 10) AS n
WHERE base || '_' || n NOT IN (SELECT username FROM existing_matches)
LIMIT 5;
-- Find influencers matching pattern
SELECT username, display_name, follower_count
FROM biscuit_match('%guru%')
WHERE follower_count > 10000
ORDER BY follower_count DESC;
-- Username analytics
WITH username_patterns AS (
SELECT
CASE
WHEN username LIKE '%\_%' THEN 'has_underscores'
WHEN username ~ '[0-9]' THEN 'has_numbers'
ELSE 'alphanumeric'
END AS pattern_type,
follower_count
FROM biscuit_match('%')
)
SELECT
pattern_type,
COUNT(*) AS user_count,
AVG(follower_count) AS avg_followers
FROM username_patterns
GROUP BY pattern_type;
Example 6: Document Management System
Scenario: Legal document repository, need full-text-like search on titles
Schema:
CREATE TABLE documents (
doc_id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
document_type VARCHAR(50),
file_path TEXT,
uploaded_by INTEGER,
uploaded_at TIMESTAMPTZ DEFAULT NOW(),
case_number VARCHAR(50)
);
-- Sample data
INSERT INTO documents (title, document_type, case_number) VALUES
('Contract Agreement - Smith vs Johnson', 'Contract', 'CASE-2024-001'),
('Patent Application for AI Technology', 'Patent', 'PAT-2024-042'),
('Lease Agreement - Commercial Property', 'Lease', 'CASE-2024-002');
BISCUIT Setup:
-- Create searchable column with normalized text
ALTER TABLE documents ADD COLUMN title_search TEXT
GENERATED ALWAYS AS (LOWER(REGEXP_REPLACE(title, '[^a-z0-9 ]', '', 'gi'))) STORED;
SELECT biscuit_setup('documents', 'title_search', 'doc_id');
Query Examples:
-- Find contract documents
SELECT * FROM biscuit_match('%contract%');
-- Multi-term search (all terms must appear)
SELECT *
FROM biscuit_match('%lease%commercial%')
WHERE document_type = 'Lease';
-- Case document search
SELECT doc_id, title, case_number, uploaded_at
FROM biscuit_match('%smith%johnson%')
ORDER BY uploaded_at DESC;
-- Document type distribution
WITH doc_matches AS (
SELECT document_type, COUNT(*) AS cnt
FROM biscuit_match('%agreement%')
GROUP BY document_type
)
SELECT
document_type,
cnt,
ROUND(100.0 * cnt / SUM(cnt) OVER (), 2) AS percentage
FROM doc_matches
ORDER BY cnt DESC;
Example 7: Bulk Data Migration with BISCUIT
Scenario: Migrating 10M records from legacy system
Migration Script:
-- Step 1: Create target table
CREATE TABLE products_new (
id INTEGER PRIMARY KEY,
sku TEXT,
name TEXT,
description TEXT
);
-- Step 2: Prepare BISCUIT (without triggers for bulk)
SELECT biscuit_build_index('products_new', 'name', 'id');
-- Step 3: Bulk load data
\timing on
SELECT biscuit_disable_triggers();
COPY products_new FROM '/data/products.csv' CSV HEADER;
-- Alternative: INSERT SELECT from staging
Time: 45,000 ms
-- Step 4: Rebuild index
SELECT biscuit_build_index('products_new', 'name', 'id');
Time: 38,000 ms
-- Step 5: Enable triggers for future updates
SELECT biscuit_enable_triggers();
\timing off
-- Step 6: Verify
SELECT
COUNT(*) AS table_count,
biscuit_get_active_count() AS index_count,
biscuit_get_free_slots() AS free_slots
FROM products_new;
-- Step 7: Test queries
SELECT biscuit_match_count('%'); -- Should match table count
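To fail the migration loudly instead of relying on a visual check, step 6 can be wrapped in a DO block; a sketch:
-- Abort if the index and table disagree after the rebuild
DO $$
DECLARE
tbl BIGINT;
idx INTEGER;
BEGIN
SELECT COUNT(*) INTO tbl FROM products_new;
idx := biscuit_get_active_count();
IF tbl <> idx THEN
RAISE EXCEPTION 'BISCUIT out of sync after migration: table=%, index=%', tbl, idx;
END IF;
END $$;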
Example 8: Multi-Pattern Search Interface
Scenario: Build API endpoint that accepts multiple search patterns
Application Code (Python + psycopg2):
import psycopg2
import psycopg2.extras
from typing import Dict, List

def multi_pattern_search(patterns: List[str]) -> List[Dict]:
    """Search for records matching any of the provided patterns."""
    conn = psycopg2.connect("dbname=mydb user=myuser")
    # RealDictCursor returns each row as a dict rather than a tuple
    cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    # One parameterized biscuit_match() call per pattern, combined with UNION
    query = " UNION ".join("SELECT * FROM biscuit_match(%s)" for _ in patterns)
    cur.execute(query, patterns)
    results = cur.fetchall()
    cur.close()
    conn.close()
    return results

# Usage
results = multi_pattern_search(
    patterns=['%laptop%', '%macbook%', '%computer%']
)
SQL Equivalent:
-- Search multiple patterns
SELECT * FROM biscuit_match('%laptop%')
UNION
SELECT * FROM biscuit_match('%macbook%')
UNION
SELECT * FROM biscuit_match('%computer%')
ORDER BY id;
-- Or with CTE for clarity
WITH
pattern1 AS (SELECT * FROM biscuit_match('%laptop%')),
pattern2 AS (SELECT * FROM biscuit_match('%macbook%')),
pattern3 AS (SELECT * FROM biscuit_match('%computer%'))
SELECT * FROM pattern1
UNION
SELECT * FROM pattern2
UNION
SELECT * FROM pattern3;
Example 9: Performance Monitoring Dashboard
Scenario: Real-time monitoring of BISCUIT performance
Monitoring Queries:
-- Create monitoring function
CREATE OR REPLACE FUNCTION biscuit_performance_metrics()
RETURNS TABLE (
metric TEXT,
value NUMERIC,
unit TEXT,
status TEXT
) AS $$
DECLARE
active_ct INTEGER;
tombstone_ct INTEGER;
free_ct INTEGER;
memory_val NUMERIC;
query_ct BIGINT;
insert_ct BIGINT;
update_ct BIGINT;
delete_ct BIGINT;
BEGIN
active_ct := biscuit_get_active_count();
tombstone_ct := biscuit_get_tombstone_count();
free_ct := biscuit_get_free_slots();
memory_val := (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC;
query_ct := (regexp_matches(biscuit_index_status(), 'Queries: ([0-9]+)'))[1]::BIGINT;
insert_ct := (regexp_matches(biscuit_index_status(), 'Inserts: ([0-9]+)'))[1]::BIGINT;
update_ct := (regexp_matches(biscuit_index_status(), 'Updates: ([0-9]+)'))[1]::BIGINT;
delete_ct := (regexp_matches(biscuit_index_status(), 'Deletes: ([0-9]+)'))[1]::BIGINT;
RETURN QUERY SELECT 'Active Records'::TEXT, active_ct::NUMERIC, 'records'::TEXT,
CASE WHEN active_ct > 0 THEN 'OK' ELSE 'WARNING' END;
RETURN QUERY SELECT 'Tombstones', tombstone_ct::NUMERIC, 'records',
CASE WHEN tombstone_ct < 5000 THEN 'OK' WHEN tombstone_ct < 10000 THEN 'WARNING' ELSE 'CRITICAL' END;
RETURN QUERY SELECT 'Free Slots', free_ct::NUMERIC, 'slots', 'INFO';
RETURN QUERY SELECT 'Memory Usage', memory_val, 'MB',
CASE WHEN memory_val < 100 THEN 'OK' WHEN memory_val < 500 THEN 'WARNING' ELSE 'HIGH' END;
RETURN QUERY SELECT 'Total Queries', query_ct::NUMERIC, 'operations', 'INFO';
RETURN QUERY SELECT 'Total Inserts', insert_ct::NUMERIC, 'operations', 'INFO';
RETURN QUERY SELECT 'Total Updates', update_ct::NUMERIC, 'operations', 'INFO';
RETURN QUERY SELECT 'Total Deletes', delete_ct::NUMERIC, 'operations', 'INFO';
END;
$$ LANGUAGE plpgsql;
-- Use in dashboard
SELECT * FROM biscuit_performance_metrics();
Output:
metric          |   value | unit       | status
----------------+---------+------------+--------
Active Records  |  998567 | records    | OK
Tombstones      |     856 | records    | OK
Free Slots      |    1433 | slots      | INFO
Memory Usage    |    18.3 | MB         | OK
Total Queries   |   45892 | operations | INFO
Total Inserts   | 1000000 | operations | INFO
Total Updates   |  125678 | operations | INFO
Total Deletes   |    1433 | operations | INFO
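For alerting, the same function can be filtered to the rows that need attention:
-- Surface only non-OK metrics
SELECT metric, value, unit, status
FROM biscuit_performance_metrics()
WHERE status IN ('WARNING', 'CRITICAL', 'HIGH');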
Complete Production Example
Full Setup for Production Environment:
-- 1. Create schema
CREATE SCHEMA production;
SET search_path TO production;
-- 2. Create table
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku VARCHAR(100) UNIQUE NOT NULL,
name TEXT NOT NULL,
description TEXT,
category VARCHAR(100),
price NUMERIC(12, 2),
stock_level INTEGER DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- 3. Add indexes
CREATE INDEX idx_products_category ON products(category);
CREATE INDEX idx_products_price ON products(price);
-- 4. Setup BISCUIT
SELECT biscuit_setup('products', 'name', 'id');
-- 5. Verify setup
SELECT biscuit_index_status();
-- 6. Insert sample data
INSERT INTO products (sku, name, category, price, stock_level)
SELECT
'SKU-' || generate_series(1, 100000),
'Product ' || generate_series(1, 100000),
(ARRAY['Electronics', 'Furniture', 'Clothing'])[floor(random() * 3 + 1)::int],
(random() * 1000)::NUMERIC(12, 2),
floor(random() * 100)::INTEGER;
-- 7. Test queries
\timing on
SELECT * FROM biscuit_match('%Product 1234%');
SELECT biscuit_match_count('%Electronics%');
\timing off
-- 8. Setup monitoring
CREATE TABLE biscuit_metrics_log (
logged_at TIMESTAMPTZ DEFAULT NOW(),
active_count INTEGER,
tombstone_count INTEGER,
memory_mb NUMERIC,
query_count BIGINT
);
-- 9. Schedule monitoring (via pg_cron)
SELECT cron.schedule(
'biscuit-metrics',
'*/5 * * * *', -- Every 5 minutes
$$INSERT INTO production.biscuit_metrics_log
SELECT NOW(), biscuit_get_active_count(), biscuit_get_tombstone_count(),
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC,
(regexp_matches(biscuit_index_status(), 'Queries: ([0-9]+)'))[1]::BIGINT$$
);
-- 10. Create API views
CREATE VIEW product_search AS
SELECT id, sku, name, category, price
FROM products
WHERE id IN (SELECT id FROM biscuit_match('%')); -- Template view
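In practice the template view is usually replaced by a parameterized wrapper; a sketch, assuming biscuit_match is visible on the search path (the function name is illustrative):
-- Hypothetical API entry point wrapping biscuit_match
CREATE OR REPLACE FUNCTION production.search_products(p_pattern TEXT)
RETURNS SETOF production.products AS $$
SELECT p.*
FROM production.products p
WHERE p.id IN (SELECT id FROM biscuit_match(p_pattern));
$$ LANGUAGE sql STABLE;
-- Usage: SELECT * FROM production.search_products('%Product 1234%');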
License
PostgreSQL License (similar to MIT/BSD)
Copyright (c) 2024 BISCUIT Contributors
Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.
Version History
v1.0.0 (Current)
- Initial release
- Roaring Bitmap integration
- Lazy deletion with tombstones
- Incremental updates
- O(1) hash table PK lookup
- Automatic trigger-based CRUD
- Comprehensive monitoring
Contributors
BISCUIT is developed and maintained by Sivaprasad Murali.
Support and Contact
Issues: https://github.com/crystallinecore/biscuit/issues
Discussions: https://github.com/crystallinecore/biscuit/discussions
When pg_trgm feels half-baked, grab a BISCUIT 🍪