BISCUIT: Bitmap Indexed Searching with Combinatorial Union and Intersection Techniques
PostgreSQL Extension for High-Performance Wildcard Pattern Matching
Version 1.0.2
Table of Contents
- Executive Summary
- Technical Overview
- Installation and Requirements
- Configuration and Setup
- Query Interface
- Pattern Matching Specification
- CRUD Operations and Index Maintenance
- Architecture and Implementation
- Advanced Features
- API Reference
- Monitoring and Diagnostics
- Optimization Guidelines
- Limitations and Constraints
- Troubleshooting
- Examples and Use Cases
Executive Summary
BISCUIT is a PostgreSQL C extension that provides specialized bitmap indexing for wildcard pattern matching operations. The extension addresses a fundamental limitation in traditional database indexes: efficient handling of leading wildcard queries (e.g., LIKE '%pattern'), which typically require full table scans.
Key Capabilities
- High-Performance Wildcard Matching: Achieves O(1) to O(log n) query performance regardless of wildcard placement
- Automatic Index Maintenance: Transparent CRUD synchronization via PostgreSQL triggers
- Memory-Efficient Storage: Compressed bitmap structures using Roaring Bitmap algorithm
- Lazy Deletion Strategy: Deferred cleanup with intelligent batching for optimal throughput
- Incremental Update Optimization: Minimized reindexing overhead for similar string modifications
- Universal Primary Key Support: Compatible with any PostgreSQL data type
- Production-Ready Design: Comprehensive error handling, statistics tracking, and diagnostic capabilities
Performance Characteristics
| Metric | Value |
|--------|-------|
| Query Latency (Simple Pattern) | O(1) to O(log n) |
| Query Latency (Complex Pattern) | O(k × m), k=parts, m=window |
| Insert Operation | O(L), L=string length |
| Update Operation (Incremental) | O(d), d=differences |
| Delete Operation | O(1) (lazy tombstone) |
| Memory Overhead | 10-30% of indexed data size |
| Supported String Length | 256 characters (indexed portion) |
Technical Overview
Problem Statement
Traditional database indexes (B-tree, GiST, GIN) exhibit suboptimal performance for wildcard pattern matching queries, particularly when wildcards appear at the beginning of patterns. A query such as SELECT * FROM table WHERE column LIKE '%pattern%' typically degrades to a sequential scan with O(n) complexity, where n represents the total number of rows.
Solution Architecture
BISCUIT implements a multi-dimensional bitmap index structure that decomposes strings into positional character mappings. Each character at each position is tracked using compressed bitmaps (Roaring Bitmaps), enabling constant-time intersection operations for pattern matching.
Core Index Structures
- Positional Character Index: Maps each character to a sorted array of position-bitmap pairs
  - Forward positions: {0, 1, 2, ..., n} from string start
  - Reverse positions: {-1, -2, -3, ..., -n} from string end
- Character Existence Cache: Union of all positional bitmaps per character for O(1) containment queries
- Length Index: Bitmap array indexed by string length for length-constrained queries
- Primary Key Hash Table: O(1) lookup structure mapping primary keys to internal indices
- Lazy Deletion Layer: Tombstone bitmap tracking deleted records with deferred cleanup
Algorithmic Approach
Pattern Parsing: Input patterns are decomposed into literal segments separated by wildcards (%). Each segment’s position constraints (prefix, suffix, or floating) are analyzed.
Query Optimization: Pattern structure determines execution strategy:
- Single-character queries: Direct bitmap lookup
- Prefix patterns (abc%): Position-0 intersection with length filter
- Suffix patterns (%xyz): Negative-position intersection with length filter
- Containment patterns (%abc%): Character cache intersection
- Multi-part patterns: Windowed position matching with recursive refinement
Bitmap Operations: Compressed bitmaps enable efficient set operations:
- Intersection: AND operation for constraint combination
- Union: OR operation for alternative matches
- Difference: ANDNOT operation for tombstone filtering
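These three set operations can be illustrated with a plain fixed-width bitmap. This is a toy stand-in for the compressed Roaring containers the extension actually uses; the type names and capacity below are illustrative, not BISCUIT internals:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy uncompressed bitmap covering up to 64*WORDS record IDs. */
#define WORDS 16

typedef struct { uint64_t w[WORDS]; } Bitmap;

static void bm_set(Bitmap *b, uint32_t id)        { b->w[id >> 6] |= 1ULL << (id & 63); }
static int  bm_test(const Bitmap *b, uint32_t id) { return (int)((b->w[id >> 6] >> (id & 63)) & 1); }

/* Intersection (AND): records satisfying both constraints */
static void bm_and(Bitmap *r, const Bitmap *a, const Bitmap *b) {
    for (size_t i = 0; i < WORDS; i++) r->w[i] = a->w[i] & b->w[i];
}
/* Union (OR): records satisfying either constraint */
static void bm_or(Bitmap *r, const Bitmap *a, const Bitmap *b) {
    for (size_t i = 0; i < WORDS; i++) r->w[i] = a->w[i] | b->w[i];
}
/* Difference (ANDNOT): e.g. filtering tombstoned records out of matches */
static void bm_andnot(Bitmap *r, const Bitmap *a, const Bitmap *b) {
    for (size_t i = 0; i < WORDS; i++) r->w[i] = a->w[i] & ~b->w[i];
}
```

Roaring bitmaps perform the same operations container-by-container on a compressed representation, which is why they stay fast even for millions of record IDs.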
Data Flow
Query Pattern → Parse → Optimize → Bitmap Ops → Filter Tombstones → Result Set

Insert/Update/Delete → Hash Lookup → Bitmap Update → Tombstone Management
Installation and Requirements
System Requirements
- PostgreSQL Version: 11.0 or higher
- Operating System: Linux, macOS, Windows (with MinGW)
- Architecture: x86_64, ARM64
- Memory: Minimum 512MB available RAM per index
- Compiler: GCC 4.8+, Clang 3.9+, or MSVC 2017+
Optional Dependencies
- CRoaring Library: For optimized Roaring Bitmap operations (recommended)
- If unavailable, extension falls back to built-in bitmap implementation
- Installation: apt-get install libroaring-dev (Debian/Ubuntu)
Installation Procedure
Method 1: PostgreSQL Extension System
CREATE EXTENSION biscuit;
This command must be executed by a superuser or a user with CREATE EXTENSION privileges.
Method 2: Manual Compilation
# Clone repository
git clone https://github.com/crystallinecore/biscuit.git
cd biscuit
# Compile
make
# Install (requires superuser privileges)
sudo make install
# Load extension
psql -d your_database -c "CREATE EXTENSION biscuit;"
Verification
SELECT biscuit_version();
-- Expected output: 1.0.2-Biscuit
Configuration and Setup
Initialization Options
Option 1: Automated Setup (Recommended)
The biscuit_setup() function provides a complete initialization workflow:
SELECT biscuit_setup(
p_table_name TEXT,
p_column_name TEXT,
p_pk_column_name TEXT DEFAULT 'id'
);
Parameters:
- p_table_name: Target table name (required)
- p_column_name: Column to index (required, must be text-compatible)
- p_pk_column_name: Primary key column name (default: 'id')
Operations Performed:
- Validates table and column existence
- Constructs bitmap index structure
- Generates strongly-typed wrapper functions
- Installs AFTER triggers for INSERT, UPDATE, DELETE operations
- Returns detailed confirmation message
Example:
SELECT biscuit_setup('customer_records', 'email_address', 'customer_id');
Output:
Biscuit index built successfully.
Created biscuit_match() and biscuit_match_rows() functions for table: customer_records
Columns: customer_id integer, email_address text
Successfully created trigger on table: customer_records
The index will now automatically update on INSERT, UPDATE, and DELETE operations.
Option 2: Manual Configuration (Advanced)
For fine-grained control, configure components individually:
Step 1: Index Construction
SELECT biscuit_build_index(
table_name TEXT,
column_name TEXT,
pk_column_name TEXT DEFAULT 'id'
);
Step 2: Function Generation
SELECT biscuit_create_match_function();
This creates two query functions:
- biscuit_match(pattern TEXT): Returns SETOF table_type
- biscuit_match_rows(pattern TEXT): Returns TABLE(pk type, value TEXT)
Step 3: Trigger Activation
SELECT biscuit_enable_triggers();
Installs trigger named biscuit_auto_update on the indexed table.
Primary Key Considerations
BISCUIT supports heterogeneous primary key types through automatic type conversion:
| PostgreSQL Type | Internal Representation | Notes |
|-----------------|-------------------------|-------|
| INT2, INT4, INT8 | String (optimized) | Direct conversion |
| TEXT, VARCHAR, CHAR | String (native) | No conversion overhead |
| UUID | Canonical string format | Standardized representation |
| Other types | Type output function | Uses PostgreSQL type system |
Performance Note: Integer primary keys receive optimized conversion paths. UUID and text types have minimal overhead. Custom types utilize PostgreSQL’s type output functions.
Index Scope and Limitations
- Single Column Per Index: Each index covers one text column
- Multiple Indexes Supported: Create separate indexes for different columns
- Table Scope: One active BISCUIT index per table
- Character Limit: Positions beyond 256 characters are not indexed (but stored)
Example: Multiple Column Indexing
-- Index email column
SELECT biscuit_setup('users', 'email', 'id');
-- To index another column, use a different approach
-- (current implementation: one index per table)
Query Interface
Function Signatures and Return Types
Primary Query Functions
1. Full Tuple Retrieval (Strongly-Typed)
biscuit_match(pattern TEXT) RETURNS SETOF table_type
Returns complete rows from the indexed table matching the specified pattern. Return type matches the exact schema of the indexed table.
Characteristics:
- Type-safe: Returns actual table type
- No column specification required
- Supports standard SQL operations (WHERE, ORDER BY, JOIN)
- Optimal for production queries
Example:
SELECT * FROM biscuit_match('%@example.com%');
SELECT id, email FROM biscuit_match('user%') WHERE created_at > '2024-01-01';
SELECT COUNT(*) FROM biscuit_match('%gmail%');
2. Generic Tuple Retrieval
biscuit_match_rows(pattern TEXT) RETURNS TABLE(pk primary_key_type, value TEXT)
Returns primary key and indexed column value pairs.
Use Cases:
- When only key-value pairs are needed
- Intermediate processing before joining
- Reduced data transfer overhead
Example:
SELECT pk, value FROM biscuit_match_rows('%pattern%');
-- Join back to original table if needed
SELECT t.*
FROM original_table t
JOIN biscuit_match_rows('%pattern%') b ON t.id = b.pk;
3. Count Aggregation
biscuit_match_count(pattern TEXT) RETURNS INTEGER
Returns the count of matching records without materializing result set.
Performance: This is the fastest query method, performing only bitmap operations without tuple retrieval.
Example:
SELECT biscuit_match_count('%search_term%');
4. Key-Value Extraction
biscuit_match_keys(pattern TEXT) RETURNS TABLE(pk TEXT, value TEXT)
Low-level function returning primary keys and indexed values as text. Used internally by higher-level functions.
Note: Primary keys are returned as text regardless of original type.
Query Patterns and SQL Integration
Standard SQL Operations
All biscuit_match() results support complete SQL operations:
-- Filtering
SELECT * FROM biscuit_match('%pattern%')
WHERE status = 'active' AND created_at > NOW() - INTERVAL '30 days';
-- Ordering
SELECT * FROM biscuit_match('A%')
ORDER BY name DESC, created_at ASC
LIMIT 100;
-- Aggregation
SELECT category, COUNT(*), AVG(price)
FROM biscuit_match('%widget%')
GROUP BY category
HAVING COUNT(*) > 10;
-- Joins
SELECT p.*, c.category_name
FROM biscuit_match('%special%') p
JOIN categories c ON p.category_id = c.id;
-- Subqueries
SELECT *
FROM products
WHERE id IN (
SELECT id FROM biscuit_match('%premium%')
);
-- Common Table Expressions
WITH matched_products AS (
SELECT * FROM biscuit_match('%electronics%')
)
SELECT m.*, s.stock_level
FROM matched_products m
JOIN stock s ON m.id = s.product_id;
Performance Considerations
Optimal Patterns:
- Use bitmap index for pattern matching
- Apply additional filters after pattern match
- Leverage PostgreSQL’s query planner for joins
Query Plan Example:
EXPLAIN ANALYZE
SELECT * FROM biscuit_match('%search%')
WHERE status = 'active';
-- Result: Bitmap operations complete in milliseconds
-- Additional filter applied to result set
Pattern Matching Specification
Wildcard Operators
BISCUIT implements SQL LIKE-style pattern matching with two wildcard operators:
| Operator | Semantics | Character Match | Examples |
|----------|-----------|-----------------|----------|
| % | Zero or more characters | [0, ∞) | 'a%' → “a”, “ab”, “abcdef” |
| _ | Exactly one character | 1 | 'a_c' → “abc”, “axc” |
Pattern Categories and Optimization Paths
1. Exact Match Patterns
Pattern Structure: No wildcards
Example: 'exact_string'
Optimization: Length bitmap intersection with position-0 character match
Complexity: O(1)
SELECT biscuit_match_count('hello');
-- Matches only "hello" exactly
2. Prefix Patterns
Pattern Structure: Literal prefix followed by %
Example: 'prefix%'
Optimization: Position-0 character sequence with length≥prefix_length filter
Complexity: O(log n)
SELECT * FROM biscuit_match('user_');
-- Matches: "user_admin", "user_john", etc.
3. Suffix Patterns
Pattern Structure: % followed by literal suffix
Example: '%suffix'
Optimization: Negative-position character sequence with length≥suffix_length filter
Complexity: O(log n)
SELECT * FROM biscuit_match('%@gmail.com');
-- Matches: "john@gmail.com", "admin@gmail.com"
4. Containment Patterns
Pattern Structure: %literal%
Example: '%search%'
Optimization: Character cache intersection
Complexity: O(1) to O(m), m=pattern_length
SELECT * FROM biscuit_match('%error%');
-- Matches any string containing "error"
5. Complex Multi-Part Patterns
Pattern Structure: Multiple literal segments with wildcards
Examples:
- 'start%middle%end'
- '%part1%part2%'
- 'a%b%c%d'
Optimization: Windowed position matching with recursive refinement Complexity: O(k × m), k=parts, m=search_window
SELECT * FROM biscuit_match('user_%@%.com');
-- Matches: "user_john@example.com", "user_admin@test.com"
6. Single-Character Wildcard Patterns
Pattern Structure: Underscores with optional literals
Examples:
- '___' (exactly 3 characters)
- 'a_c' (a, any char, c)
- '___%' (at least 3 characters)
Optimization: Length bitmap or position bitmap intersection Complexity: O(1) to O(log n)
SELECT * FROM biscuit_match('___');
-- Matches: "abc", "xyz", "123" (any 3-character string)
SELECT * FROM biscuit_match('test_');
-- Matches: "test1", "testA", "test_" (test + any char)
Pattern Syntax Reference
| Pattern | Description | Matches | Does Not Match |
|---------|-------------|---------|----------------|
| 'abc' | Exact string | "abc" | "abcd", "xabc", "ABC" |
| 'a%' | Starts with 'a' | "a", "abc", "a123" | "ba", "ABC" |
| '%z' | Ends with 'z' | "z", "xyz", "123z" | "za", "Z" |
| '%abc%' | Contains 'abc' | "abc", "xabcx", "123abc" | "ab", "ABC" |
| 'a%z' | Starts 'a', ends 'z' | "az", "abcz", "a123xyz" | "za", "abc" |
| '_' | Single character | "a", "1", "x" | "", "ab" |
| '___' | Exactly 3 chars | "abc", "123", "a1c" | "ab", "abcd" |
| 'a_c' | 'a' + any + 'c' | "abc", "axc", "a1c" | "ac", "abbc" |
| '%a%b%' | 'a' before 'b' | "ab", "axb", "xaby" | "ba", "bxa" |
| '%' | Any string | All non-NULL | NULL |
| '' | Empty string | "" | Any non-empty |
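The semantics in this table can be cross-checked against a tiny recursive reference matcher for the two wildcard operators. This is plain per-string matching for illustration only; BISCUIT answers the same question with bitmap intersections rather than scanning each string:

```c
#include <stdbool.h>

/* Reference LIKE semantics: '%' matches zero or more characters,
   '_' matches exactly one character. */
static bool like_match(const char *s, const char *p) {
    if (*p == '\0') return *s == '\0';
    if (*p == '%') {
        /* '%' absorbs zero characters... */
        if (like_match(s, p + 1)) return true;
        /* ...or one more character, then try again */
        return *s != '\0' && like_match(s + 1, p);
    }
    if (*s == '\0') return false;                       /* literal or '_' needs a char */
    if (*p == '_' || *p == *s) return like_match(s + 1, p + 1);
    return false;
}
```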
Special Cases
Empty Pattern
SELECT biscuit_match_count('');
-- Returns count of empty strings in indexed column
Match All
SELECT biscuit_match_count('%');
-- Returns count of all non-deleted records
Case Sensitivity
BISCUIT patterns are case-sensitive by default. For case-insensitive matching:
-- Option 1: Index lowercase values
CREATE TABLE users_normalized (
id SERIAL PRIMARY KEY,
email TEXT,
email_lower TEXT GENERATED ALWAYS AS (LOWER(email)) STORED
);
SELECT biscuit_setup('users_normalized', 'email_lower', 'id');
SELECT * FROM biscuit_match(LOWER('%SEARCH%'));
-- Option 2: Use expression index (future enhancement)
NULL Handling
- NULL values are never indexed; rows with a NULL indexed column are skipped during index construction and by the triggers
- Pattern matching never returns NULL values
-- NULL values are skipped
INSERT INTO table (id, text_col) VALUES (1, NULL);
SELECT biscuit_match_count('%');
-- Does not include row 1
CRUD Operations and Index Maintenance
Automatic Synchronization
When triggers are enabled via biscuit_setup() or biscuit_enable_triggers(), all DML operations automatically maintain index consistency with zero application code changes.
INSERT Operations
Behavior: New records are immediately indexed and available for queries.
Algorithm:
- Extract primary key and indexed column value from new tuple
- Convert primary key to canonical string representation
- Allocate index slot (reuse free slot if available, otherwise expand)
- Generate positional bitmaps for each character
- Update length index and character cache
- Register primary key in hash table
Complexity: O(L), where L = string length (up to 256 characters)
Example:
INSERT INTO products (id, name, description)
VALUES (10001, 'Premium Widget', 'High-quality widget');
-- Index automatically updated
SELECT * FROM biscuit_match('%Widget%');
-- Returns the newly inserted row immediately
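The insert algorithm above can be sketched with dense uint64_t bitmaps standing in for Roaring bitmaps. The array layout and capacities here are simplifying assumptions, not the extension's actual structs:

```c
#include <stdint.h>
#include <string.h>

#define MAX_POS  256  /* positions beyond 256 are not indexed */
#define MAX_RECS 64   /* toy capacity: one uint64_t bitmap per (char, pos) */

/* pos_idx[c][p]: records with character c at forward position p.
   neg_idx[c][p]: records with character c at position p from the end.
   len_idx[n]:    records whose string has length n. */
static uint64_t pos_idx[256][MAX_POS];
static uint64_t neg_idx[256][MAX_POS];
static uint64_t len_idx[MAX_POS + 1];

static void index_string(uint32_t rec, const char *s) {
    size_t n = strlen(s);
    if (n > MAX_POS) n = MAX_POS;              /* truncate indexed portion */
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        pos_idx[c][i]         |= 1ULL << rec;  /* forward position i */
        neg_idx[c][n - 1 - i] |= 1ULL << rec;  /* reverse position -(n-i) */
    }
    len_idx[n] |= 1ULL << rec;                 /* length index entry */
}
```

With this layout, a prefix query like 'ca%' reduces to intersecting pos_idx['c'][0] and pos_idx['a'][1], and a suffix query '%t' reads neg_idx['t'][0] directly.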
Bulk Insert Performance:
-- For large bulk inserts, consider disabling triggers
SELECT biscuit_disable_triggers();
INSERT INTO products (id, name)
SELECT generate_series(1, 1000000), 'Product ' || generate_series(1, 1000000);
-- Rebuild index after bulk operation
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
UPDATE Operations
Behavior: Updates are intelligently optimized based on string similarity.
Incremental Update Path
Criteria for Incremental Update:
- Same string length (before and after)
- Minimum length ≥ 3 characters
- Character differences < 20% of string length
- Maximum 3 changed characters
Algorithm:
- Compute character-level diff between old and new values
- Remove old characters from affected position bitmaps
- Add new characters to affected position bitmaps
- Update character cache for changed characters only
Complexity: O(d), where d = number of different characters
Example (Incremental):
-- Original: "john.doe@example.com"
UPDATE users SET email = 'john.doe@example.org' WHERE id = 123;
-- Only 3 characters changed (com→org), uses incremental update
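The criteria above collapse into a single predicate. A minimal sketch under the stated thresholds (the function name is hypothetical, not part of the extension's API):

```c
#include <stdbool.h>
#include <string.h>

/* Decide the update path: incremental only when lengths match, the string
   is at least 3 characters, at most 3 characters changed, and the changes
   are under 20% of the string length. */
static bool use_incremental_update(const char *old_val, const char *new_val) {
    size_t n = strlen(old_val);
    if (n != strlen(new_val) || n < 3)
        return false;                     /* full reindex path */
    size_t diffs = 0;
    for (size_t i = 0; i < n; i++)
        if (old_val[i] != new_val[i]) diffs++;
    return diffs <= 3 && diffs * 5 < n;   /* diffs < 20% of length */
}
```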
Full Reindex Path
Criteria:
- Different string lengths
- Changes exceed incremental update threshold
- String < 3 characters
Algorithm:
- Remove all position bitmaps for old value
- Remove from length index and character cache
- Re-index completely with new value
- Update length index and character cache
Complexity: O(L), where L = new string length
Example (Full Reindex):
-- Original: "short"
UPDATE users SET email = 'much.longer.email@example.com' WHERE id = 123;
-- Different lengths, requires full reindex
Statistics Tracking:
SELECT biscuit_index_status();
-- Shows ratio of incremental vs full updates
-- Example: "Updates: 1000 (incr: 750, 75.0%)"
DELETE Operations
Behavior: Implements lazy deletion with tombstone bitmaps for optimal performance.
Algorithm (O(1) Lazy Deletion):
- Hash table lookup to find internal index (O(1))
- Add index to tombstone bitmap (O(1))
- Store index in deletion queue
- Remove primary key from hash table
- Add slot to free list for future reuse
- Defer removal from position/length bitmaps
Complexity: O(1) - actual cleanup deferred
Example:
DELETE FROM products WHERE id = 123;
-- Returns immediately, cleanup happens later
Tombstone Cleanup Process
Automatic Cleanup Triggers:
- Tombstone count ≥ threshold (default: 1000)
- Manual invocation:
SELECT biscuit_cleanup();
Cleanup Algorithm:
- Batch remove tombstoned indices from all bitmaps
- Free associated string memory
- Clear tombstone bitmap
- Reset deletion queue
Cleanup Statistics:
SELECT biscuit_index_status();
-- Pending tombstones: 856 (85.6% of deletes)
-- Total cleanups: 12
-- Items cleaned: 11442
Performance Characteristics:
| Scenario | Delete Latency | Query Overhead | Memory Overhead |
|----------|----------------|----------------|-----------------|
| No tombstones | O(1) | None | Minimal |
| < 1000 tombstones | O(1) | O(t) filter | ~t × 32 bytes |
| ≥ 1000 tombstones | O(1) + cleanup | Minimal | Minimal |
Note: Query operations automatically filter tombstones, so deleted records never appear in results despite deferred cleanup.
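A minimal sketch of this tombstone lifecycle, using a single uint64_t in place of the Roaring tombstone bitmap (the struct and threshold handling here are illustrative, not the extension's internals):

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy lazy-deletion layer over a 64-record universe. */
typedef struct {
    uint64_t tombstones;   /* bitmap of deleted internal indices */
    int count;             /* pending tombstone count */
    int threshold;         /* cleanup trigger (default 1000 in BISCUIT) */
} LazyDel;

/* O(1) delete: mark the bit; return true when cleanup should run. */
static bool lazy_delete(LazyDel *ld, uint32_t idx) {
    ld->tombstones |= 1ULL << idx;
    return ++ld->count >= ld->threshold;
}

/* Queries filter matches with ANDNOT, so deleted rows never surface. */
static uint64_t filter_results(const LazyDel *ld, uint64_t matches) {
    return matches & ~ld->tombstones;
}
```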
Upsert Operations (INSERT … ON CONFLICT)
Behavior: Automatically handled through INSERT and UPDATE trigger paths.
Example:
INSERT INTO products (id, name)
VALUES (123, 'New Product')
ON CONFLICT (id)
DO UPDATE SET name = EXCLUDED.name;
-- If row 123 exists: triggers UPDATE path
-- If row 123 doesn't exist: triggers INSERT path
Transaction Semantics
ACID Compliance:
- Atomicity: Index updates commit/rollback with transaction
- Consistency: Index always reflects committed table state
- Isolation: Follows PostgreSQL’s MVCC isolation levels
- Durability: Table data remains durable; the in-memory index does not survive a server restart and must be rebuilt
Example:
BEGIN;
INSERT INTO products (id, name) VALUES (1, 'Product A');
SELECT * FROM biscuit_match('%Product A%'); -- Visible in transaction
ROLLBACK;
SELECT * FROM biscuit_match('%Product A%'); -- Not visible, index reverted
Current Limitation: Index resides in shared memory and must be rebuilt after server restart.
Trigger Management
Enable Triggers
SELECT biscuit_enable_triggers();
Creates trigger: biscuit_auto_update as AFTER INSERT OR UPDATE OR DELETE FOR EACH ROW.
Disable Triggers
SELECT biscuit_disable_triggers();
Removes triggers but preserves index structure. Use for:
- Bulk data operations
- Table truncation
- Data migration
- Testing scenarios
Check Trigger Status
SELECT biscuit_index_status();
-- Shows whether triggers are active
Architecture and Implementation
System Components
┌─────────────────────────────────────────────────────────────┐
│ PostgreSQL Backend │
├─────────────────────────────────────────────────────────────┤
│ Query Interface │ Trigger System │ Memory Management │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ BISCUIT Extension Layer │
├──────────────────────┬──────────────────┬───────────────────┤
│ Pattern Parser │ Query Executor │ CRUD Handler │
│ - Wildcard analysis │ - Bitmap ops │ - Insert/Update │
│ - Optimization │ - Result filter │ - Delete/Cleanup │
└──────────────────────┴──────────────────┴───────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Index Data Structures │
├──────────────┬──────────────┬──────────────┬────────────────┤
│ Positional │ Character │ Length │ PK Hash │
│ Bitmaps │ Cache │ Index │ Table │
│ (256 pos) │ (256 chars) │ (L bitmaps) │ (O(1)) │
└──────────────┴──────────────┴──────────────┴────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Roaring Bitmap Implementation │
├──────────────────────────────────────────────────────────────┤
│ - Compressed bitmap storage │
│ - Fast set operations (AND, OR, ANDNOT) │
│ - Memory-efficient sparse representation │
└──────────────────────────────────────────────────────────────┘
Data Structure Details
1. Positional Character Index
Purpose: Track character occurrences at specific positions
Structure:
typedef struct {
int pos; // Position in string
RoaringBitmap *bitmap; // Record IDs containing char at pos
} PosEntry;
typedef struct {
PosEntry *entries; // Sorted array of position entries
int count; // Number of positions tracked
int capacity; // Allocated capacity
} CharIndex;
CharIndex pos_idx[256]; // Forward positions (0, 1, 2, ...)
CharIndex neg_idx[256]; // Reverse positions (-1, -2, -3, ...)
Example: For strings [“cat”, “car”, “dog”]:
pos_idx['c'][0] = {0, 1} // 'c' at position 0 in records 0, 1
pos_idx['a'][1] = {0, 1} // 'a' at position 1 in records 0, 1
pos_idx['t'][2] = {0} // 't' at position 2 in record 0
neg_idx['t'][-1] = {0} // 't' at last position in record 0
Access Complexity: Binary search over positions: O(log P), P ≤ 256
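That lookup is a standard binary search over the sorted position array; a sketch with the bitmap payload elided:

```c
#include <stddef.h>

/* Mirrors the PosEntry/CharIndex layout above; bitmap pointer omitted. */
typedef struct { int pos; } PosEntry;

typedef struct {
    PosEntry *entries;   /* sorted ascending by pos */
    int count;
} CharIndex;

/* Binary search for a position entry: O(log P), P <= 256.
   Returns the entry index, or -1 if the character never occurs there. */
static int find_pos(const CharIndex *ci, int pos) {
    int lo = 0, hi = ci->count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (ci->entries[mid].pos == pos) return mid;
        if (ci->entries[mid].pos < pos)  lo = mid + 1;
        else                             hi = mid - 1;
    }
    return -1;
}
```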
2. Character Existence Cache
Purpose: O(1) lookup for “contains character” queries
Structure:
RoaringBitmap *char_cache[256]; // Union of all position bitmaps per char
Construction:
char_cache['a'] = pos_idx['a'][0] ∪ pos_idx['a'][1] ∪ ... ∪ pos_idx['a'][n]
Use Case: Pattern %a% directly queries char_cache['a']
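Construction is a straight OR-reduction over the character's positional bitmaps; a sketch with dense uint64_t stand-ins for the Roaring bitmaps:

```c
#include <stdint.h>

#define MAX_POS 256

/* Build the existence cache for one character by unioning its
   per-position bitmaps. A '%a%' query then reads the cache directly
   instead of touching any positional structure. */
static uint64_t build_char_cache(const uint64_t pos_bitmaps[MAX_POS]) {
    uint64_t cache = 0;
    for (int p = 0; p < MAX_POS; p++)
        cache |= pos_bitmaps[p];
    return cache;
}
```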
3. Length Index
Purpose: Filter records by string length
Structure:
typedef struct {
RoaringBitmap **length_bitmaps; // Array indexed by length
int max_length; // Maximum observed length
} LengthIndex;
Example:
length_bitmaps[3] = {0, 1, 2} // Records 0, 1, 2 have length 3
length_bitmaps[5] = {3, 4} // Records 3, 4 have length 5
Use Case: Pattern ___ queries length_bitmaps[3] directly
4. Primary Key Hash Table
Purpose: O(1) mapping from primary key to internal index
Implementation:
typedef struct {
char pk_str[NAMEDATALEN]; // Fixed-size key (max 64 bytes)
uint32_t idx; // Internal record index
} PKMapEntry;
HTAB *pk_to_index; // PostgreSQL hash table
Operations:
- Insert: hash_search(HASH_ENTER), O(1)
- Lookup: hash_search(HASH_FIND), O(1)
- Delete: hash_search(HASH_REMOVE), O(1)
Memory: ~96 bytes per entry (key + value + overhead)
5. Lazy Deletion System
Purpose: Defer expensive cleanup operations
Structure:
typedef struct {
RoaringBitmap *tombstones; // Bitmap of deleted indices
uint32_t *deleted_indices; // Queue for cleanup
int tombstone_count; // Current tombstone count
int tombstone_threshold; // Cleanup trigger threshold
int64 total_cleanups; // Lifetime cleanup count
int64 items_cleaned; // Total items cleaned
} LazyDeletion;
Workflow:
- Delete: Mark bit in tombstone bitmap (O(1))
- Query: Filter results using ANDNOT tombstones (O(log n))
- Cleanup: Batch remove when threshold reached (O(n × 256))
Adaptive Threshold: Adjusts based on query/delete ratio
Memory Management
Allocation Strategy
Memory Context: Dedicated AllocSetContext for index structures
- Lifetime: Until explicit index rebuild or server restart
- Isolation: Independent of query contexts
- Cleanup: Automatic on context deletion
Size Estimation:
Total Memory = Base + Positional + Cache + Length + Hash + Strings
Base = ~1 MB (structs and metadata)
Positional = ~256 chars × P positions × 64 bytes/bitmap ≈ 16P KB
Cache = ~256 chars × 64 bytes/bitmap ≈ 16 KB
Length = L × 64 bytes/bitmap (L = max length)
Hash = N × 96 bytes (N = record count)
Strings = N × avg_length × 2 (data + PK)
Example (1M records, avg 50 chars, max 100 positions):
= 1 MB + 1.6 MB + 16 KB + 6.4 KB + 96 MB + 100 MB ≈ 200 MB
Compression Benefits: Roaring Bitmaps achieve 10-50× compression for sparse data
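The example figure can be reproduced by plugging the rough per-structure constants into the formula; these are the ballpark values from the text, not measurements:

```c
/* Back-of-envelope index size estimate, in bytes. */
static double estimate_index_bytes(double n_records, double avg_len,
                                   double max_positions) {
    double base       = 1.0 * 1024 * 1024;         /* structs and metadata */
    double positional = 256 * max_positions * 64;  /* per-char, per-pos bitmaps */
    double cache      = 256 * 64;                  /* one bitmap per character */
    double length     = max_positions * 64;        /* one bitmap per length */
    double hash       = n_records * 96;            /* PK hash table entries */
    double strings    = n_records * avg_len * 2;   /* value + PK string copies */
    return base + positional + cache + length + hash + strings;
}
```

For 1M records with an average length of 50 and 100 indexed positions, the hash table and string storage dominate, and the total lands near the 200 MB cited above.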
Garbage Collection
Automatic:
- Tombstone cleanup when threshold reached
- Free list management for slot reuse
Manual:
SELECT biscuit_cleanup(); -- Force tombstone cleanup
Memory Leak Prevention:
- All allocations tracked in index context
- Single deallocation point on rebuild
- No external references retained
Concurrency Model
Index Access: Read-optimized, single-writer model
- Queries: Concurrent read access (no locking)
- Modifications: Serialized through PostgreSQL trigger system
- Consistency: PostgreSQL’s MVCC handles transaction isolation
Limitation: High-concurrency write workloads may experience contention at trigger level (PostgreSQL limitation, not BISCUIT-specific)
Scalability: Read queries scale linearly across cores
Advanced Features
Incremental Updates
For same-length strings with <20% character changes, BISCUIT uses incremental updates:
-- Original: "hello world"
-- Updated: "hello wurld"
-- Only updates changed positions (saves ~80% of work)
Lazy Deletion with Smart Cleanup
Deletes are instant (O(1)) using tombstone bitmaps. Cleanup triggers automatically when:
- Tombstone count reaches threshold (default: 1000)
- Query/Delete ratio suggests overhead
- Manual trigger:
SELECT biscuit_cleanup();
Pattern Optimization
BISCUIT automatically detects and optimizes:
- Wildcard-only patterns: %%%, ___
- Single-character positioned: %a%, a%, %a
- Prefix patterns: abc%
- Suffix patterns: %xyz
- Exact length: ___ (3 chars)
API Reference
Index Management Functions
biscuit_setup()
Complete initialization function combining index build, function creation, and trigger setup.
biscuit_setup(
p_table_name TEXT,
p_column_name TEXT,
p_pk_column_name TEXT DEFAULT 'id'
) RETURNS TEXT
Parameters:
- p_table_name: Name of table to index (required)
- p_column_name: Column containing text data to index (required)
- p_pk_column_name: Primary key column name (default: 'id')
Returns: Status message with setup details
Example:
SELECT biscuit_setup('customers', 'email', 'customer_id');
Output:
Biscuit index built successfully.
Created biscuit_match() and biscuit_match_rows() functions for table: customers
Columns: customer_id integer, email text
Successfully created trigger on table: customers
The index will now automatically update on INSERT, UPDATE, and DELETE operations.
Errors:
- Primary key column not found
- Indexed column not found
- Insufficient permissions
- Table does not exist
biscuit_build_index()
Constructs bitmap index structure from table data.
biscuit_build_index(
table_name TEXT,
column_name TEXT,
pk_column_name TEXT DEFAULT 'id'
) RETURNS BOOLEAN
Parameters:
- table_name: Target table name
- column_name: Column to index
- pk_column_name: Primary key column (default: 'id')
Returns: TRUE on success
Behavior:
- Drops existing index if present
- Scans entire table sequentially
- Builds all bitmap structures
- Allocates shared memory context
Performance: O(n × L), n=rows, L=avg string length
Example:
SELECT biscuit_build_index('products', 'name', 'id');
biscuit_create_match_function()
Generates strongly-typed wrapper functions for queries.
biscuit_create_match_function() RETURNS TEXT
Returns: Confirmation message with function signatures
Generated Functions:
- biscuit_match(pattern TEXT) RETURNS SETOF table_type
- biscuit_match_rows(pattern TEXT) RETURNS TABLE(pk type, value TEXT)
Prerequisite: Index must be built first
Example:
SELECT biscuit_create_match_function();
Output:
Created biscuit_match() and biscuit_match_rows() functions for table: products
Columns: id integer, name text, category text, price numeric
biscuit_enable_triggers()
Activates automatic index maintenance triggers.
biscuit_enable_triggers() RETURNS TEXT
Returns: Status message
Behavior:
- Creates AFTER INSERT OR UPDATE OR DELETE trigger
- Trigger name: biscuit_auto_update
- Fires FOR EACH ROW
- Calls biscuit_trigger() function
Example:
SELECT biscuit_enable_triggers();
biscuit_disable_triggers()
Deactivates automatic index maintenance.
biscuit_disable_triggers() RETURNS TEXT
Returns: Status message
Use Cases:
- Bulk data operations
- Table maintenance
- Performance testing
- Data migration
Example:
SELECT biscuit_disable_triggers();
-- Perform bulk operations
SELECT biscuit_enable_triggers();
Query Functions
biscuit_match()
Primary query interface returning complete table rows.
biscuit_match(pattern TEXT) RETURNS SETOF table_type
Parameters:
- pattern: Wildcard pattern (SQL LIKE syntax)
Returns: Complete rows matching pattern
Characteristics:
- Strongly-typed return
- Full SQL compatibility
- Automatic tombstone filtering
- Optimal for production use
Example:
SELECT * FROM biscuit_match('%@example.com');
SELECT id, name FROM biscuit_match('Product%') WHERE price > 100;
biscuit_match_rows()
Alternative query interface returning key-value pairs.
biscuit_match_rows(pattern TEXT) RETURNS TABLE(pk primary_key_type, value TEXT)
Parameters:
- pattern: Wildcard pattern
Returns: Table of (primary_key, indexed_value) pairs
Example:
SELECT * FROM biscuit_match_rows('%error%');
biscuit_match_count()
Efficient count-only query.
biscuit_match_count(pattern TEXT) RETURNS INTEGER
Parameters:
- pattern: Wildcard pattern
Returns: Count of matching records
Performance: Fastest query method (no tuple materialization)
Example:
SELECT biscuit_match_count('%gmail.com%');
biscuit_match_keys()
Low-level interface returning primary keys and values as text.
biscuit_match_keys(pattern TEXT) RETURNS TABLE(pk TEXT, value TEXT)
Parameters:
pattern: Wildcard pattern
Returns: Table of (pk_text, value_text) pairs
Note: Used internally by higher-level functions
Example:
SELECT pk::INTEGER, value FROM biscuit_match_keys('%pattern%');
Status and Diagnostic Functions
biscuit_index_status()
Comprehensive index status report.
biscuit_index_status() RETURNS TEXT
Returns: Formatted status report including:
- Table and column information
- Record counts (active, total, free, tombstoned)
- Memory usage
- CRUD statistics
- Lazy deletion metrics
- Optimization indicators
Example:
SELECT biscuit_index_status();
Sample Output:
========================================
Biscuit Index v1.0.2
========================================
Table: products
Column: name
Primary Key: id
Active Records: 998567
Total Slots: 1000000
Free Slots: 1433
Tombstoned Slots: 856
Max length: 128
Memory: 18.3 MB
----------------------------------------
CRUD Statistics:
Inserts: 1000000
Deletes: 1433
Updates: 125678 (incr: 89234, 71.0%)
Queries: 45892
----------------------------------------
Lazy Deletion Status:
Pending tombstones: 856 (59.7% of deletes)
Cleanup threshold: 1000
Total cleanups: 1
Items cleaned: 577
Query/Delete ratio: 32.02
----------------------------------------
biscuit_get_active_count()
Returns active record count.
biscuit_get_active_count() RETURNS INTEGER
Returns: Count of non-deleted records
Example:
SELECT biscuit_get_active_count();
-- Output: 998567
biscuit_get_free_slots()
Returns available slot count for reuse.
biscuit_get_free_slots() RETURNS INTEGER
Returns: Number of slots available for new inserts
Example:
SELECT biscuit_get_free_slots();
-- Output: 1433
biscuit_get_tombstone_count()
Returns pending tombstone count.
biscuit_get_tombstone_count() RETURNS INTEGER
Returns: Number of deleted records awaiting cleanup
Use Case: Monitor cleanup necessity
Example:
SELECT biscuit_get_tombstone_count();
-- Output: 856
biscuit_version()
Returns extension version.
biscuit_version() RETURNS TEXT
Returns: Version string
Example:
SELECT biscuit_version();
-- Output: 1.0.2-Biscuit
Maintenance Functions
biscuit_cleanup()
Manual tombstone cleanup.
biscuit_cleanup() RETURNS TEXT
Returns: Cleanup summary
Behavior:
- Removes tombstoned indices from all bitmaps
- Frees associated memory
- Resets tombstone counter
- Updates cleanup statistics
Performance: O(n × 256), n=tombstone_count
Example:
SELECT biscuit_cleanup();
Output:
Tombstone cleanup complete:
Cleaned: 856 tombstones
Remaining: 0
Total cleanups: 2
When to Use:
- Before performance-critical operations
- After bulk deletions
- When tombstone count exceeds threshold
- During maintenance windows
Internal Functions (Advanced)
biscuit_trigger()
Internal trigger function (do not call directly).
biscuit_trigger() RETURNS TRIGGER
Purpose: Maintains index consistency on DML operations
Called By: PostgreSQL trigger system
Handles:
- INSERT: Adds new records to index
- UPDATE: Applies incremental or full updates
- DELETE: Marks tombstones
biscuit_match_tuples()
Generic tuple retrieval (legacy).
biscuit_match_tuples(pattern TEXT) RETURNS SETOF RECORD
Note: Use biscuit_match() instead for type-safe queries
Function Error Handling
All functions implement comprehensive error handling:
Common Errors:
- ERROR: Index not built: call biscuit_build_index() first
- ERROR: NULL primary key: primary key cannot be NULL
- ERROR: Column not found: verify table/column names
- ERROR: Insufficient permissions: requires table owner or superuser
Example Error:
SELECT * FROM biscuit_match('%pattern%');
-- ERROR: Index not built. Call biscuit_build_index() first.
Return Type Specifications
| Function | Return Type | Nullable | Notes |
|----------|-------------|----------|-------|
| biscuit_match() | SETOF table | No | Type matches indexed table |
| biscuit_match_rows() | TABLE(pk, TEXT) | No | PK type preserved |
| biscuit_match_count() | INTEGER | No | Always returns count ≥ 0 |
| biscuit_match_keys() | TABLE(TEXT, TEXT) | No | Both fields as text |
| biscuit_index_status() | TEXT | No | Always returns report |
| biscuit_get_*_count() | INTEGER | No | Returns 0 if no index |
| biscuit_version() | TEXT | No | Version string |
Monitoring and Diagnostics
Key Performance Indicators
1. Index Health Metrics
Active Record Count
SELECT biscuit_get_active_count();
Interpretation:
- Should match table row count (excluding deleted)
- Discrepancy indicates synchronization issues
- Monitor trend for capacity planning
Free Slot Count
SELECT biscuit_get_free_slots();
Interpretation:
- Available slots for insert reuse
- High count after bulk deletes is normal
- Prevents capacity expansion overhead
Tombstone Count
SELECT biscuit_get_tombstone_count();
Interpretation:
- Pending deleted records
- Critical Threshold: >1000 (default cleanup trigger)
- Warning Threshold: >5000 (performance degradation)
- Action Required: >10000 (manual cleanup recommended)
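The thresholds above can be folded into a single conditional maintenance statement; a minimal sketch (the 5,000 cutoff mirrors the warning threshold listed here):

```sql
-- Run cleanup only when tombstones exceed the warning threshold;
-- otherwise report the current count
SELECT CASE
    WHEN biscuit_get_tombstone_count() > 5000 THEN biscuit_cleanup()
    ELSE 'OK: ' || biscuit_get_tombstone_count() || ' tombstones pending'
END AS maintenance_action;
```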
2. Operational Statistics
Comprehensive Status
SELECT biscuit_index_status();
Key Sections:
A. Capacity Metrics
Active Records: 998567 ← Currently queryable records
Total Slots: 1000000 ← Allocated capacity
Free Slots: 1433 ← Available for reuse
Tombstoned Slots: 856 ← Pending cleanup
B. CRUD Performance
Inserts: 1000000
Deletes: 1433
Updates: 125678 (incr: 89234, 71.0%) ← Incremental update rate
Queries: 45892
Incremental Update Rate:
- >70%: Excellent (similar updates)
- 50-70%: Good (moderate similarity)
- <50%: Normal (diverse updates)
- <20%: Review update patterns
C. Lazy Deletion Health
Pending tombstones: 856 (59.7% of deletes)
Cleanup threshold: 1000
Total cleanups: 1
Items cleaned: 577
Query/Delete ratio: 32.02
Query/Delete Ratio:
- >20: Healthy (reads dominate, lazy deletion beneficial)
- 10-20: Balanced (cleanup overhead acceptable)
- <10: Write-heavy (consider more aggressive cleanup)
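The ratio can also be computed directly from the CRUD counters rather than read off the report, using the same regexp_matches technique as the other monitoring queries; a sketch:

```sql
-- Derive the query/delete ratio from the status report's counters
SELECT ROUND(
    (regexp_matches(biscuit_index_status(), 'Queries: ([0-9]+)'))[1]::NUMERIC
    / NULLIF((regexp_matches(biscuit_index_status(), 'Deletes: ([0-9]+)'))[1]::NUMERIC, 0),
    2) AS query_delete_ratio;
```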
3. Memory Utilization
SELECT
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+) MB'))[1]::NUMERIC AS memory_mb;
Guidelines:
Memory Growth = f(record_count, avg_length, char_diversity)
Expected Ranges:
100K records: 2-5 MB
1M records: 15-25 MB
10M records: 150-250 MB
100M records: 1.5-2.5 GB
Anomalies:
>30% above expected: Check for memory leaks or fragmentation
<50% of expected: Excellent compression (sparse data)
Monitoring Queries
Real-Time Health Check
-- Comprehensive health dashboard
WITH metrics AS (
SELECT
biscuit_get_active_count() AS active,
biscuit_get_free_slots() AS free,
biscuit_get_tombstone_count() AS tombstones,
(regexp_matches(biscuit_index_status(), 'Total Slots: ([0-9]+)'))[1]::INTEGER AS capacity,
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC AS memory_mb
)
SELECT
active,
free,
tombstones,
capacity,
memory_mb,
ROUND(100.0 * active / capacity, 2) AS utilization_pct,
ROUND(100.0 * tombstones / NULLIF(capacity - active, 0), 2) AS tombstone_ratio,
CASE
WHEN tombstones > 10000 THEN 'CRITICAL - Cleanup Required'
WHEN tombstones > 5000 THEN 'WARNING - Monitor Closely'
WHEN tombstones > 1000 THEN 'NOTICE - Cleanup Soon'
ELSE 'HEALTHY'
END AS health_status
FROM metrics;
Sample Output:
active | free | tombstones | capacity | memory_mb | utilization_pct | tombstone_ratio | health_status
---------+------+------------+----------+-----------+-----------------+-----------------+---------------
998567 | 1433 | 856 | 1000000 | 18.30 | 99.86 | 59.70 | HEALTHY
Performance Degradation Detection
-- Detect query performance issues
WITH baseline AS (
SELECT 2.1 AS expected_ms -- Baseline query time
),
current AS (
SELECT
biscuit_get_tombstone_count() AS tombstones,
-- Simulate query time (in production, measure actual queries)
2.1 + (biscuit_get_tombstone_count()::NUMERIC / 1000) * 0.2 AS estimated_ms
)
SELECT
tombstones,
estimated_ms,
ROUND(((estimated_ms - expected_ms) / expected_ms) * 100, 2) AS degradation_pct,
CASE
WHEN estimated_ms > expected_ms * 2 THEN 'ACTION REQUIRED'
WHEN estimated_ms > expected_ms * 1.5 THEN 'DEGRADED'
WHEN estimated_ms > expected_ms * 1.2 THEN 'ELEVATED'
ELSE 'NORMAL'
END AS performance_status
FROM current, baseline;
Capacity Planning
-- Project capacity needs
WITH stats AS (
SELECT
biscuit_get_active_count() AS current_active,
(regexp_matches(biscuit_index_status(), 'Total Slots: ([0-9]+)'))[1]::INTEGER AS capacity,
(regexp_matches(biscuit_index_status(), 'Inserts: ([0-9]+)'))[1]::BIGINT AS total_inserts,
(regexp_matches(biscuit_index_status(), 'Deletes: ([0-9]+)'))[1]::BIGINT AS total_deletes
),
growth AS (
SELECT
current_active,
capacity,
total_inserts - total_deletes AS net_growth,
CASE
WHEN total_deletes > 0 THEN total_inserts::NUMERIC / total_deletes
ELSE NULL
END AS insert_delete_ratio
FROM stats
)
SELECT
current_active,
capacity,
capacity - current_active AS available,
ROUND(100.0 * current_active / capacity, 2) AS utilization_pct,
net_growth,
insert_delete_ratio,
CASE
WHEN current_active::NUMERIC / capacity > 0.9 THEN 'EXPAND SOON'
WHEN current_active::NUMERIC / capacity > 0.8 THEN 'MONITOR'
ELSE 'ADEQUATE'
END AS capacity_status
FROM growth;
Alerting Thresholds
Critical Alerts
-- Immediate action required
DO $$
DECLARE
    tombstone_count INTEGER;
    active_count INTEGER;
    capacity INTEGER;
BEGIN
    tombstone_count := biscuit_get_tombstone_count();
    active_count := biscuit_get_active_count();
    capacity := (regexp_matches(biscuit_index_status(), 'Total Slots: ([0-9]+)'))[1]::INTEGER;
    IF tombstone_count > 10000 THEN
        RAISE WARNING 'CRITICAL: Tombstone count % exceeds 10,000. Run biscuit_cleanup() immediately.', tombstone_count;
    END IF;
    IF active_count::NUMERIC / capacity > 0.95 THEN
        RAISE WARNING 'CRITICAL: Index utilization % exceeds 95%%. Capacity expansion imminent.',
            ROUND(100.0 * active_count / capacity, 2);
    END IF;
END $$;
Warning Alerts
-- Monitor closely, plan maintenance
DO $$
DECLARE
    tombstone_count INTEGER;
    memory_mb NUMERIC;
BEGIN
    tombstone_count := biscuit_get_tombstone_count();
    memory_mb := (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC;
    IF tombstone_count BETWEEN 5000 AND 10000 THEN
        RAISE NOTICE 'WARNING: Tombstone count % in elevated range. Schedule cleanup.', tombstone_count;
    END IF;
    -- Assuming 1M records should use ~20MB
    IF memory_mb > 30 THEN
        RAISE NOTICE 'WARNING: Memory usage % MB higher than expected. Check for leaks.', memory_mb;
    END IF;
END $$;
Diagnostic Procedures
Issue: Slow Query Performance
Diagnosis:
-- Check tombstone overhead
SELECT biscuit_get_tombstone_count();
-- Check memory usage
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC;
-- Review pattern complexity
EXPLAIN ANALYZE SELECT * FROM biscuit_match('%complex%pattern%');
Resolution:
-- If tombstones > 5000
SELECT biscuit_cleanup();
-- If memory excessive
SELECT biscuit_build_index('table', 'column', 'pk'); -- Rebuild
Issue: Index Out of Sync
Diagnosis:
-- Compare counts
SELECT
COUNT(*) AS table_count,
biscuit_get_active_count() AS index_count
FROM your_table;
-- Check trigger status
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgname = 'biscuit_auto_update';
Resolution:
-- Rebuild index
SELECT biscuit_build_index('table', 'column', 'pk');
SELECT biscuit_enable_triggers();
Issue: High Memory Usage
Diagnosis:
-- Detailed memory breakdown
SELECT biscuit_index_status();
-- Check for long strings
SELECT MAX(LENGTH(column)), AVG(LENGTH(column)) FROM table;
Resolution:
-- Rebuild to compact
SELECT biscuit_build_index('table', 'column', 'pk');
-- Consider limiting indexed length (future enhancement)
Logging and Audit Trail
Enable PostgreSQL Logging
-- Log slow queries
SET log_min_duration_statement = 100; -- Log queries >100ms
-- Log BISCUIT operations
SET client_min_messages = NOTICE;
-- Review statement statistics (requires the pg_stat_statements extension)
SELECT * FROM pg_stat_statements
WHERE query LIKE '%biscuit%'
ORDER BY mean_exec_time DESC;
Custom Monitoring Table
-- Create monitoring log
CREATE TABLE biscuit_metrics (
logged_at TIMESTAMPTZ DEFAULT NOW(),
active_count INTEGER,
free_slots INTEGER,
tombstone_count INTEGER,
memory_mb NUMERIC,
query_count BIGINT,
insert_count BIGINT,
update_count BIGINT,
delete_count BIGINT
);
-- Periodic logging (via cron or pg_cron)
INSERT INTO biscuit_metrics
SELECT
NOW(),
biscuit_get_active_count(),
biscuit_get_free_slots(),
biscuit_get_tombstone_count(),
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC,
(regexp_matches(biscuit_index_status(), 'Queries: ([0-9]+)'))[1]::BIGINT,
(regexp_matches(biscuit_index_status(), 'Inserts: ([0-9]+)'))[1]::BIGINT,
(regexp_matches(biscuit_index_status(), 'Updates: ([0-9]+)'))[1]::BIGINT,
(regexp_matches(biscuit_index_status(), 'Deletes: ([0-9]+)'))[1]::BIGINT;
-- Analyze trends
SELECT
DATE_TRUNC('hour', logged_at) AS hour,
AVG(tombstone_count) AS avg_tombstones,
MAX(tombstone_count) AS max_tombstones,
AVG(memory_mb) AS avg_memory
FROM biscuit_metrics
WHERE logged_at > NOW() - INTERVAL '7 days'
GROUP BY 1
ORDER BY 1 DESC;
Optimization Guidelines
Query Optimization
1. Pattern Design Best Practices
Optimal Patterns:
-- ✓ GOOD: Specific prefix
SELECT * FROM biscuit_match('user_admin%');
-- ✓ GOOD: Specific suffix
SELECT * FROM biscuit_match('%@company.com');
-- ✓ GOOD: Single containment
SELECT * FROM biscuit_match('%error%');
-- ⚠ ACCEPTABLE: Two-part pattern
SELECT * FROM biscuit_match('%user%admin%');
-- ⚠ SLOWER: Three-part pattern
SELECT * FROM biscuit_match('%a%b%c%');
-- ✗ AVOID: Excessive parts (4+)
SELECT * FROM biscuit_match('%a%b%c%d%e%');
Pattern Complexity Impact:
Parts Query Time Recommendation
----- ---------- --------------
1 ~2ms Optimal
2 ~5ms Excellent
3 ~12ms Good
4 ~25ms Acceptable
5+ ~50ms+ Avoid if possible
2. Query Structure Optimization
Use Count for Existence Checks:
-- ✗ AVOID: Materializing entire result set
SELECT COUNT(*) FROM biscuit_match('%pattern%');
-- ✓ BETTER: Direct count
SELECT biscuit_match_count('%pattern%');
Apply Filters After Pattern Match:
-- ✓ OPTIMAL: Pattern match first, then filter
SELECT * FROM biscuit_match('%@gmail.com%')
WHERE created_at > '2024-01-01'
AND status = 'active';
-- ✗ SUBOPTIMAL: Complex joins before pattern match
SELECT *
FROM complex_view v
WHERE v.email LIKE '%@gmail.com%'; -- Falls back to sequential scan
Leverage Result Reuse:
-- ✓ OPTIMAL: Use CTE for multiple operations on same pattern
WITH matches AS (
SELECT * FROM biscuit_match('%search%')
)
SELECT
(SELECT COUNT(*) FROM matches) AS total,
(SELECT COUNT(*) FROM matches WHERE category = 'A') AS category_a,
(SELECT AVG(price) FROM matches) AS avg_price;
3. Index Selectivity Considerations
High Selectivity (Few Matches):
-- Excellent: ~0.1% selectivity
SELECT * FROM biscuit_match('unique_prefix_%');
Low Selectivity (Many Matches):
-- Consider adding filters or reformulating query
SELECT * FROM biscuit_match('%') -- Returns everything
WHERE additional_criteria;
Selectivity Guidelines:
Match % Query Performance Recommendation
------- ----------------- --------------
<1% Excellent Ideal use case
1-10% Very Good Well-suited
10-30% Good Acceptable
30-50% Fair Add filters
>50% Poor Reconsider approach
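Selectivity can be checked cheaply before materializing rows, since biscuit_match_count() skips tuple construction; a sketch (the table name products is illustrative):

```sql
-- Match percentage relative to table size; use the guidelines above
-- to judge whether the pattern is a good fit for the index
SELECT ROUND(
    100.0 * biscuit_match_count('%pattern%')
    / NULLIF((SELECT COUNT(*) FROM products), 0),
    2) AS match_pct;
```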
CRUD Optimization
1. Bulk Insert Strategy
Small Batches (<1000 rows):
-- Keep triggers enabled
INSERT INTO products (id, name) VALUES
(1, 'Product 1'),
(2, 'Product 2'),
...
(1000, 'Product 1000');
-- Index updates happen automatically
Large Batches (>10,000 rows):
-- Disable triggers, bulk insert, rebuild
SELECT biscuit_disable_triggers();
COPY products FROM '/data/products.csv' CSV;
-- or
INSERT INTO products SELECT * FROM staging_table;
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
Performance Comparison:
10,000 records:
With triggers: ~2 seconds
Disable/rebuild: ~1 second (50% faster)
100,000 records:
With triggers: ~20 seconds
Disable/rebuild: ~8 seconds (60% faster)
1,000,000 records:
With triggers: ~220 seconds
Disable/rebuild: ~85 seconds (61% faster)
2. Update Optimization
Maximize Incremental Updates:
-- ✓ GOOD: Same-length updates (incremental)
UPDATE users SET email = 'newemail@example.com' -- 21 chars
WHERE email = 'oldemail@example.com'; -- 21 chars
-- ⚠ ACCEPTABLE: Different-length (full reindex)
UPDATE users SET email = 'verylongemail@example.com' -- 27 chars
WHERE email = 'short@ex.co'; -- 11 chars
Batch Updates:
-- For large update batches
SELECT biscuit_disable_triggers();
UPDATE products SET name = UPPER(name);
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
3. Delete Strategy
Individual Deletes:
-- Lazy deletion is O(1), no optimization needed
DELETE FROM products WHERE id = 123;
Bulk Deletes:
-- Option 1: Let lazy deletion handle (recommended for <10K deletes)
DELETE FROM products WHERE category = 'obsolete';
-- Option 2: Disable/rebuild (for >100K deletes)
SELECT biscuit_disable_triggers();
DELETE FROM products WHERE created_at < '2020-01-01';
SELECT biscuit_build_index('products', 'name', 'id');
SELECT biscuit_enable_triggers();
Cleanup Scheduling:
-- Schedule periodic cleanup during low-traffic hours
-- Example: Every night at 2 AM via pg_cron
SELECT cron.schedule('biscuit-cleanup', '0 2 * * *',
'SELECT biscuit_cleanup()');
Memory Optimization
1. Monitor Memory Growth
-- Track memory over time
SELECT
NOW() AS checked_at,
biscuit_get_active_count() AS records,
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC AS mb,
ROUND((regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC
/ NULLIF(biscuit_get_active_count(), 0) * 1024, 2) AS kb_per_record;
2. Memory Reduction Strategies
Rebuild Compaction:
-- Periodic rebuild to compact fragmentation
SELECT biscuit_build_index('table', 'column', 'pk');
Character Diversity Reduction (Application Level):
-- Normalize data before indexing
ALTER TABLE users ADD COLUMN email_normalized TEXT;
UPDATE users SET email_normalized = LOWER(email);
SELECT biscuit_setup('users', 'email_normalized', 'id');
Maintenance Optimization
1. Tombstone Management
Adaptive Threshold Tuning:
-- Current threshold: 1000 (default)
-- Adjust based on workload:
-- Read-heavy (Q/D ratio > 20): Increase threshold
-- Write-heavy (Q/D ratio < 10): Decrease threshold
-- Balanced: Keep default
-- Manual threshold adjustment (via C code recompilation)
#define TOMBSTONE_CLEANUP_THRESHOLD 5000 // Adjust value
Proactive Cleanup:
-- Before performance-critical operations
SELECT biscuit_cleanup();
-- Before maintenance windows
SELECT biscuit_cleanup();
2. Index Rebuild Strategy
When to Rebuild:
- After bulk operations (>10% of table size)
- Memory usage exceeds expectations by >50%
- After schema changes
- Periodic maintenance (monthly/quarterly)
Rebuild Procedure:
BEGIN;
SELECT biscuit_disable_triggers();
-- Perform any table maintenance
SELECT biscuit_build_index('table', 'column', 'pk');
SELECT biscuit_enable_triggers();
COMMIT;
Concurrency Optimization
1. Read Concurrency
Queries are lock-free:
-- Multiple concurrent queries: No contention
SELECT * FROM biscuit_match('%pattern1%'); -- Session 1
SELECT * FROM biscuit_match('%pattern2%'); -- Session 2
SELECT * FROM biscuit_match('%pattern3%'); -- Session 3
-- All execute concurrently without blocking
2. Write Concurrency
Minimize trigger overhead:
-- ✓ OPTIMAL: Batch small inserts
INSERT INTO products SELECT * FROM new_products_batch;
-- ⚠ SUBOPTIMAL: Individual inserts in loop
-- Each insert triggers index update
Transaction Batching:
-- Group related operations
BEGIN;
INSERT INTO products VALUES (...);
UPDATE products SET name = ... WHERE id = ...;
DELETE FROM products WHERE id = ...;
COMMIT;
-- All index updates happen at commit
Performance Testing
Benchmark Query Performance
-- Timing template
\timing on
SELECT * FROM biscuit_match('%pattern%');
\timing off
-- With EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM biscuit_match('%pattern%');
Compare with Traditional Methods
-- BISCUIT
\timing on
SELECT COUNT(*) FROM biscuit_match('%search%');
\timing off
-- Traditional LIKE
\timing on
SELECT COUNT(*) FROM table WHERE column LIKE '%search%';
\timing off
-- GIN Index
\timing on
SELECT COUNT(*) FROM table WHERE column ILIKE '%search%';
\timing off
Limitations and Constraints
Pattern Syntax Limitations
BISCUIT supports SQL LIKE wildcards only; regular expressions, character classes, and alternation are not supported:
-- ✗ NOT SUPPORTED
SELECT * FROM biscuit_match('[0-9]+');
SELECT * FROM biscuit_match('(abc|xyz)');
-- ✓ EQUIVALENT WORKAROUNDS
SELECT * FROM biscuit_match('abc%')
UNION
SELECT * FROM biscuit_match('xyz%');
Architectural Constraints
1. Memory Residency
Index Location: Shared memory (non-persistent)
Implications:
- Index lost on server restart
- Must rebuild after restart
- No automatic persistence
Startup Procedure:
-- Add to database initialization script
\c your_database
SELECT biscuit_setup('table1', 'column1', 'pk1');
SELECT biscuit_setup('table2', 'column2', 'pk2');
2. Single Index Instance
Limitation: Only one BISCUIT index per PostgreSQL instance
Implications:
-- ✗ Cannot maintain multiple indexes simultaneously
SELECT biscuit_setup('table1', 'col1', 'id');
SELECT biscuit_setup('table2', 'col2', 'id'); -- Replaces table1 index
-- ✓ Workaround: Separate databases
CREATE DATABASE db1;
\c db1
SELECT biscuit_setup('table', 'column', 'id');
CREATE DATABASE db2;
\c db2
SELECT biscuit_setup('table', 'column', 'id');
3. Transaction Isolation
Current Behavior: Index updates commit with transaction
Limitation: No snapshot isolation for queries
BEGIN;
INSERT INTO products VALUES (1, 'New Product');
-- Index updated in current transaction
SELECT * FROM biscuit_match('%New%'); -- Sees new product
ROLLBACK;
-- Index reverted (product removed)
SELECT * FROM biscuit_match('%New%'); -- No longer sees product
Note: This is expected behavior and consistent with PostgreSQL’s MVCC model at the trigger level.
4. Single Column Per Index
Current Design: One index per table, one column per index
-- ✗ Cannot index multiple columns simultaneously
-- Must choose one column
-- ✓ Workaround: Concatenated column
ALTER TABLE products ADD COLUMN search_text TEXT
GENERATED ALWAYS AS (name || ' ' || description || ' ' || sku) STORED;
SELECT biscuit_setup('products', 'search_text', 'id');
5. NULL Handling
Behavior: NULL values treated as empty strings
INSERT INTO products (id, name) VALUES (1, NULL);
-- NULL treated as ''
SELECT * FROM biscuit_match('%'); -- Includes row 1
SELECT * FROM biscuit_match('%%'); -- Includes row 1
SELECT biscuit_match_count(''); -- Counts row 1 if name is NULL
Performance Limitations
1. Pattern Complexity Scaling
Complexity: O(k × m), k = pattern parts, m = search window
-- Linear growth in query time with pattern parts
SELECT biscuit_match('%a%'); -- ~2ms
SELECT biscuit_match('%a%b%'); -- ~5ms
SELECT biscuit_match('%a%b%c%'); -- ~12ms
SELECT biscuit_match('%a%b%c%d%'); -- ~25ms
SELECT biscuit_match('%a%b%c%d%e%'); -- ~52ms
Recommendation: Limit patterns to 3-4 parts for optimal performance
2. Write Concurrency
Limitation: Trigger-based updates serialize at row level
Impact: high-concurrency writes (>10K/sec) may experience contention as trigger overhead accumulates.
Mitigation:
- Batch writes
- Disable triggers for bulk operations
- Consider an application-level queue
3. Memory Growth
Behavior: Memory grows with:
- Record count
- String length diversity
- Character diversity
- Tombstone accumulation
Memory bounds:
Minimum: 10% of indexed data size
Maximum: 50% of indexed data size
Typical: 20-25% of indexed data size
Mitigation:
-- Periodic rebuild for compaction
SELECT biscuit_build_index('table', 'column', 'pk');
-- Cleanup tombstones
SELECT biscuit_cleanup();
Compatibility Constraints
1. PostgreSQL Version
Minimum: PostgreSQL 11.0
Tested Versions:
- PostgreSQL 11.x: ✓ Supported
- PostgreSQL 12.x: ✓ Supported
- PostgreSQL 13.x: ✓ Supported
- PostgreSQL 14.x: ✓ Supported
- PostgreSQL 15.x: ✓ Supported
- PostgreSQL 16.x: ✓ Supported (expected)
2. Data Type Support
Primary Key Types:
- Fully Supported: INT2, INT4, INT8, TEXT, VARCHAR, CHAR, UUID
- Generic Support: All types with output functions
- Not Supported: Composite types, arrays (without custom casting)
Indexed Column Types:
- Required: TEXT, VARCHAR, CHAR, or castable to TEXT
- Not Supported: Binary types (BYTEA), JSON, arrays without casting
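For a column that is not already text, a cast exposed through a generated column makes it indexable; a sketch (events, payload, and id are illustrative names, not part of the extension):

```sql
-- Expose a JSONB field as an indexable TEXT column
ALTER TABLE events ADD COLUMN payload_text TEXT
    GENERATED ALWAYS AS (payload::TEXT) STORED;
SELECT biscuit_setup('events', 'payload_text', 'id');
```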
3. Extension Dependencies
Required:
- PostgreSQL server-side C API
- Standard PostgreSQL libraries
Optional:
- CRoaring library (for optimized bitmaps)
- Falls back to built-in implementation if unavailable
Security Considerations
1. Permissions
Required Privileges:
-- Index creation requires
GRANT CREATE ON DATABASE TO user; -- For extension
GRANT ALL ON TABLE target_table TO user; -- For triggers
-- Query execution requires
GRANT SELECT ON TABLE target_table TO user;
2. SQL Injection
Safe Query Patterns:
-- ✓ SAFE: Parameterized queries
PREPARE search_stmt AS
SELECT * FROM biscuit_match($1);
EXECUTE search_stmt('%user_input%');
-- ⚠ SAFER: Application-side sanitization (prefer parameterized queries)
pattern = sanitize_input(user_input)
query = f"SELECT * FROM biscuit_match('{pattern}')"
Unsafe Patterns:
-- ✗ UNSAFE: Direct string concatenation
SELECT * FROM biscuit_match('%' || user_input || '%');
3. Resource Exhaustion
Potential Vectors:
- Complex patterns (>5 parts): High CPU usage
- Very long patterns: Memory allocation
- Rapid index rebuilds: Memory fragmentation
Mitigation:
-- Rate limit index rebuilds
-- Validate pattern complexity before execution
-- Set statement timeout
SET statement_timeout = '5s';
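Pattern-complexity validation can be done in a small wrapper before the pattern reaches biscuit_match(); a hedged sketch (the function name and the 5-wildcard cutoff are illustrative, not part of the extension):

```sql
-- Reject overly complex patterns before they reach the index
CREATE OR REPLACE FUNCTION validate_biscuit_pattern(p TEXT) RETURNS TEXT AS $$
BEGIN
    -- Count % symbols without regexp_count() (works on all supported versions)
    IF length(p) - length(replace(p, '%', '')) > 5 THEN
        RAISE EXCEPTION 'Pattern too complex (>5 wildcards): %', p;
    END IF;
    RETURN p;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Usage: SELECT * FROM biscuit_match(validate_biscuit_pattern(user_pattern));
```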
Known Issues
1. Index Persistence
Issue: Index not persistent across restarts
Status: Design limitation (in-memory structure)
Workaround: Rebuild on startup via initialization script
2. Multiple Index Support
Issue: Only one index per PostgreSQL instance
Status: Architectural limitation
Workaround: Use multiple databases or implement queuing
3. Incremental Update Threshold
Issue: Updates with >20% character changes trigger full reindex
Status: Configurable threshold (requires C code modification)
Workaround: Design application to minimize large updates
Future Enhancements (Roadmap)
Planned:
- Persistent index storage
- Multiple concurrent indexes
- Case-insensitive mode option
- Configurable position limit (>256 chars)
- Pattern query optimizer
- Index statistics collection
Under Consideration:
- Fuzzy matching support
- Phonetic matching
- Unicode normalization
- Multi-column composite indexes
- Parallel query execution
- Incremental checkpoint/restore
Troubleshooting
Common Issues and Solutions
Issue 1: “Index not built” Error
Symptom:
SELECT * FROM biscuit_match('%pattern%');
ERROR: Index not built. Call biscuit_build_index() first.
Cause: Index not initialized or server restarted
Solution:
-- Option 1: Quick setup
SELECT biscuit_setup('table_name', 'column_name', 'pk_column');
-- Option 2: Manual build
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
SELECT biscuit_create_match_function();
SELECT biscuit_enable_triggers();
Prevention:
-- Add to database startup script
-- File: /etc/postgresql/14/main/init_biscuit.sql
\c your_database
SELECT biscuit_setup('table1', 'column1', 'pk1');
Issue 2: Query Returns No Results
Symptom:
SELECT biscuit_match_count('%known_value%');
-- Returns: 0 (expected >0)
Diagnosis:
-- Check if value exists in table
SELECT COUNT(*) FROM table_name WHERE column_name LIKE '%known_value%';
-- Check index status
SELECT biscuit_index_status();
-- Check trigger status
SELECT tgname, tgenabled FROM pg_trigger WHERE tgname = 'biscuit_auto_update';
Possible Causes and Solutions:
A. Index Out of Sync
-- Rebuild index
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
B. Case Sensitivity
-- Pattern is case-sensitive
SELECT * FROM biscuit_match('%VALUE%'); -- Won't match "value"
-- Solution: Lowercase the pattern (effective only if the indexed
-- column was normalized to lowercase before indexing)
SELECT * FROM biscuit_match(LOWER('%VALUE%'));
C. Pattern Beyond 256 Characters
-- Match occurs after position 256
SELECT LENGTH(column_name) FROM table_name WHERE id = 123;
-- Returns: 280
-- Solution: Pattern must be within first 256 chars
Issue 3: Slow Query Performance
Symptom:
\timing on
SELECT * FROM biscuit_match('%pattern%');
Time: 850.234 ms
-- Expected: <10ms
Diagnosis:
-- Check tombstone count
SELECT biscuit_get_tombstone_count();
-- If >5000: Cleanup needed
-- Check pattern complexity: count % symbols
-- (regexp_count() requires PostgreSQL 15+; the length trick works on all versions)
SELECT length('%a%b%c%d%') - length(replace('%a%b%c%d%', '%', '')) AS wildcard_count;
-- If >4: Pattern too complex
-- Check memory usage
SELECT biscuit_index_status();
Solutions:
A. High Tombstone Count
SELECT biscuit_cleanup();
-- Should restore normal performance
B. Complex Pattern
-- Simplify pattern
-- From: SELECT * FROM biscuit_match('%a%b%c%d%e%');
-- To: SELECT * FROM biscuit_match('%abc%de%');
C. Large Result Set
-- Check selectivity
SELECT biscuit_match_count('%pattern%');
-- If >50% of table: Not ideal for index
-- Solution: Add additional filters
SELECT * FROM biscuit_match('%pattern%')
WHERE created_at > '2024-01-01';
Issue 4: High Memory Usage
Symptom:
SELECT biscuit_index_status();
-- Memory: 500 MB (expected ~20 MB for dataset)
Diagnosis:
-- Check record count
SELECT biscuit_get_active_count();
-- Check string characteristics
SELECT
COUNT(*) AS records,
AVG(LENGTH(column_name)) AS avg_length,
MAX(LENGTH(column_name)) AS max_length,
COUNT(DISTINCT column_name) AS unique_values
FROM table_name;
-- Check tombstone accumulation
SELECT biscuit_get_tombstone_count();
Solutions:
A. Tombstone Accumulation
SELECT biscuit_cleanup();
-- Then check memory again
B. Memory Fragmentation
-- Rebuild to compact
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
C. Unexpected Data Characteristics
-- Very long strings or high diversity
-- Consider normalizing data
UPDATE table_name SET column_name = LEFT(column_name, 200);
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
Issue 5: Trigger Not Firing
Symptom:
INSERT INTO table_name (id, column_name) VALUES (999, 'test');
SELECT * FROM biscuit_match('%test%');
-- Returns: no rows (expected 1 row)
Diagnosis:
-- Check trigger existence
SELECT tgname, tgenabled, tgisinternal
FROM pg_trigger
WHERE tgrelid = 'table_name'::regclass
AND tgname = 'biscuit_auto_update';
-- Check for errors in logs
SHOW log_destination;
-- Review PostgreSQL logs for trigger errors
Solutions:
A. Trigger Disabled
SELECT biscuit_enable_triggers();
B. Trigger Missing
SELECT biscuit_setup('table_name', 'column_name', 'pk_column');
-- Or manually
SELECT biscuit_enable_triggers();
C. Trigger Error
-- Check PostgreSQL logs
-- Common causes:
-- - NULL primary key
-- - Type conversion error
-- - Memory allocation failure
-- Rebuild to reset
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
SELECT biscuit_enable_triggers();
Issue 6: “NULL primary key” Error
Symptom:
INSERT INTO table_name (column_name) VALUES ('test');
ERROR: NULL primary key in INSERT
Cause: Primary key column contains NULL
Solution:
A. Use Serial/Identity Column
ALTER TABLE table_name ALTER COLUMN id ADD GENERATED ALWAYS AS IDENTITY;
B. Set Default Value
ALTER TABLE table_name ALTER COLUMN id SET DEFAULT nextval('table_name_id_seq');
C. Ensure Non-NULL Constraint
ALTER TABLE table_name ALTER COLUMN id SET NOT NULL;
Issue 7: Out of Memory
Symptom:
SELECT biscuit_build_index('large_table', 'text_column', 'id');
ERROR: out of memory
Diagnosis:
-- Check table size
SELECT
pg_size_pretty(pg_total_relation_size('large_table')) AS table_size,
COUNT(*) AS row_count,
AVG(LENGTH(text_column)) AS avg_length
FROM large_table;
-- Check available memory
SHOW shared_buffers;
SHOW work_mem;
Solutions:
A. Increase Memory Allocation
-- In postgresql.conf
shared_buffers = 2GB -- Increase from default
work_mem = 256MB -- Increase from default
-- Restart PostgreSQL
sudo systemctl restart postgresql
B. Partition Table
-- Create partitioned table
CREATE TABLE large_table_partitioned (
id SERIAL PRIMARY KEY,
text_column TEXT,
created_at DATE
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE large_table_2023 PARTITION OF large_table_partitioned
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
-- Index each partition separately
SELECT biscuit_setup('large_table_2023', 'text_column', 'id');
C. Reduce String Diversity
-- Normalize before indexing
UPDATE large_table SET text_column = LOWER(text_column);
SELECT biscuit_build_index('large_table', 'text_column', 'id');
Issue 8: Index-Table Mismatch
Symptom:
SELECT COUNT(*) FROM table_name;
-- Returns: 100000
SELECT biscuit_get_active_count();
-- Returns: 95000 (mismatch!)
Diagnosis:
-- Check trigger status
SELECT tgenabled FROM pg_trigger WHERE tgname = 'biscuit_auto_update';
-- Check for errors
SELECT biscuit_index_status();
-- Manual comparison
WITH table_data AS (
SELECT COUNT(*) AS cnt FROM table_name
),
index_data AS (
SELECT biscuit_get_active_count() AS cnt
)
SELECT
t.cnt AS table_count,
i.cnt AS index_count,
t.cnt - i.cnt AS discrepancy
FROM table_data t, index_data i;
Solutions:
A. Rebuild Index
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
SELECT biscuit_enable_triggers();
B. Check for Concurrent Modifications
-- Ensure exclusive access during rebuild
BEGIN;
LOCK TABLE table_name IN EXCLUSIVE MODE;
SELECT biscuit_build_index('table_name', 'column_name', 'pk_column');
COMMIT;
Diagnostic Queries
Comprehensive Health Check
DO $$
DECLARE
table_count BIGINT;
index_count INTEGER;
tombstone_count INTEGER;
free_slots INTEGER;
memory_mb NUMERIC;
BEGIN
EXECUTE 'SELECT COUNT(*) FROM your_table' INTO table_count;
index_count := biscuit_get_active_count();
tombstone_count := biscuit_get_tombstone_count();
free_slots := biscuit_get_free_slots();
memory_mb := (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC;
RAISE NOTICE 'BISCUIT Health Check:';
RAISE NOTICE ' Table rows: %', table_count;
RAISE NOTICE ' Index active: %', index_count;
RAISE NOTICE ' Discrepancy: %', table_count - index_count;
RAISE NOTICE ' Tombstones: %', tombstone_count;
RAISE NOTICE ' Free slots: %', free_slots;
RAISE NOTICE ' Memory: % MB', memory_mb;
IF table_count != index_count THEN
RAISE WARNING 'Index out of sync! Rebuild recommended.';
END IF;
IF tombstone_count > 5000 THEN
RAISE WARNING 'High tombstone count! Cleanup recommended.';
END IF;
IF memory_mb > (index_count::NUMERIC / 1000000 * 30) THEN
RAISE WARNING 'High memory usage! Check for issues.';
END IF;
END $$;
Performance Baseline
-- Establish performance baseline
\timing on
SELECT biscuit_match_count('%'); -- Baseline: all records
SELECT biscuit_match_count('%common_char%'); -- Baseline: common pattern
SELECT biscuit_match_count('prefix%'); -- Baseline: prefix
SELECT biscuit_match_count('%suffix'); -- Baseline: suffix
\timing off
-- Save baselines and compare periodically
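Outside psql, the same baselines can be captured from application code. A minimal Python timing harness sketch; the timed callable here is a stand-in, and in practice it would run a `biscuit_match_count(...)` query over a database cursor:

```python
import time

def time_call(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Record baselines under stable conditions and compare periodically
baselines = {}
result, ms = time_call(sum, range(1000))  # stand-in for a query round-trip
baselines['all_records'] = ms
```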
Memory Leak Detection
-- Monitor memory over operations
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC AS mb_before;
-- Perform operations
INSERT INTO table_name SELECT generate_series(1, 10000), 'test' || generate_series(1, 10000);
DELETE FROM table_name WHERE id > 50000;
SELECT biscuit_cleanup();
-- Check memory after
SELECT (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC AS mb_after;
-- Memory should be stable or decrease after cleanup
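Monitoring scripts that run this check repeatedly need to pull the numeric value out of the status text. A client-side sketch of that extraction; the sample status line is invented for illustration, and only the `Memory: <n>` fragment is implied by the regexes used above:

```python
import re
from typing import Optional

def parse_status_metric(status_text: str, label: str) -> Optional[float]:
    """Extract a numeric value such as 'Memory: 18.3' from the
    free-form text returned by biscuit_index_status()."""
    match = re.search(rf'{label}: ([0-9.]+)', status_text)
    return float(match.group(1)) if match else None

# Sample line (format assumed for illustration)
status = "Active: 998567 | Tombstones: 856 | Memory: 18.3 MB | Queries: 45892"
mb_before = parse_status_metric(status, "Memory")
# Compare against the value read after operations plus biscuit_cleanup();
# it should be stable or lower.
```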
Error Code Reference
| Error Message | Cause | Solution |
|---------------|-------|----------|
| Index not built | No index initialized | Run biscuit_setup() |
| NULL primary key | PK value is NULL | Ensure PK constraint |
| Column not found | Invalid column name | Verify table schema |
| Insufficient permissions | User lacks privileges | Grant table permissions |
| Out of memory | Insufficient RAM | Increase shared_buffers |
| Table not found | Invalid table name | Check table existence |
| Index already exists | Duplicate index | Drop and rebuild |
| Trigger already exists | Duplicate trigger | Use biscuit_disable_triggers() first |
Getting Help
Before Seeking Support:
- Run comprehensive health check (above)
- Review PostgreSQL logs for errors
- Check extension version: SELECT biscuit_version()
- Verify PostgreSQL version: SELECT version()
- Document reproduction steps
Support Channels:
- GitHub Issues: [Project Repository]
- Mailing List: [Contact Information]
- Documentation: [Online Docs URL]
Information to Provide:
- Extension version
- PostgreSQL version
- Operating system
- Table schema
- Reproduction steps
- Error messages
- Output of SELECT biscuit_index_status()
Examples and Use Cases
Example 1: E-Commerce Product Search
Scenario: Product catalog with 5M products, need fast SKU and name searches
Schema:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku VARCHAR(50) NOT NULL,
name TEXT NOT NULL,
description TEXT,
category VARCHAR(100),
price NUMERIC(10, 2),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Sample data
INSERT INTO products (sku, name, category, price) VALUES
('ELEC-LAP-001', 'Dell XPS 15 Laptop', 'Electronics', 1299.99),
('ELEC-LAP-002', 'MacBook Pro 16"', 'Electronics', 2499.99),
('HOME-FURN-001', 'Modern Office Chair', 'Furniture', 349.99),
('HOME-FURN-002', 'Standing Desk', 'Furniture', 599.99);
BISCUIT Setup:
-- Index product names
SELECT biscuit_setup('products', 'name', 'id');
Query Examples:
-- Search by product name
SELECT * FROM biscuit_match('%Laptop%');
-- Case-insensitive search (with normalized column)
ALTER TABLE products ADD COLUMN name_lower TEXT
GENERATED ALWAYS AS (LOWER(name)) STORED;
SELECT biscuit_setup('products', 'name_lower', 'id');
SELECT * FROM biscuit_match('%laptop%');
-- Complex search
SELECT * FROM biscuit_match('%office%chair%')
WHERE price BETWEEN 200 AND 500;
-- Count matches
SELECT biscuit_match_count('%macbook%');
-- Top sellers matching pattern
SELECT p.*, s.sales_count
FROM biscuit_match('%laptop%') p
JOIN sales_summary s ON p.id = s.product_id
ORDER BY s.sales_count DESC
LIMIT 10;
Performance Comparison:
-- Traditional LIKE (sequential scan)
\timing on
SELECT COUNT(*) FROM products WHERE name LIKE '%Laptop%';
Time: 1850 ms
-- BISCUIT
SELECT biscuit_match_count('%Laptop%');
Time: 3.2 ms
-- Speedup: 578×
Example 2: Customer Email Domain Analysis
Scenario: Analyze customer email patterns, identify providers, find similar domains
Schema:
CREATE TABLE customers (
customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT NOT NULL,
first_name TEXT,
last_name TEXT,
signup_date DATE DEFAULT CURRENT_DATE,
status VARCHAR(20) DEFAULT 'active'
);
-- Sample data
INSERT INTO customers (email, first_name, last_name) VALUES
('john.doe@gmail.com', 'John', 'Doe'),
('jane.smith@yahoo.com', 'Jane', 'Smith'),
('bob.wilson@company.com', 'Bob', 'Wilson'),
('alice.brown@gmail.com', 'Alice', 'Brown');
BISCUIT Setup:
-- Note: UUID primary key fully supported
SELECT biscuit_setup('customers', 'email', 'customer_id');
Query Examples:
-- Find all Gmail users
SELECT * FROM biscuit_match('%@gmail.com');
-- Find all corporate emails (non-freemail)
SELECT *
FROM biscuit_match('%@%.com')
WHERE email NOT LIKE '%@gmail.com'
AND email NOT LIKE '%@yahoo.com'
AND email NOT LIKE '%@hotmail.com';
-- Domain distribution analysis
WITH email_domains AS (
SELECT
customer_id,
email,
SUBSTRING(email FROM '@(.*)$') AS domain
FROM biscuit_match('%')
)
SELECT
domain,
COUNT(*) AS customer_count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS percentage
FROM email_domains
GROUP BY domain
ORDER BY customer_count DESC
LIMIT 10;
-- Find emails with specific patterns
SELECT * FROM biscuit_match('%.doe@%'); -- All .doe emails
SELECT * FROM biscuit_match('admin%@%'); -- Starts with admin
SELECT * FROM biscuit_match('%+%@%'); -- Contains plus addressing
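The patterns above follow SQL LIKE semantics: `%` matches any run of characters and `_` matches exactly one. For client-side pre-validation of user-supplied patterns before sending a query, a LIKE pattern can be translated to an anchored regular expression. This is a generic sketch, not part of the extension's API:

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into an anchored regex:
    % matches any run of characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == '%':
            parts.append('.*')
        elif ch == '_':
            parts.append('.')
        else:
            parts.append(re.escape(ch))
    return re.compile('^' + ''.join(parts) + '$')

# Mirrors biscuit_match('%@gmail.com') on the client side
gmail = like_to_regex('%@gmail.com')
```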
Business Intelligence Query:
-- Signup trends by email provider
WITH gmail_users AS (
SELECT signup_date
FROM biscuit_match('%@gmail.com')
),
yahoo_users AS (
SELECT signup_date
FROM biscuit_match('%@yahoo.com')
)
SELECT
DATE_TRUNC('month', signup_date) AS month,
'Gmail' AS provider,
COUNT(*) AS signups
FROM gmail_users
GROUP BY 1
UNION ALL
SELECT
DATE_TRUNC('month', signup_date) AS month,
'Yahoo' AS provider,
COUNT(*) AS signups
FROM yahoo_users
GROUP BY 1
ORDER BY month DESC, provider;
Example 3: Log Analysis and Error Detection
Scenario: 50M log entries, need to quickly find error patterns and anomalies
Schema:
CREATE TABLE application_logs (
log_id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ DEFAULT NOW(),
level VARCHAR(10),
message TEXT,
service VARCHAR(50),
user_id INTEGER
);
-- Sample data
INSERT INTO application_logs (level, message, service, user_id) VALUES
('INFO', 'User login successful', 'auth', 12345),
('ERROR', 'Database connection timeout', 'api', 67890),
('WARN', 'Rate limit approaching', 'api', 12345),
('ERROR', 'Failed to connect to payment gateway', 'payments', 11111);
BISCUIT Setup:
SELECT biscuit_setup('application_logs', 'message', 'log_id');
Query Examples:
-- Find all connection errors
SELECT * FROM biscuit_match('%connection%')
WHERE level = 'ERROR'
ORDER BY timestamp DESC
LIMIT 100;
-- Find timeout issues across all services
SELECT
service,
COUNT(*) AS timeout_count,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen
FROM biscuit_match('%timeout%')
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY service
ORDER BY timeout_count DESC;
-- Complex error pattern
SELECT * FROM biscuit_match('%failed%connect%')
WHERE level IN ('ERROR', 'CRITICAL')
AND timestamp > NOW() - INTERVAL '1 hour'
ORDER BY timestamp DESC;
-- Error correlation
WITH error_patterns AS (
SELECT
log_id,
message,
timestamp,
CASE
WHEN message LIKE '%timeout%' THEN 'timeout'
WHEN message LIKE '%connection%' THEN 'connection'
WHEN message LIKE '%memory%' THEN 'memory'
ELSE 'other'
END AS error_category
FROM biscuit_match('%error%')
WHERE timestamp > NOW() - INTERVAL '24 hours'
)
SELECT
error_category,
COUNT(*) AS occurrences,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS percentage
FROM error_patterns
GROUP BY error_category
ORDER BY occurrences DESC;
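The CASE expression above is a first-match-wins keyword classifier. The same logic as a client-side sketch, useful when post-processing rows fetched from `biscuit_match('%error%')` in application code:

```python
# Ordered keyword rules: first match wins, like the SQL CASE above
ERROR_RULES = [
    ('timeout', 'timeout'),
    ('connection', 'connection'),
    ('memory', 'memory'),
]

def categorize(message: str) -> str:
    """Classify a log message by the first matching keyword."""
    lowered = message.lower()
    for keyword, category in ERROR_RULES:
        if keyword in lowered:
            return category
    return 'other'
```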
Real-Time Monitoring:
-- Create monitoring view
CREATE OR REPLACE VIEW recent_errors AS
SELECT
log_id,
timestamp,
level,
message,
service,
user_id
FROM biscuit_match('%error%')
WHERE timestamp > NOW() - INTERVAL '5 minutes'
ORDER BY timestamp DESC;
-- Query view
SELECT * FROM recent_errors;
Example 4: Genetic Sequence Matching
Scenario: Bioinformatics database with DNA sequences, need pattern matching
Schema:
CREATE TABLE dna_sequences (
sequence_id VARCHAR(50) PRIMARY KEY,
organism TEXT,
sequence TEXT, -- ATCG sequence
length INTEGER,
discovered_date DATE
);
-- Sample data
INSERT INTO dna_sequences VALUES
('SEQ001', 'E. coli', 'ATCGATCGATCG', 12, '2024-01-15'),
('SEQ002', 'Human', 'GCTAGCTAGCTA', 12, '2024-01-16'),
('SEQ003', 'E. coli', 'ATCGGGCCCATCG', 13, '2024-01-17');
BISCUIT Setup:
SELECT biscuit_setup('dna_sequences', 'sequence', 'sequence_id');
Query Examples:
-- Find sequences with specific motif
SELECT * FROM biscuit_match('%ATCG%');
-- Find palindromic sequences (conceptual)
SELECT * FROM biscuit_match('%GATC%'); -- GATC equals its own reverse complement
-- Complex pattern matching
SELECT * FROM biscuit_match('%ATC%GGG%')
WHERE organism = 'E. coli';
-- Count sequences by pattern
SELECT
'Contains ATCG' AS pattern,
biscuit_match_count('%ATCG%') AS count
UNION ALL
SELECT
'Starts with GC',
biscuit_match_count('GC%')
UNION ALL
SELECT
'Ends with TA',
biscuit_match_count('%TA');
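A DNA motif is palindromic when it equals its own reverse complement, which is why GATC qualifies above. A small Python sketch of that check, independent of the extension, for validating motifs before querying:

```python
# A <-> T, C <-> G base pairing
COMPLEMENT = str.maketrans('ATCG', 'TAGC')

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(seq: str) -> bool:
    """True when the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)
```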
Example 5: Social Media Username Search
Scenario: Social platform with 100M users, need fast username search
Schema:
CREATE TABLE users (
user_id BIGSERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
display_name TEXT,
bio TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
follower_count INTEGER DEFAULT 0
);
-- Sample data
INSERT INTO users (username, display_name, bio, follower_count) VALUES
('john_doe_123', 'John Doe', 'Software engineer', 1250),
('jane_smith', 'Jane Smith', 'Data scientist', 3400),
('tech_guru_99', 'Tech Guru', 'Tech enthusiast', 15000);
BISCUIT Setup:
-- Index lowercase usernames for case-insensitive search
ALTER TABLE users ADD COLUMN username_lower TEXT
GENERATED ALWAYS AS (LOWER(username)) STORED;
SELECT biscuit_setup('users', 'username_lower', 'user_id');
Query Examples:
-- Search usernames
SELECT user_id, username, display_name, follower_count
FROM biscuit_match('%tech%')
ORDER BY follower_count DESC;
-- Find available similar usernames
WITH desired_username AS (
SELECT 'john_doe' AS base
),
existing_matches AS (
SELECT username
FROM biscuit_match('john_doe%')
)
SELECT base || '_' || n AS suggestion
FROM desired_username
CROSS JOIN generate_series(1, 10) AS n
WHERE base || '_' || n NOT IN (
SELECT username FROM existing_matches
)
LIMIT 5;
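The same suggestion logic can live in application code: fetch the taken names with `biscuit_match('john_doe%')`, then filter candidates client-side. A sketch, where the `taken` set stands in for the query result:

```python
def suggest_usernames(base: str, taken: set, limit: int = 5) -> list:
    """Suggest numbered variants of `base` that are not already taken."""
    suggestions = []
    n = 1
    while len(suggestions) < limit:
        candidate = f"{base}_{n}"
        if candidate not in taken:
            suggestions.append(candidate)
        n += 1
    return suggestions

# e.g. taken names returned by biscuit_match('john_doe%')
taken = {'john_doe_1', 'john_doe_3'}
suggest_usernames('john_doe', taken)
```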
-- Find influencers matching pattern
SELECT username, display_name, follower_count
FROM biscuit_match('%guru%')
WHERE follower_count > 10000
ORDER BY follower_count DESC;
-- Username analytics
WITH username_patterns AS (
SELECT
CASE
WHEN username LIKE '%\_%' THEN 'has_underscores'
WHEN username ~ '[0-9]' THEN 'has_numbers'
ELSE 'alphanumeric'
END AS pattern_type,
follower_count
FROM biscuit_match('%')
)
SELECT
pattern_type,
COUNT(*) AS user_count,
AVG(follower_count) AS avg_followers
FROM username_patterns
GROUP BY pattern_type;
Example 6: Document Management System
Scenario: Legal document repository, need full-text-like search on titles
Schema:
CREATE TABLE documents (
doc_id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
document_type VARCHAR(50),
file_path TEXT,
uploaded_by INTEGER,
uploaded_at TIMESTAMPTZ DEFAULT NOW(),
case_number VARCHAR(50)
);
-- Sample data
INSERT INTO documents (title, document_type, case_number) VALUES
('Contract Agreement - Smith vs Johnson', 'Contract', 'CASE-2024-001'),
('Patent Application for AI Technology', 'Patent', 'PAT-2024-042'),
('Lease Agreement - Commercial Property', 'Lease', 'CASE-2024-002');
BISCUIT Setup:
-- Create searchable column with normalized text
ALTER TABLE documents ADD COLUMN title_search TEXT
GENERATED ALWAYS AS (LOWER(REGEXP_REPLACE(title, '[^a-z0-9 ]', '', 'gi'))) STORED;
SELECT biscuit_setup('documents', 'title_search', 'doc_id');
Query Examples:
-- Find contract documents
SELECT * FROM biscuit_match('%contract%');
-- Multi-term search (all terms must appear)
SELECT *
FROM biscuit_match('%lease%commercial%')
WHERE document_type = 'Lease';
-- Case document search
SELECT doc_id, title, case_number, uploaded_at
FROM biscuit_match('%smith%johnson%')
ORDER BY uploaded_at DESC;
-- Document type distribution
WITH doc_matches AS (
SELECT document_type, COUNT(*) AS cnt
FROM biscuit_match('%agreement%')
GROUP BY document_type
)
SELECT
document_type,
cnt,
ROUND(100.0 * cnt / SUM(cnt) OVER (), 2) AS percentage
FROM doc_matches
ORDER BY cnt DESC;
Example 7: Bulk Data Migration with BISCUIT
Scenario: Migrating 10M records from legacy system
Migration Script:
-- Step 1: Create target table
CREATE TABLE products_new (
id INTEGER PRIMARY KEY,
sku TEXT,
name TEXT,
description TEXT
);
-- Step 2: Prepare BISCUIT (without triggers for bulk)
SELECT biscuit_build_index('products_new', 'name', 'id');
-- Step 3: Bulk load data
\timing on
SELECT biscuit_disable_triggers();
COPY products_new FROM '/data/products.csv' CSV HEADER;
-- Alternative: INSERT SELECT from staging
Time: 45,000 ms
-- Step 4: Rebuild index
SELECT biscuit_build_index('products_new', 'name', 'id');
Time: 38,000 ms
-- Step 5: Enable triggers for future updates
SELECT biscuit_enable_triggers();
\timing off
-- Step 6: Verify
SELECT
COUNT(*) AS table_count,
biscuit_get_active_count() AS index_count,
biscuit_get_free_slots() AS free_slots
FROM products_new;
-- Step 7: Test queries
SELECT biscuit_match_count('%'); -- Should match table count
Example 8: Multi-Pattern Search Interface
Scenario: Build API endpoint that accepts multiple search patterns
Application Code (Python + psycopg2):
import psycopg2
from typing import List, Tuple

def multi_pattern_search(patterns: List[str]) -> List[Tuple]:
    """
    Search for records matching any of the provided patterns.
    BISCUIT serves the table configured via biscuit_setup(),
    so no table argument is needed.
    """
    conn = psycopg2.connect("dbname=mydb user=myuser")
    try:
        cur = conn.cursor()
        # One parameterized SELECT per pattern, combined with UNION
        union_queries = ["SELECT * FROM biscuit_match(%s)"] * len(patterns)
        query = " UNION ".join(union_queries)
        cur.execute(query, patterns)
        results = cur.fetchall()
        cur.close()
    finally:
        conn.close()
    return results

# Usage
results = multi_pattern_search(
    patterns=['%laptop%', '%macbook%', '%computer%']
)
SQL Equivalent:
-- Search multiple patterns
SELECT * FROM biscuit_match('%laptop%')
UNION
SELECT * FROM biscuit_match('%macbook%')
UNION
SELECT * FROM biscuit_match('%computer%')
ORDER BY id;
-- Or with CTE for clarity
WITH
pattern1 AS (SELECT * FROM biscuit_match('%laptop%')),
pattern2 AS (SELECT * FROM biscuit_match('%macbook%')),
pattern3 AS (SELECT * FROM biscuit_match('%computer%'))
SELECT * FROM pattern1
UNION
SELECT * FROM pattern2
UNION
SELECT * FROM pattern3;
Example 9: Performance Monitoring Dashboard
Scenario: Real-time monitoring of BISCUIT performance
Monitoring Queries:
-- Create monitoring function
CREATE OR REPLACE FUNCTION biscuit_performance_metrics()
RETURNS TABLE (
metric TEXT,
value NUMERIC,
unit TEXT,
status TEXT
) AS $$
DECLARE
active_ct INTEGER;
tombstone_ct INTEGER;
free_ct INTEGER;
memory_val NUMERIC;
query_ct BIGINT;
insert_ct BIGINT;
update_ct BIGINT;
delete_ct BIGINT;
BEGIN
active_ct := biscuit_get_active_count();
tombstone_ct := biscuit_get_tombstone_count();
free_ct := biscuit_get_free_slots();
memory_val := (regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC;
query_ct := (regexp_matches(biscuit_index_status(), 'Queries: ([0-9]+)'))[1]::BIGINT;
insert_ct := (regexp_matches(biscuit_index_status(), 'Inserts: ([0-9]+)'))[1]::BIGINT;
update_ct := (regexp_matches(biscuit_index_status(), 'Updates: ([0-9]+)'))[1]::BIGINT;
delete_ct := (regexp_matches(biscuit_index_status(), 'Deletes: ([0-9]+)'))[1]::BIGINT;
RETURN QUERY SELECT 'Active Records'::TEXT, active_ct::NUMERIC, 'records'::TEXT,
CASE WHEN active_ct > 0 THEN 'OK' ELSE 'WARNING' END;
RETURN QUERY SELECT 'Tombstones', tombstone_ct::NUMERIC, 'records',
CASE WHEN tombstone_ct < 5000 THEN 'OK' WHEN tombstone_ct < 10000 THEN 'WARNING' ELSE 'CRITICAL' END;
RETURN QUERY SELECT 'Free Slots', free_ct::NUMERIC, 'slots', 'INFO';
RETURN QUERY SELECT 'Memory Usage', memory_val, 'MB',
CASE WHEN memory_val < 100 THEN 'OK' WHEN memory_val < 500 THEN 'WARNING' ELSE 'HIGH' END;
RETURN QUERY SELECT 'Total Queries', query_ct::NUMERIC, 'operations', 'INFO';
RETURN QUERY SELECT 'Total Inserts', insert_ct::NUMERIC, 'operations', 'INFO';
RETURN QUERY SELECT 'Total Updates', update_ct::NUMERIC, 'operations', 'INFO';
RETURN QUERY SELECT 'Total Deletes', delete_ct::NUMERIC, 'operations', 'INFO';
END;
$$ LANGUAGE plpgsql;
-- Use in dashboard
SELECT * FROM biscuit_performance_metrics();
Output:
     metric     |  value  |    unit    | status
----------------+---------+------------+--------
 Active Records |  998567 | records    | OK
 Tombstones     |     856 | records    | OK
 Free Slots     |    1433 | slots      | INFO
 Memory Usage   |    18.3 | MB         | OK
 Total Queries  |   45892 | operations | INFO
 Total Inserts  | 1000000 | operations | INFO
 Total Updates  |  125678 | operations | INFO
 Total Deletes  |    1433 | operations | INFO
Complete Production Example
Full Setup for Production Environment:
-- 1. Create schema
CREATE SCHEMA production;
SET search_path TO production;
-- 2. Create table
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku VARCHAR(100) UNIQUE NOT NULL,
name TEXT NOT NULL,
description TEXT,
category VARCHAR(100),
price NUMERIC(12, 2),
stock_level INTEGER DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- 3. Add indexes
CREATE INDEX idx_products_category ON products(category);
CREATE INDEX idx_products_price ON products(price);
-- 4. Setup BISCUIT
SELECT biscuit_setup('products', 'name', 'id');
-- 5. Verify setup
SELECT biscuit_index_status();
-- 6. Insert sample data
INSERT INTO products (sku, name, category, price, stock_level)
SELECT
'SKU-' || generate_series(1, 100000),
'Product ' || generate_series(1, 100000),
(ARRAY['Electronics', 'Furniture', 'Clothing'])[floor(random() * 3 + 1)::INT],
(random() * 1000)::NUMERIC(12, 2),
floor(random() * 100)::INTEGER;
-- 7. Test queries
\timing on
SELECT * FROM biscuit_match('%Product 1234%');
SELECT biscuit_match_count('%Electronics%');
\timing off
-- 8. Setup monitoring
CREATE TABLE biscuit_metrics_log (
logged_at TIMESTAMPTZ DEFAULT NOW(),
active_count INTEGER,
tombstone_count INTEGER,
memory_mb NUMERIC,
query_count BIGINT
);
-- 9. Schedule monitoring (via pg_cron)
SELECT cron.schedule(
'biscuit-metrics',
'*/5 * * * *', -- Every 5 minutes
$$INSERT INTO production.biscuit_metrics_log
SELECT NOW(), biscuit_get_active_count(), biscuit_get_tombstone_count(),
(regexp_matches(biscuit_index_status(), 'Memory: ([0-9.]+)'))[1]::NUMERIC,
(regexp_matches(biscuit_index_status(), 'Queries: ([0-9]+)'))[1]::BIGINT$$
);
-- 10. Create API views
CREATE VIEW product_search AS
SELECT id, sku, name, category, price
FROM products
WHERE id IN (SELECT id FROM biscuit_match('%')); -- Template view
License
PostgreSQL License (similar to MIT/BSD)
Copyright (c) 2024 BISCUIT Contributors
Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following paragraphs appear in all copies.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Version History
v1.0.0
- Initial release
- Roaring Bitmap integration
- Lazy deletion with tombstones
- Incremental updates
- O(1) hash table PK lookup
- Automatic trigger-based CRUD
- Comprehensive monitoring
v1.0.1
- Updated README
v1.0.2
- Updated README
- Fixed name conflicts of functions
Contributors
BISCUIT is developed and maintained by Sivaprasad Murali.
Support and Contact
Issues: https://github.com/crystallinecore/biscuit/issues
Discussions: https://github.com/crystallinecore/biscuit/discussions
When pg_trgm feels half-baked, grab a BISCUIT 🍪