Performance Benchmarks
Comparison benchmarks for the weighted_statistics extension, focusing on C vs PL/pgSQL performance and quantile method differences.
Quick Start
```bash
# From repository root
./benchmark/run_benchmark.sh

# Custom database
PGDATABASE=mydb PGUSER=myuser ./benchmark/run_benchmark.sh
```
What Gets Tested
Group 1: C vs PL/pgSQL Comparison
- Functions tested: `weighted_mean`, `weighted_quantile`
- Implementation comparison: optimized C code vs optimized PL/pgSQL
- Array sizes: 1K, 10K, and 100K elements
- Methodology: 5 iterations per test, reporting the mean and standard deviation
- Purpose: Measure the performance advantage of the C implementation as array size grows
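The "mean (±stddev)" figures in the results tables can be reproduced from raw per-iteration timings with a small helper like the one below. This is a hypothetical sketch: `summarize_timings` is not part of the benchmark scripts, and it assumes the sample (n - 1) standard deviation, which the benchmark may or may not use.

```bash
# Hypothetical helper: summarize per-iteration timings (ms) as mean and
# sample standard deviation. Assumption: n - 1 denominator (sample stddev).
summarize_timings() {
  # Each argument is one iteration's timing in milliseconds.
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | awk '
    { x[++n] = $1; sum += $1 }
    END {
      mean = sum / n
      # Two-pass variance: sum of squared deviations from the mean.
      for (i = 1; i <= n; i++) ss += (x[i] - mean) ^ 2
      sd = (n > 1) ? sqrt(ss / (n - 1)) : 0
      printf "%.2f ms (±%.2f)\n", mean, sd
    }'
}
```

For example, `summarize_timings 1 2 3 4 5` prints `3.00 ms (±1.58)`.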
Group 2: Quantile Methods Comparison
- Methods compared:
  - `weighted_quantile` - Empirical CDF (baseline)
  - `wquantile` - Type 7 / Hyndman-Fan (R/NumPy default)
  - `whdquantile` - Harrell-Davis (smooth estimator)
- Array sizes: 1K and 10K elements
- Methodology: 5 iterations per test, reporting the mean and standard deviation
- Purpose: Compare computational cost of different quantile algorithms
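A minimal sketch of the baseline method, assuming the common definition of a weighted empirical-CDF quantile: sort the values, accumulate their weights, and return the first value whose cumulative weight reaches p times the total weight. The extension's `weighted_quantile` may treat ties, zero weights, or interpolation differently, and `weighted_ecdf_quantile` is a hypothetical name. The helper sorts once and then answers every requested probability, which illustrates one plausible reason a multi-quantile call can be cheaper than repeated single calls.

```bash
# Hypothetical sketch of a weighted empirical-CDF quantile (inverse of the
# weighted ECDF, no interpolation). Not the extension's implementation.
# Usage: weighted_ecdf_quantile "VALUES" "WEIGHTS" P [P ...]
#   VALUES and WEIGHTS are space-separated lists of equal length.
weighted_ecdf_quantile() {
  local values=$1 weights=$2
  shift 2
  # Pair each value with its weight (one pair per line), then sort by value once.
  paste -d ' ' <(tr -s ' ' '\n' <<<"$values") <(tr -s ' ' '\n' <<<"$weights") |
    sort -g -k1,1 |
    awk -v probs="$*" '
      { x[NR] = $1; w[NR] = $2; total += $2 }
      END {
        np = split(probs, p, " ")
        for (j = 1; j <= np; j++) {
          target = p[j] * total
          cum = 0
          # Walk the sorted values until cumulative weight reaches p * total.
          for (i = 1; i <= NR; i++) {
            cum += w[i]
            if (cum >= target) break
          }
          # Guard against floating-point shortfall when p = 1.
          if (i > NR) i = NR
          printf "%s%s", x[i], (j < np ? " " : "\n")
        }
      }'
}
```

For example, `weighted_ecdf_quantile "3 1 4 2" "1 1 1 1" 0.5 0.75 1` prints `2 3 4`. By contrast, the Type 7 method interpolates between neighboring order statistics, and the Harrell-Davis estimator in its standard form weights every order statistic using the regularized incomplete Beta function, which is consistent with its much higher cost in the results tables.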
Additional Tests
- Single vs Multiple Quantiles: Efficiency of computing multiple quantiles in one call
- Sparse Data: All tests use sparse weight arrays (sum ≈ 1.0) to test real-world scenarios
Environment Variables
Uses standard PostgreSQL environment variables (aligned with test suite):
Primary:
- `PGDATABASE` - Database name (default: `postgres`)
- `PGUSER` - Username (default: `postgres`)
- `PGHOST` - Host (default: `localhost`)
- `PGPORT` - Port (default: `5432`)

Alternatives:
`TEST_DATABASE`, `TEST_USER`, `TEST_HOST`, `TEST_PORT`
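A minimal sketch of how these variables could be resolved, assuming a `PG*` variable takes precedence over its `TEST_*` alternative and the listed default applies when neither is set. The actual precedence used by `run_benchmark.sh` is not documented here, and `resolve_connection` is a hypothetical helper name.

```bash
# Hypothetical resolution order (assumption): PG* first, then TEST_*, then default.
resolve_connection() {
  local db=${PGDATABASE:-${TEST_DATABASE:-postgres}}
  local user=${PGUSER:-${TEST_USER:-postgres}}
  local host=${PGHOST:-${TEST_HOST:-localhost}}
  local port=${PGPORT:-${TEST_PORT:-5432}}
  # Emit a libpq keyword/value connection string.
  printf 'dbname=%s user=%s host=%s port=%s\n' "$db" "$user" "$host" "$port"
}
```

The printed line is a libpq keyword/value connection string, which `psql` accepts in place of a database name, e.g. `psql "$(resolve_connection)"`.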
Prerequisites
- PostgreSQL with the `weighted_statistics` extension installed and enabled
- `psql` client available in `PATH`
- Database connection permissions
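A hypothetical pre-flight check for these prerequisites. It assumes the extension is registered in the `pg_extension` catalog under the name `weighted_statistics` and relies on the usual `PG*` variables for the connection; `check_prereqs` is an illustrative name, not part of the repository.

```bash
# Hypothetical pre-flight check; the extension name is an assumption.
check_prereqs() {
  # 1. psql must be on PATH.
  if ! command -v psql >/dev/null 2>&1; then
    echo "psql not found in PATH" >&2
    return 1
  fi
  # 2. The connection must work (-X skips ~/.psqlrc, -tA gives bare output).
  local found
  if ! found=$(psql -X -tA -c \
      "SELECT 1 FROM pg_extension WHERE extname = 'weighted_statistics'"); then
    echo "could not connect to the database" >&2
    return 1
  fi
  # 3. The extension must be installed in this database.
  if [ "$found" != "1" ]; then
    echo "extension weighted_statistics is not installed in this database" >&2
    return 1
  fi
  echo "ok"
}
```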
Output Interpretation
The benchmark shows timing results for each test. Look for:
- Time values - Execution time for each function
- C vs PL/pgSQL ratios - Performance improvement of C implementation
- Quantile method differences - Relative cost of different algorithms
- Scaling behavior - How performance changes with array size
Manual Execution
```bash
# Load PL/pgSQL functions first
psql -f benchmark/plpgsql_functions.sql

# Run performance tests
psql -f benchmark/performance_test.sql
```
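The same two steps as a hypothetical wrapper that targets a named database and stops at the first SQL error, so the performance tests never run against a partially loaded set of PL/pgSQL functions. `-X`, `-v ON_ERROR_STOP=1`, `-d` and `-f` are standard `psql` options; `run_manual_benchmark` is an illustrative name.

```bash
# Hypothetical wrapper around the manual steps.
# ON_ERROR_STOP=1 makes psql exit non-zero on the first SQL error;
# -X skips ~/.psqlrc so personal settings do not leak into the run.
run_manual_benchmark() {
  local db=${1:-${PGDATABASE:-postgres}}
  # Stop (and propagate psql's exit status) if loading the functions fails.
  psql -X -v ON_ERROR_STOP=1 -d "$db" -f benchmark/plpgsql_functions.sql || return
  psql -X -v ON_ERROR_STOP=1 -d "$db" -f benchmark/performance_test.sql
}
```

Usage: `run_manual_benchmark mydb`.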
Performance Results
Based on benchmarks run on the target system:
C vs PL/pgSQL Performance
| Function | Array Size | C Time (±stddev) | PL/pgSQL Time (±stddev) | Speedup |
|----------|------------|------------------|-------------------------|---------|
| weighted_mean | 1K | 0.20ms (±0.39) | 0.19ms (±0.04) | 1.0x (equal) |
| weighted_mean | 10K | 0.58ms (±0.94) | 2.08ms (±0.51) | 3.6x faster |
| weighted_mean | 100K | 4.79ms (±0.78) | 18.77ms (±1.42) | 3.9x faster |
| weighted_quantile | 1K | 0.06ms (±0.04) | 0.75ms (±0.07) | 11.9x faster |
| weighted_quantile | 10K | 0.51ms (±0.05) | 6.91ms (±0.14) | 13.5x faster |
| weighted_quantile | 100K | 10.02ms (±0.83) | 136.71ms (±22.06) | 13.6x faster |
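The Speedup column is the PL/pgSQL time divided by the C time. A hypothetical helper (`speedup` is an illustrative name) that recomputes it from two timings is sketched below. Recomputing from the rounded values in the table reproduces most rows (e.g. 18.77 / 4.79 gives 3.9x), but the 1K `weighted_quantile` row gives 12.5x from 0.75 / 0.06 rather than 11.9x, presumably because the column was computed from unrounded timings.

```bash
# Hypothetical helper: speedup = slower time / faster time, one decimal place.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1fx\n", slow / fast }'
}
```

Usage: `speedup 136.71 10.02` prints `13.6x`.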
Quantile Methods Comparison
| Method | Array Size | Time (±stddev) | vs Empirical |
|--------|------------|----------------|--------------|
| weighted_quantile (Empirical CDF) | 1K | 0.07ms (±0.05) | baseline |
| wquantile (Type 7) | 1K | 0.10ms (±0.02) | 1.5x slower |
| whdquantile (Harrell-Davis) | 1K | 1.75ms (±0.04) | 25.7x slower |
| weighted_quantile (Empirical CDF) | 10K | 0.52ms (±0.07) | baseline |
| wquantile (Type 7) | 10K | 0.92ms (±0.08) | 1.8x slower |
| whdquantile (Harrell-Davis) | 10K | 17.33ms (±0.21) | 33.4x slower |
Key Insights
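- C advantage depends on the function: for weighted_mean, C and PL/pgSQL are roughly equal at 1K, but C is about 3.6x faster at 10K and 3.9x faster at 100K; for weighted_quantile, C is consistently about 12-14x faster at every size
- Statistical reliability: most standard deviations are small relative to the mean. The exceptions are the C weighted_mean runs at 1K (0.20ms ±0.39) and 10K (0.58ms ±0.94), where the stddev exceeds the mean, so those timings (including the 1K "equal" result) are noisy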
- Empirical CDF quantiles are fastest for general use
- Type 7 quantiles add modest overhead (1.5-1.8x slower than empirical)
- Harrell-Davis method is very expensive (25-33x slower than empirical) but provides smoothest estimates
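- Scaling: from 10K to 100K, weighted_mean grows roughly linearly (about 8-9x for a 10x larger array), while weighted_quantile grows about 20x in both C and PL/pgSQL, i.e. faster than linearly; from 1K to 10K the quantile methods grow roughly 7-10x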
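One way to make "linear" precise is an empirical scaling exponent k, assuming time grows like n^k between two measured sizes: k = log(t2 / t1) / log(n2 / n1). A value near 1 means linear growth; for reference, a pure n log n cost would give about 1.10 between 10K and 100K. `scaling_exponent` is a hypothetical helper name.

```bash
# Hypothetical helper: estimate k in time ~ n^k from two (size, time) points.
# Arguments: t1 t2 n1 n2 (times in any consistent unit).
scaling_exponent() {
  awk -v t1="$1" -v t2="$2" -v n1="$3" -v n2="$4" \
    'BEGIN { printf "%.2f\n", log(t2 / t1) / log(n2 / n1) }'
}
```

Applied to the C results, `scaling_exponent 0.58 4.79 10000 100000` (weighted_mean) prints `0.92`, while `scaling_exponent 0.51 10.02 10000 100000` (weighted_quantile) prints `1.29`.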