Performance Benchmarks

Comparison benchmarks for the weighted_statistics extension, focusing on C vs PL/pgSQL performance and quantile method differences.

Quick Start

# From repository root
./benchmark/run_benchmark.sh

# Custom database
PGDATABASE=mydb PGUSER=myuser ./benchmark/run_benchmark.sh

What Gets Tested

Group 1: C vs PL/pgSQL Comparison

  • Functions tested: weighted_mean, weighted_quantile
  • Implementation comparison: Optimized C code vs optimized PL/pgSQL
  • Array sizes: 1K, 10K, and 100K elements
  • Methodology: 5 iterations per test, reporting the mean and standard deviation
  • Purpose: Measure performance advantage of C implementation across scaling
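
The methodology above amounts to timing a call several times and summarizing the samples. A minimal Python sketch of that idea follows. The helper names `time_call` and `summarize` are hypothetical and not part of the benchmark scripts. The use of the sample standard deviation is an assumption, since this document does not say which variant the benchmark reports.

```python
import statistics
import time

def time_call(fn, iterations=5):
    """Run fn() `iterations` times and return per-run wall-clock times in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def summarize(samples_ms):
    """Return (mean, sample standard deviation) of the timings.

    Assumption: sample (n-1) standard deviation; the benchmark may differ.
    """
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)

# Stand-in workload; the real benchmark times SQL function calls instead.
samples = time_call(lambda: sum(range(10_000)))
mean_ms, std_ms = summarize(samples)
print(f"{mean_ms:.2f}ms (±{std_ms:.2f})")
```

With only 5 samples, a single slow run (cache miss, background activity) can dominate the standard deviation, which is worth keeping in mind when reading sub-millisecond results.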

Group 2: Quantile Methods Comparison

  • Methods compared:
    • weighted_quantile - Empirical CDF (baseline)
    • wquantile - Type 7 / Hyndman-Fan (R/NumPy default)
    • whdquantile - Harrell-Davis (smooth estimator)
  • Array sizes: 1K and 10K elements
  • Methodology: 5 iterations per test, reporting the mean and standard deviation
  • Purpose: Compare computational cost of different quantile algorithms
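
To make the difference between the first two methods concrete, here is a minimal Python sketch. It shows one common definition of a weighted empirical-CDF quantile and, for contrast, the standard unweighted Type 7 interpolation. Both functions are illustrative assumptions: the extension's C code may treat ties, sparse weights, and the weighted form of Type 7 differently.

```python
import math

def weighted_ecdf_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight / total
        if cumulative >= q:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall at q = 1

def type7_quantile(values, q):
    """Unweighted Hyndman-Fan Type 7: linear interpolation between order statistics."""
    xs = sorted(values)
    h = (len(xs) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

values = [10.0, 20.0, 30.0, 40.0]
weights = [0.1, 0.2, 0.3, 0.4]
print(weighted_ecdf_quantile(values, weights, 0.5))  # a data value, no interpolation
print(type7_quantile(values, 0.5))                   # interpolated between 20 and 30
```

The empirical CDF always returns one of the input values, while Type 7 interpolates between neighbors. Harrell-Davis goes further and takes a weighted average over all order statistics, which is consistent with its higher cost.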

Additional Tests

  • Single vs Multiple Quantiles: Efficiency of computing multiple quantiles in one call
  • Sparse Data: All tests use sparse weight arrays (sum ≈ 1.0) to test real-world scenarios

Environment Variables

The benchmark script reads the standard PostgreSQL environment variables, the same ones the test suite uses:

Primary:

  • PGDATABASE - Database name (default: postgres)
  • PGUSER - Username (default: postgres)
  • PGHOST - Host (default: localhost)
  • PGPORT - Port (default: 5432)

Alternatives:

  • TEST_DATABASE, TEST_USER, TEST_HOST, TEST_PORT
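
The lookup described above can be sketched in Python as follows. The function name `resolve_connection` is hypothetical, and the precedence order (primary variable, then the TEST_* alternative, then the default) is an assumption; check run_benchmark.sh for the actual behavior.

```python
import os

# (primary variable, alternative variable, default) as listed above.
SETTINGS = {
    "database": ("PGDATABASE", "TEST_DATABASE", "postgres"),
    "user": ("PGUSER", "TEST_USER", "postgres"),
    "host": ("PGHOST", "TEST_HOST", "localhost"),
    "port": ("PGPORT", "TEST_PORT", "5432"),
}

def resolve_connection(env=None):
    """Pick primary, then alternative, then default for each setting.

    Assumption: this precedence order is illustrative, not taken from
    run_benchmark.sh.
    """
    env = os.environ if env is None else env
    return {
        name: env.get(primary) or env.get(alternative) or default
        for name, (primary, alternative, default) in SETTINGS.items()
    }

print(resolve_connection({"PGDATABASE": "mydb", "TEST_USER": "myuser"}))
```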

Prerequisites

  • PostgreSQL with weighted_statistics extension installed and enabled
  • psql client available in PATH
  • Database connection permissions

Output Interpretation

The benchmark shows timing results for each test. Look for:

  • Time values - Execution time for each function
  • C vs PL/pgSQL ratios - Performance improvement of C implementation
  • Quantile method differences - Relative cost of different algorithms
  • Scaling behavior - How performance changes with array size
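
The ratios mentioned above are plain quotients of two mean timings. A short Python sketch, using the 10K and 100K weighted_mean timings from the Performance Results section of this document (the `speedup` helper is a hypothetical name):

```python
def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms

# PL/pgSQL time as baseline, C time as candidate (values in ms).
print(f"weighted_mean 10K:  {speedup(2.08, 0.58):.1f}x faster")
print(f"weighted_mean 100K: {speedup(18.77, 4.79):.1f}x faster")
```

Ratios recomputed from the rounded table values can differ slightly from the published ones, which were presumably computed from unrounded timings.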

Manual Execution

# Load PL/pgSQL functions first
psql -f benchmark/plpgsql_functions.sql

# Run performance tests
psql -f benchmark/performance_test.sql

Performance Results

Based on benchmarks run on the target system:

C vs PL/pgSQL Performance

| Function | Array Size | C Time (±stddev) | PL/pgSQL Time (±stddev) | Speedup |
|----------|------------|------------------|-------------------------|---------|
| weighted_mean | 1K | 0.20ms (±0.39) | 0.19ms (±0.04) | 1.0x (equal) |
| weighted_mean | 10K | 0.58ms (±0.94) | 2.08ms (±0.51) | 3.6x faster |
| weighted_mean | 100K | 4.79ms (±0.78) | 18.77ms (±1.42) | 3.9x faster |
| weighted_quantile | 1K | 0.06ms (±0.04) | 0.75ms (±0.07) | 11.9x faster |
| weighted_quantile | 10K | 0.51ms (±0.05) | 6.91ms (±0.14) | 13.5x faster |
| weighted_quantile | 100K | 10.02ms (±0.83) | 136.71ms (±22.06) | 13.6x faster |

Quantile Methods Comparison

| Method | Array Size | Time (±stddev) | vs Empirical |
|--------|------------|----------------|--------------|
| weighted_quantile (Empirical CDF) | 1K | 0.07ms (±0.05) | baseline |
| wquantile (Type 7) | 1K | 0.10ms (±0.02) | 1.5x slower |
| whdquantile (Harrell-Davis) | 1K | 1.75ms (±0.04) | 25.7x slower |
| weighted_quantile (Empirical CDF) | 10K | 0.52ms (±0.07) | baseline |
| wquantile (Type 7) | 10K | 0.92ms (±0.08) | 1.8x slower |
| whdquantile (Harrell-Davis) | 10K | 17.33ms (±0.21) | 33.4x slower |

Key Insights

  • C advantage grows with array size and algorithmic complexity: weighted_mean is on par with PL/pgSQL at 1K but about 3.9x faster at 100K; weighted_quantile is consistently 12-14x faster
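  • Statistical reliability: Standard deviations are small relative to the mean for the larger tests. Sub-millisecond timings are noisier, and for C weighted_mean at 1K and 10K the standard deviation exceeds the mean, so those two results should be read with caution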
  • Empirical CDF quantiles are fastest for general use
  • Type 7 quantiles have modest overhead (1.5-1.8x slower than empirical)
  • Harrell-Davis method is very expensive (25-33x slower than empirical) but provides smoothest estimates
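  • Scaling: For most tests, time grows roughly in proportion to array size. The exception is weighted_quantile from 10K to 100K, which grows about 20x in both the C and PL/pgSQL implementations, i.e. faster than linear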
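
Scaling behavior can be checked directly from the C vs PL/pgSQL table by dividing the 100K timing by the 10K timing for each row. The sketch below does this in Python; the dictionary layout and the `growth_factor` helper are illustrative, and the numbers are copied from the table above.

```python
# (10K time, 100K time) in ms, copied from the C vs PL/pgSQL table.
timings_ms = {
    "weighted_mean (C)": (0.58, 4.79),
    "weighted_mean (PL/pgSQL)": (2.08, 18.77),
    "weighted_quantile (C)": (0.51, 10.02),
    "weighted_quantile (PL/pgSQL)": (6.91, 136.71),
}

def growth_factor(small_ms, large_ms):
    """Ratio of the larger-input timing to the smaller-input timing."""
    return large_ms / small_ms

for name, (t10k, t100k) in timings_ms.items():
    factor = growth_factor(t10k, t100k)
    print(f"{name}: {factor:.1f}x time for 10x more elements")
```

A factor near 10 indicates linear scaling. Noticeably larger factors suggest super-linear cost, which would be expected of any quantile routine that sorts its input.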