trimmed_aggregates

This Release
trimmed_aggregates 1.3.2
Status
Testing
Latest Testing
trimmed_aggregates 2.0.0-dev
Abstract
Aggregate for computing trimmed avg/stddev/var (with outliers removed).
Description
An extension written in C that allows you to remove outliers before computing the basic statistical aggregates (AVG/STDDEV/VARIANCE).
Released By
tomasv
License
BSD

Extensions

trimmed_aggregates 2.0.0-dev

README

Trimmed aggregates

This PostgreSQL extension provides several aggregate functions that trim the input data set before applying the function, i.e. remove the lowest and highest values. The fraction of values to remove at each end is set by the low_cut and high_cut parameters.

WARNING: These aggregates need the whole data set at once, because they have to collect and sort all the values (to trim the low/high ones). This may be time consuming and require a lot of memory. Keep this in mind when using these functions.

Available aggregates

The extension implements aggregates that resemble the built-in ones described here: http://www.postgresql.org/docs/9.1/static/functions-aggregate.html, i.e. AVG, VARIANCE, VAR_POP, VAR_SAMP, STDDEV, STDDEV_POP and STDDEV_SAMP.

  • AVG

    avg_trimmed(value, low_cut, high_cut)
    
  • VARIANCE

    var_trimmed(value, low_cut, high_cut)
    var_pop_trimmed(value, low_cut, high_cut)
    var_samp_trimmed(value, low_cut, high_cut)
    
  • STDDEV (standard deviation)

    stddev_trimmed(value, low_cut, high_cut)
    stddev_pop_trimmed(value, low_cut, high_cut)
    stddev_samp_trimmed(value, low_cut, high_cut)
    
  • combined aggregate (computes all seven values at once)

    trimmed(value, low_cut, high_cut)
    

All these functions are overloaded for the numeric, double precision, int32 (integer) and int64 (bigint) data types.
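
As a rough illustration of the overloads, the query below calls avg_trimmed on the same values cast to three of the supported types, once per group. It is only a sketch: it assumes the extension is installed in the current database, and the grouping expression and column aliases are made up for the example.

    -- Sketch: the cast on the first argument picks the overload.
    -- Assumes the trimmed_aggregates extension is installed.
    SELECT i % 4                                      AS grp,
           avg_trimmed(i::bigint, 0.1, 0.1)           AS avg_int64,
           avg_trimmed(i::double precision, 0.1, 0.1) AS avg_float,
           avg_trimmed(i::numeric, 0.1, 0.1)          AS avg_numeric
    FROM generate_series(1, 1000) AS s(i)
    GROUP BY i % 4
    ORDER BY grp;

Each group is trimmed separately, so the 10% cuts apply to the rows of that group, not to the whole input.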

Using the aggregates

All the aggregates are used the same way so let's see how to use the avg_trimmed aggregate. For example this

SELECT avg_trimmed(i, 0.1, 0.1) FROM generate_series(1,1000) s(i);

means 10% of the values will be removed on both ends, and the average will be computed using the middle 80%. On the other hand this

SELECT avg_trimmed(i, 0.2, 0.1) FROM generate_series(1,1000) s(i);

means 20% of the lowest and 10% of the highest values will be removed, so the average will be computed using the remaining 70% of values.
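
To check a result by hand, the same trimming can be approximated with built-in window functions only. This is a sketch under one assumption: that the number of rows dropped at each end is floor(n * cut). The README does not state the extension's rounding rule, so results may differ when n * cut is not a whole number.

    -- Plain-SQL approximation of avg_trimmed(i, 0.2, 0.1); no extension needed.
    -- Assumption: floor(n * cut) rows are dropped at each end.
    WITH ranked AS (
        SELECT i,
               row_number() OVER (ORDER BY i) AS rn,  -- position in sorted order
               count(*)     OVER ()           AS n    -- total number of rows
        FROM generate_series(1, 1000) AS s(i)
    )
    SELECT avg(i) AS avg_trimmed_by_hand
    FROM ranked
    WHERE rn >  floor(n * 0.2)       -- skip the lowest 20%
      AND rn <= n - floor(n * 0.1);  -- skip the highest 10%

For this input the query keeps the values 201 through 900, whose average is 550.5.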

The combined aggregate computes and returns all values at once as an array. The values are stored in this order

  • average
  • var_pop
  • var_samp
  • variance
  • stddev_pop
  • stddev_samp
  • stddev

If you need several of these values at once, this may be much more efficient, as the data is collected in memory only once and all the values are computed with only two passes through the data (needed to compute exact variance and stddev).
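
Individual values can then be picked out of the array by position. The sketch below assumes the extension is installed and that the returned array uses PostgreSQL's default lower bound of 1, so the subscripts follow the order listed above; the aliases t and q are made up for the example.

    -- Sketch: read three of the seven values from the combined aggregate.
    -- Assumes 1-based subscripts in the order: average, var_pop, var_samp,
    -- variance, stddev_pop, stddev_samp, stddev.
    SELECT t[1] AS average,
           t[4] AS variance,
           t[7] AS stddev
    FROM (
        SELECT trimmed(i, 0.1, 0.1) AS t
        FROM generate_series(1, 1000) AS s(i)
    ) AS q;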

Installation

Installing this extension is very simple - if you're using pgxn client (and you should), just do this:

$ pgxn install --testing trimmed_aggregates
$ pgxn load --testing -d mydb trimmed_aggregates

You can also install it manually, just like any other extension, i.e.

$ make install
$ psql dbname -c "CREATE EXTENSION trimmed_aggregates"

And if you're on an older PostgreSQL version (before 9.1, which introduced CREATE EXTENSION), you have to run the SQL script manually (use the file for the proper version).

$ psql dbname < trimmed_aggregates--1.0.sql

That's all.
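
To confirm the result, you can look the extension up in the pg_extension system catalog (available on PostgreSQL 9.1 and later). This sketch assumes the extension is registered under the name trimmed_aggregates.

    -- Returns one row if the extension is installed in the current database.
    SELECT extname, extversion
    FROM pg_extension
    WHERE extname = 'trimmed_aggregates';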

License

This software is distributed under the terms of the BSD 2-clause license. See LICENSE or http://www.opensource.org/licenses/bsd-license.php for more details.