TPC-H Benchmark
This directory contains scripts to execute the TPC-H benchmark queries with the same data in ClickHouse and Postgres, and generates a table comparing PostgreSQL performance to pg_clickhouse performance.
The scripts run each query in queries three times each for native
PostgreSQL and pg_clickhouse performance and produces a Markdown table
reporting the averaged times for each, as well as whether the pg_clickhouse
query pushed down to ClickHouse. Times exceeding 60s will not be recorded, and
result in a -. An example:
| Query | PostgreSQL | pg_clickhouse | Pushdown |
| ----------:| ----------:| -------------:|:--------:|
| [Query 1] | 4693 ms | 268 ms | ✔︎ |
| [Query 2] | 458 ms | 3446 ms | |
| [Query 3] | 742 ms | 111 ms | ✔︎ |
| [Query 4] | 270 ms | 130 ms | ✔︎ |
| [Query 5] | 337 ms | 1460 ms | ✔︎ |
| [Query 6] | 764 ms | 53 ms | ✔︎ |
| [Query 7] | 619 ms | 96 ms | ✔︎ |
| [Query 8] | 342 ms | 156 ms | ✔︎ |
| [Query 9] | 3094 ms | 298 ms | ✔︎ |
| [Query 10] | 581 ms | 197 ms | ✔︎ |
| [Query 11] | 212 ms | 24 ms | |
| [Query 12] | 1116 ms | 84 ms | ✔︎ |
| [Query 13] | 958 ms | 1368 ms | |
| [Query 14] | 181 ms | 73 ms | ✔︎ |
| [Query 15] | 1118 ms | 557 ms | |
| [Query 16] | 497 ms | 1714 ms | |
| [Query 17] | 1846 ms | 32709 ms | |
| [Query 18] | 5823 ms | 10649 ms | |
| [Query 19] | 53 ms | 206 ms | ✔︎ |
| [Query 20] | 421 ms | - | |
| [Query 21] | 1349 ms | 4434 ms | |
| [Query 22] | 258 ms | 1415 ms | |
Setup & Execution
To use it, set up ClickHouse and Postgres running locally and execute these commands.
ClickHouse
make ch
This command connects to ClickHouse and runs tpch-ch.sql, which
creates a database named tpch, creates the ClickHouse TPC-H tables, and
loads them with scaling factor 1 data from ClickHouse S3 buckets. Export the
ClickHouse client environment variables CLICKHOUSE_USER,
CLICKHOUSE_PASSWORD and CLICKHOUSE_HOST as appropriate to configure the
ClickHouse server and user.
PostgreSQL
make pg
This command connects to PostgreSQL and runs tpch-pg.sql, which
creates a schema named pg, creates the PostgreSQL TPC-H tables, and loads
them with scaling factor 1 data from ClickHouse S3 buckets. It also loads
pg_clickhouse (which must already be installed in the cluster), creates a
schema named ch, and imports the the tables from the ClickHouse database
created above.
Export the PostgreSQL client environment variables as appropriate to configure the PostgreSQL server and user, and the ClickHouse client environment variables to configure the server to which pg_clickhouse connects.
Run
Run this command to run the benchmark:
make run
This will save the results in the result directory and produce a Markdown
table report with the average time between three runs of each query in
queries for both PostgreSQL and pg_clickhouse.
Cleanup
Run this command to clean up artifacts from running the benchmark:
make clean
Links
Use this command to create link references for the Markdown links to each query in the table output:
make links
Queries
The queries duplicate those from the pgtpc project. The only changes are to explicitly compare against dates, rather than timestamps, since ClickHouse does not implicitly cast them for the comparison. The changes are all from something like this (as in Query 1):
l_shipdate <= date '1998-12-01' - interval '90d'
To
l_shipdate <= date(date '1998-12-01' - interval '90d')
Reveal the complete diff below for details.
Complete Queries Diff
diff --git queries/1.sql b/tpch/queries/1.sql
index 20cecac..e4f841b 100644
--- queries/1.sql
+++ b/tpch/queries/1.sql
@@ -15,7 +15,7 @@ select
from
lineitem
where
- l_shipdate <= date '1998-12-01' - interval '90d'
+ l_shipdate <= date(date '1998-12-01' - interval '90d')
group by
l_returnflag,
l_linestatus
diff --git queries/10.sql b/tpch/queries/10.sql
index d063981..5cd8cd4 100644
--- queries/10.sql
+++ b/tpch/queries/10.sql
@@ -20,7 +20,7 @@ where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate >= date '1993-10-01'
- and o_orderdate < date '1993-10-01' + interval '3month'
+ and o_orderdate < date(date '1993-10-01' + interval '3month')
and l_returnflag = 'R'
and c_nationkey = n_nationkey
group by
diff --git queries/12.sql b/tpch/queries/12.sql
index 83dd6b4..b2bbd38 100644
--- queries/12.sql
+++ b/tpch/queries/12.sql
@@ -25,7 +25,7 @@ where
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1994-01-01'
- and l_receiptdate < date '1994-01-01' + interval '1y'
+ and l_receiptdate < date(date '1994-01-01' + interval '1y')
group by
l_shipmode
order by
diff --git queries/14.sql b/tpch/queries/14.sql
index b8949b6..e5642ff 100644
--- queries/14.sql
+++ b/tpch/queries/14.sql
@@ -14,4 +14,4 @@ from
where
l_partkey = p_partkey
and l_shipdate >= date '1995-09-01'
- and l_shipdate < date '1995-09-01' + interval '1month';
+ and l_shipdate < date(date '1995-09-01' + interval '1month');
diff --git queries/15.sql b/tpch/queries/15.sql
index c4fba55..f0e2166 100644
--- queries/15.sql
+++ b/tpch/queries/15.sql
@@ -8,7 +8,7 @@ create or replace view revenue0 (supplier_no, total_revenue) as
lineitem
where
l_shipdate >= date '1996-01-01'
- and l_shipdate < date '1996-01-01' + interval '3month'
+ and l_shipdate < date(date '1996-01-01' + interval '3month')
group by
l_suppkey;
diff --git queries/20.sql b/tpch/queries/20.sql
index babc01c..29de2f3 100644
--- queries/20.sql
+++ b/tpch/queries/20.sql
@@ -32,7 +32,7 @@ where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate >= date '1994-01-01'
- and l_shipdate < date '1994-01-01' + interval '1' year
+ and l_shipdate < date(date '1994-01-01' + interval '1' year)
)
)
and s_nationkey = n_nationkey
diff --git queries/4.sql b/tpch/queries/4.sql
index 7f7011e..75df4ab 100644
--- queries/4.sql
+++ b/tpch/queries/4.sql
@@ -8,7 +8,7 @@ select
from
orders
where
- o_orderdate >= date '1993-07-01'and o_orderdate < date '1993-07-01' + interval '3month'
+ o_orderdate >= date '1993-07-01'and o_orderdate < date(date '1993-07-01' + interval '3month')
and exists (select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate)
group by
o_orderpriority
diff --git queries/5.sql b/tpch/queries/5.sql
index b085c1b..2ccd73e 100644
--- queries/5.sql
+++ b/tpch/queries/5.sql
@@ -21,7 +21,7 @@ where
and n_regionkey = r_regionkey
and r_name = 'ASIA'
and o_orderdate >= date '1994-01-01'
- and o_orderdate < date '1994-01-01' + interval '1year'
+ and o_orderdate < date(date '1994-01-01' + interval '1year')
group by
n_name
order by
diff --git queries/6.sql b/tpch/queries/6.sql
index 053d79e..0e06313 100644
--- queries/6.sql
+++ b/tpch/queries/6.sql
@@ -8,6 +8,6 @@ from
lineitem
where
l_shipdate >= date '1994-01-01'
- and l_shipdate < date '1994-01-01' + interval '1year'
+ and l_shipdate < date(date '1994-01-01' + interval '1year')
and l_discount between .06 - 0.01 and .06 + 0.01
and l_quantity < 24;