On 20th of March, Andres Freund committed patch:
Use 128-bit math to accelerate some aggregation functions. On platforms where we support 128bit integers, use them to implement faster transition functions for sum(int8), avg(int8), var_*(int2/int4),stdev_*(int2/int4). Where not supported continue to use numeric as a transition type. In some synthetic benchmarks this has been shown to provide significant speedups. Bumps catversion. Discussion: 544BB5F1.50709@proxel.se Author: Andreas Karlsson Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund, Oskari Saarenmaa, David Rowley
Couple of months ago, I wrote blogpost which checked how much slower are numerics.
Based on the data from there, I know that:
- running sum(column) on int8 column took 265.88% of the time it took for int4
- running avg(column) on int8 column took 249.17% of the time it took for int4
With this new patch by Andres, we should get better results. So, let's see:
$ CREATE TABLE test_int4 ( some_column int4 ); CREATE TABLE $ INSERT INTO test_int4 SELECT i FROM generate_series(1,1000000) i; INSERT 0 1000000 $ CREATE TABLE test_int8 ( some_column int8 ); CREATE TABLE $ INSERT INTO test_int8 SELECT i FROM generate_series(1,1000000) i; INSERT 0 1000000
Running single test wouldn't really be sensible, so I'll run each of:
- select sum(some_column) from test_int4
- select avg(some_column) from test_int4
- select sum(some_column) from test_int8
- select avg(some_column) from test_int8
10 times, and report just the best time, using such script:
( echo "\\timing" seq 1 10 | sed 's/.*/select sum(some_column) from test_int4;/' ) | psql -qAtX | grep ^Time: | awk '{print $2}' | sort -n | head -n1
Results:
- sum(int4): 58.569ms
- avg(int4): 64.883ms
- sum(int8): 60.395ms – 3.1% slower than sum(int4) – virtually irrelevant overhead
- avg(int8): 60.281ms – 7% faster than avg(int4)
The result for avg(int8) I think is simply because some random load fluctuations on the test machine. In any way – the time differences are so small, that it simply doesn't matter. Int8 operations are now as fast as int4 ones. Thanks to all involved.
I am not a programmer but if its do with registry instructions then I surely look forward to utilizing avx2 (and fma) 256bits.