Common knowledge says that numerics are slower than integers/floats, but offer better precision and range.
While I understand why they would be slower, I don't really know how much slower numerics actually are. So let's test it.
There are a couple of things to consider:
- table size (related to disk usage per column)
- speed of inserts
- speed of indexing
- searching using equality operator
- searching using range operator
- doing some math on the values
Table size. The initial thinking is that an int2 column would make the table 4 times smaller than an int8 one. But is that true?
First, let's consider a very simplistic table:
CREATE TABLE test_type ( some_column TYPE );
where TYPE is one of (the full definitions are spelled out right after this list):
- int2
- int4
- int8
- float4
- float8
- numeric
- numeric(10,0) (in this case the table name will be test_numeric10)
- numeric(20,0) (in this case the table name will be test_numeric20)
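Spelled out, the table definitions are a straightforward expansion of the template above:
CREATE TABLE test_int2      ( some_column int2 );
CREATE TABLE test_int4      ( some_column int4 );
CREATE TABLE test_int8      ( some_column int8 );
CREATE TABLE test_float4    ( some_column float4 );
CREATE TABLE test_float8    ( some_column float8 );
CREATE TABLE test_numeric   ( some_column numeric );
CREATE TABLE test_numeric10 ( some_column numeric(10,0) );
CREATE TABLE test_numeric20 ( some_column numeric(20,0) );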
In each of these tables, let's insert 1 million rows. In all but the int2 case the values will be unique, but the int2 range is too small to fit 1 million unique values, so there I'll end up with duplicates:
INSERT INTO test_int2 SELECT i % 32767 FROM generate_series(1,1000000) i;
INSERT INTO test_int4 SELECT i FROM generate_series(1,1000000) i;
INSERT INTO test_int8 SELECT i FROM generate_series(1,1000000) i;
INSERT INTO test_float4 SELECT i FROM generate_series(1,1000000) i;
INSERT INTO test_float8 SELECT i FROM generate_series(1,1000000) i;
INSERT INTO test_numeric SELECT i FROM generate_series(1,1000000) i;
INSERT INTO test_numeric10 SELECT i FROM generate_series(1,1000000) i;
INSERT INTO test_numeric20 SELECT i FROM generate_series(1,1000000) i;
Table sizes:
SELECT relname, pg_size_pretty( pg_table_size(oid) ) FROM pg_class WHERE relname ~ '^test_';
    relname     | pg_size_pretty
----------------+----------------
 test_float4    | 35 MB
 test_float8    | 35 MB
 test_int2      | 35 MB
 test_int4      | 35 MB
 test_int8      | 35 MB
 test_numeric   | 35 MB
 test_numeric10 | 35 MB
 test_numeric20 | 35 MB
(8 rows)
Surprise, surprise.
But let's redo the test, this time with 2 columns. Each table will get a “some_column2” of the same datatype it had before, and all the data will be reinserted into both columns (tables will get dropped and recreated to avoid any bloat). How will that change things?
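For example, the recreated int8 table could look like this (the other tables are analogous):
DROP TABLE test_int8;
CREATE TABLE test_int8 ( some_column int8, some_column2 int8 );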
INSERT INTO test_int2 SELECT i % 32767, i % 32767 FROM generate_series(1,1000000) i;
INSERT INTO test_int4 SELECT i, i FROM generate_series(1,1000000) i;
INSERT INTO test_int8 SELECT i, i FROM generate_series(1,1000000) i;
INSERT INTO test_float4 SELECT i, i FROM generate_series(1,1000000) i;
INSERT INTO test_float8 SELECT i, i FROM generate_series(1,1000000) i;
INSERT INTO test_numeric SELECT i, i FROM generate_series(1,1000000) i;
INSERT INTO test_numeric10 SELECT i, i FROM generate_series(1,1000000) i;
INSERT INTO test_numeric20 SELECT i, i FROM generate_series(1,1000000) i;
and the sizes now:
SELECT relname, pg_size_pretty( pg_table_size(oid) ) FROM pg_class WHERE relname ~ '^test_';
    relname     | pg_size_pretty
----------------+----------------
 test_float4    | 35 MB
 test_float8    | 42 MB
 test_int2      | 35 MB
 test_int4      | 35 MB
 test_int8      | 42 MB
 test_numeric   | 42 MB
 test_numeric10 | 42 MB
 test_numeric20 | 42 MB
(8 rows)
Yeah. So, basically, what we are seeing here is the effect of two things:
- alignment – data is aligned to a certain number of bytes; in my case, 8 (that's why int2 and int4 take as much as int8, but when I added the next column, int8 grew while int2/int4 didn't). This means that if something uses less than 8 bytes, it will be “padded” to 8 bytes. It doesn't mean that each value will take 8 bytes – it's a bit more complex than that, and if you're looking for specific details, the sources have definitive answers (a quick per-value check is shown right after this list)
- what a lot of people don't seem to remember – each row contains more than what you see. There are system columns, and these take 26 bytes. Per row.
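To see how much a single value takes (before alignment padding and row overhead), pg_column_size() can be used – a quick illustration, not part of the original test:
-- per-value size of a single datum; the on-disk footprint will additionally be aligned/padded
SELECT pg_column_size(1::int2)    AS int2_size,
       pg_column_size(1::int4)    AS int4_size,
       pg_column_size(1::int8)    AS int8_size,
       pg_column_size(1::numeric) AS numeric_size;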
So, as far as data size is concerned – there is a difference, but it's not really all that big.
What about insert speed? I'll exclude int2 from the tests, as its smaller range requires limiting the inserted values (the modulo), and this makes the test less comparable (i.e. there are additional operations to do).
So, what about the other datatypes?
I inserted 1 million rows into each of the tables, truncating the table between tests, and repeated the process 10 times to get a sensible average (a sketch of a single timing run is shown after the results table). Results:
table | time | more than fastest |
---|---|---|
test_int4 | 1285.952ms | 0.51% |
test_int8 | 1304.867ms | 1.99% |
test_float4 | 1284.393ms | 0.39% |
test_float8 | 1279.404ms | 0.00% |
test_numeric | 1533.078ms | 19.83% |
test_numeric10 | 1553.848ms | 21.45% |
test_numeric20 | 1550.829ms | 21.21% |
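For reference, a single timing run can be sketched like this – a minimal version using clock_timestamp(), not necessarily the exact harness I used:
DO $$
DECLARE
    t0 timestamptz;
BEGIN
    TRUNCATE test_int4;  -- start from an empty table, as between test runs
    t0 := clock_timestamp();
    INSERT INTO test_int4 SELECT i, i FROM generate_series(1, 1000000) i;
    RAISE NOTICE 'insert took %', clock_timestamp() - t0;
END
$$;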
That's pretty interesting. I also ran the test with a unique index on each of the tables (on one column only; the index definition is shown after the results). Results for the same insert test with the index in place:
table | time | more than fastest |
---|---|---|
test_int4 | 2716.123ms | 0.00% |
test_int8 | 2927.931ms | 7.80% |
test_float4 | 3079.118ms | 13.36% |
test_float8 | 3208.430ms | 18.13% |
test_numeric | 3915.411ms | 44.15% |
test_numeric10 | 3925.478ms | 44.53% |
test_numeric20 | 3909.128ms | 43.92% |
Clearly, indexing takes its toll on larger datatypes – up to ~45% slower than int4.
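The index for this run was simply a unique index on the first column, something like (the index name here is just illustrative):
CREATE UNIQUE INDEX test_int4_some_column_idx ON test_int4 (some_column);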
So now we know about writes. Let's see how searches perform.
For equality searching, I used 10 different queries against the indexed column, looking for the values 1, 100000, 200000, 300000 … 1000000.
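Each query looked more or less like this (test_int4 and the value 300000 picked as an example):
SELECT * FROM test_int4 WHERE some_column = 300000;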
Since the searches are fast, I repeated each 1000 times. Averages:
table | time | more than fastest |
---|---|---|
test_int4 | 0.205ms | 0.00% |
test_int8 | 0.205ms | 0.08% |
test_float4 | 0.210ms | 2.50% |
test_float8 | 0.210ms | 2.39% |
test_numeric | 0.208ms | 1.68% |
test_numeric10 | 0.207ms | 1.19% |
test_numeric20 | 0.207ms | 0.73% |
To me this means that there is effectively no difference.
Now range queries. I'll check queries that return 10 rows, 1000 rows and 10000 rows.
All queries were of the form:
SELECT * FROM TABLE WHERE some_column BETWEEN ... AND ...
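For example, the 10000-row variant against test_int4 could be (bounds picked just for illustration):
SELECT * FROM test_int4 WHERE some_column BETWEEN 500000 AND 509999;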
First, the 10-row test:
table | time | more than fastest |
---|---|---|
test_int4 | 0.235ms | 0.49% |
test_int8 | 0.233ms | 0.00% |
test_float4 | 0.249ms | 6.81% |
test_float8 | 0.249ms | 6.75% |
test_numeric | 0.248ms | 6.29% |
test_numeric10 | 0.247ms | 5.66% |
test_numeric20 | 0.245ms | 5.02% |
1000 rows:
table | time | more than fastest |
---|---|---|
And the final, 10k row test:
table | time | more than fastest |
---|---|---|
test_int4 | 8.401ms | 0.00% |
test_int8 | 9.366ms | 11.49% |
test_float4 | 17.959ms | 113.77% |
test_float8 | 18.196ms | 116.60% |
test_numeric | 8.954ms | 6.59% |
test_numeric10 | 10.072ms | 19.89% |
test_numeric20 | 10.106ms | 20.30% |
That's actually pretty interesting – searching using floats is slower than using numerics?!
And now for some math. I ran a test which executed commands like:
SELECT 12::int2 * 13::int2
I decided to test 3 operators: addition, multiplication, and modulo.
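So the queries were of this general shape (shown here for a mix of types; values as in the headers below):
SELECT 123::int4 + 456::int4;       -- addition
SELECT 123::numeric * 99::numeric;  -- multiplication
SELECT 12345::int8 % 9876::int8;    -- modulo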
Results for addition test (123 + 456):
table | time | more than fastest |
---|---|---|
int4 | 0.162ms | 0.00% |
int8 | 0.165ms | 2.18% |
float4 | 0.168ms | 4.11% |
float8 | 0.168ms | 4.03% |
numeric | 0.168ms | 3.76% |
numeric(10,0) | 0.177ms | 9.38% |
numeric(20,0) | 0.176ms | 8.92% |
Multiplication (123 * 99):
table | time | more than fastest |
---|---|---|
int4 | 0.159ms | 0.00% |
int8 | 0.163ms | 2.25% |
float4 | 0.166ms | 4.31% |
float8 | 0.166ms | 4.17% |
numeric | 0.165ms | 4.02% |
numeric(10,0) | 0.174ms | 9.62% |
numeric(20,0) | 0.173ms | 9.12% |
Modulo (12345 % 9876). This test couldn't be done with floats, as there is no “float % float" operator:
table | time | more than fastest |
---|---|---|
int4 | 0.177ms | 0.00% |
int8 | 0.180ms | 1.82% |
numeric | 0.183ms | 3.56% |
numeric(10,0) | 0.193ms | 8.77% |
numeric(20,0) | 0.193ms | 8.74% |
So, what does it all tell us? It all depends on the use case. Indexing is the biggest problem, while math itself, or searching, is not that big a deal. Whether it's OK for your situation – that you have to assess yourself.
UPDATE
Dim from IRC asked me to check aggregates too. So I did.
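The aggregate queries themselves are just (here shown for test_int4; the other tables are analogous):
SELECT sum(some_column) FROM test_int4;
SELECT avg(some_column) FROM test_int4;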
Results for select sum(some_column):
table | time | more than fastest |
---|---|---|
test_float4 | 63.163 ms | + 10.28% |
test_float8 | 62.684 ms | + 9.44% |
test_int2 | 57.685 ms | + 0.71% |
test_int4 | 57.276 ms | + 0.00% |
test_int8 | 152.288 ms | + 165.88% |
test_numeric | 118.889 ms | + 107.57% |
test_numeric10 | 118.179 ms | + 106.33% |
test_numeric20 | 118.100 ms | + 106.20% |
and select avg(some_column):
table | time | more than fastest |
---|---|---|
test_float4 | 69.261 ms | + 12.53% |
test_float8 | 69.909 ms | + 13.58% |
test_int2 | 62.018 ms | + 0.76% |
test_int4 | 61.551 ms | + 0.00% |
test_int8 | 153.370 ms | + 149.17% |
test_numeric | 120.552 ms | + 95.86% |
test_numeric10 | 120.645 ms | + 96.01% |
test_numeric20 | 120.597 ms | + 95.93% |
This is pretty interesting. Especially how bad int8 seems to be … (likely at least in part because sum()/avg() on int8 have to accumulate their result in numeric to avoid overflow).
1. What about disk usage on table + indexes?
2. Maybe the range test results would be better with explicit type specification, like “between 1::float8 and 100::float8”?
Hi,
In my understanding, search is not slower with float – ‘displaying’ float is slow; e.g. if you run EXPLAIN ANALYZE, the float select is faster than the numeric one.
How about comparing int8 vs char(8) and varchar(8)?
Since they use the same amount of memory and disk, how does the performance compare?
“This is pretty interesting. Especially how bad int8 seems to be …”
Maybe the 32-bitness of the tested PostgreSQL leads to such results?
How about the same test on 64-bit PostgreSQL?
What’s postgres version, 64 or 32 bit?
ndwijaya:
Isn’t that enough: https://www.depesz.com/2012/06/07/123-vs-depesz-what-is-faster/
@sftf and @marian:
It was (and is) 64-bit on a 64-bit platform – 9.5devel on 64-bit Linux.
Nice to actually see some quantifiable speed differences. I wonder why sum/avg is so slow for int8, and whether older versions of Postgres give the same result. I also hadn't expected to see quite that insert penalty for numeric – is that mainly going to be down to varlena overhead/size?