Lately there have been a couple of discussions on IRC, Slack, and Reddit showing that people assume that by using int4/integer they will use 4 bytes less than they would with int8/bigint. This is not really the case. Let me explain why.
First, a quick test. On my computer, which has some Ryzen processor and a 64-bit architecture:
$ SELECT version();
                                              version
────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, 64-bit
(1 row)

$ CREATE TABLE test8 (id int8);
CREATE TABLE

$ CREATE TABLE test4 (id int4);
CREATE TABLE

$ INSERT INTO test8 SELECT generate_series(1,1000000) i;
INSERT 0 1000000

$ INSERT INTO test4 SELECT generate_series(1,1000000) i;
INSERT 0 1000000

$ \dt+ test*
                                 List of relations
 Schema │ Name  │ Type  │ Owner  │ Persistence │ Access method │ Size  │ Description
────────┼───────┼───────┼────────┼─────────────┼───────────────┼───────┼─────────────
 public │ test4 │ table │ depesz │ permanent   │ heap          │ 35 MB │
 public │ test8 │ table │ depesz │ permanent   │ heap          │ 35 MB │
(2 rows)
I inserted 1 million rows, with a single column of int4/int8 datatype, and the resulting table sizes are exactly the same!
I also ran the test on a 32-bit system:
postgres=# SELECT version();
                                                          version
-------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.5 (Debian 13.5-0+deb11u1) on i686-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 32-bit
(1 row)

...

postgres=# \dt+ test*
                       List of relations
 Schema | Name  | Type  | Owner    | Persistence | Size  | Description
--------+-------+-------+----------+-------------+-------+-------------
 public | test4 | table | postgres | permanent   | 31 MB |
 public | test8 | table | postgres | permanent   | 35 MB |
(2 rows)
Interestingly, here the table size dropped by 4MB, which is how much it “should” drop. But that still leaves 27MB of data “unaccounted for”, and it doesn't explain why int4 on my normal computer uses the same space as int8.
Why is that? What makes it work this way?
The answer is: performance. For performance reasons, Pg keeps data “aligned” to architecture-related sizes. Which means that on 64-bit computers the alignment is to 8 bytes.
What exactly is this alignment? Well, it means that the smallest unit of allocation is 8 bytes, and that, if technically possible, Pg will not split a single value across more than one 8-byte block.
Which means that if all you have in your table is a single 4-byte column, 8 bytes will be used anyway. If you have two int4 columns, they will both fit into one 8-byte block, and that is all they will use.
But if you have an int4 and the next column is something wider than 4 bytes, Pg will not split that column into “some part in the 4 bytes left over from the previous 8-byte block, and some in the next one”; the whole value will go into the next 8-byte block.
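As a side note, the per-type alignment requirements are visible in the pg_type catalog: the typalign column is 'c' (1 byte), 's' (2 bytes), 'i' (4 bytes), or 'd' (8 bytes), and the tuple as a whole gets padded up to the 8-byte boundary anyway. A quick sanity-check query, assuming nothing beyond a running Pg:

-- show declared length and alignment requirement for a few common types
SELECT typname, typlen, typalign
FROM pg_type
WHERE typname IN ('bool', 'int2', 'int4', 'int8', 'float4', 'float8')
ORDER BY typlen, typname;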
We can see this with a simple test (all tests from now on are on a 64-bit computer):
$ CREATE TABLE test AS SELECT i::int4 AS i1, i::int4 AS i2 FROM generate_series(1,1000000) i;
SELECT 1000000

$ \dt+ test
                                List of relations
 Schema │ Name │ Type  │ Owner  │ Persistence │ Access method │ Size  │ Description
────────┼──────┼───────┼────────┼─────────────┼───────────────┼───────┼─────────────
 public │ test │ table │ depesz │ permanent   │ heap          │ 35 MB │
(1 row)
Here you can see that when I made a table with two int4 columns, it used the same space as one with a single int8. Now let's see what happens when I make a table with int4 + int8 columns. If my math is correct, it should use ~43MB:
$ CREATE TABLE test48 AS SELECT i::int4 AS i1, i::int8 AS i2 FROM generate_series(1,1000000) i;
SELECT 1000000

$ \dt+ test48
                                 List of relations
 Schema │ Name   │ Type  │ Owner  │ Persistence │ Access method │ Size  │ Description
────────┼────────┼───────┼────────┼─────────────┼───────────────┼───────┼─────────────
 public │ test48 │ table │ depesz │ permanent   │ heap          │ 42 MB │
(1 row)
It's slightly less, but as you can see it's clearly close to what I expected.
The same applies to even shorter types. int2 (2 bytes) or bool (1 byte) will still occupy a whole 8 bytes, unless they can share an 8-byte block with the previous/next column:
$ CREATE TABLE test2 AS SELECT (i % 32000)::int2 AS i2 FROM generate_series(1,1000000) i;
SELECT 1000000

$ \dt+ test2
                                 List of relations
 Schema │ Name  │ Type  │ Owner  │ Persistence │ Access method │ Size  │ Description
────────┼───────┼───────┼────────┼─────────────┼───────────────┼───────┼─────────────
 public │ test2 │ table │ depesz │ permanent   │ heap          │ 35 MB │
(1 row)

$ CREATE TABLE testb AS SELECT 'true'::bool AS b FROM generate_series(1,1000000) i;
SELECT 1000000

$ \dt+ testb
                                 List of relations
 Schema │ Name  │ Type  │ Owner  │ Persistence │ Access method │ Size  │ Description
────────┼───────┼───────┼────────┼─────────────┼───────────────┼───────┼─────────────
 public │ testb │ table │ depesz │ permanent   │ heap          │ 35 MB │
(1 row)
All of them are 35MB.
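The same pg_column_size() trick shows small types packing together when they sit next to each other (again, just a sketch, with the row header included in the absolute numbers):

-- bool + int2 + int4 share one 8-byte block (7 bytes of data, 1 byte of padding),
-- while bool followed by int8 forces 7 bytes of padding before the int8
SELECT pg_column_size(ROW(true, 1::int2, 1::int4)) AS packed,
       pg_column_size(ROW(true, 1::int8)) AS padded;

Despite having more columns, the first row should come out smaller.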
Why is that, though? From what I gather, the answer is: performance. I don't know the low-level details, but based on what I understand, processors work on data in architecture-dependent block sizes; a 64-bit processor works on 64-bit words. This means that if you want to do something with an int4 value that is part of an 8-byte block, you have to add an operation to zero out the other 32 bits.
The difference is too small to be easily testable, and will be dwarfed by random fluctuations in load, but it is there. And on heavily loaded machines it might make a difference.
So, it is possible to use less disk space by switching to smaller datatypes. But you have to be very careful about the ordering of columns in the table. That might be possible, even easy, when you design your table, but it stops being reasonable once you have deployed the app and are just changing the schema to accommodate new features.
There is one more factor to consider. Please note that with 1 million rows, and the 8MB of data that I put in the table, the size of the table was 35MB. What is the rest?
Well, aside from the normally visible columns, each row in PostgreSQL has some extra, system columns. And they have non-zero size. We can see them:
$ SELECT a.attname, t.typname, t.typlen, a.attnum
  FROM pg_attribute a
  JOIN pg_type t ON a.atttypid = t.oid
  WHERE a.attrelid = 'test8'::regclass
  ORDER BY attnum;
 attname  │ typname │ typlen │ attnum
──────────┼─────────┼────────┼────────
 tableoid │ oid     │      4 │     -6
 cmax     │ cid     │      4 │     -5
 xmax     │ xid     │      4 │     -4
 cmin     │ cid     │      4 │     -3
 xmin     │ xid     │      4 │     -2
 ctid     │ tid     │      6 │     -1
 id       │ int8    │      8 │      1
(7 rows)
For each row we have tableoid, cmax, xmax, cmin, xmin and ctid (well, tableoid and ctid are not really stored in the data files). And then we have the “real” columns in the table.
It's possible to see this information:
$ SELECT tableoid, cmax, xmax, cmin, xmin, ctid, id FROM test8 LIMIT 1;
 tableoid │ cmax │ xmax │ cmin │ xmin │ ctid  │ id
──────────┼──────┼──────┼──────┼──────┼───────┼────
   307696 │    0 │    0 │    0 │ 1773 │ (0,1) │  1
(1 row)
You can find a description of what all these columns mean in the docs, but you generally never have to worry about them.
This, and the fact that there is more stuff in the data files (like page headers and checksums), means that the 4 bytes per row that you could “save” by switching int8 to int4 are, usually, negligible.
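If you want to see this per-row overhead directly, and you can install contrib extensions, pageinspect lets you look at raw pages. A rough sketch on the test8 table:

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- lp_len is the stored tuple length: a ~24-byte header (t_hoff) plus the 8-byte int8 value;
-- each tuple also needs a 4-byte line pointer, and every 8kB page has its own header
SELECT lp, lp_off, lp_len, t_hoff
FROM heap_page_items(get_raw_page('test8', 0))
LIMIT 3;

Roughly 32 bytes of tuple plus 4 bytes of line pointer per row is where most of the 35MB for a million int8 rows comes from.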
On the plus side: if you've added one 4-byte column to a table, adding another 4-byte column next to it is (disk-space-wise) essentially free.
Let's look at some real life table:
=> SELECT 'col_' || a.attnum, a.atttypid::regtype, a.attlen
   FROM pg_attribute a
   WHERE a.attrelid = 'accounts'::regclass AND attnum > 0;
 ?column? │          atttypid           │ attlen
══════════╪═════════════════════════════╪════════
 col_1    │ bigint                      │      8
 col_2    │ text                        │     -1
 col_3    │ timestamp without time zone │      8
 col_4    │ timestamp without time zone │      8
 col_5    │ text                        │     -1
 col_6    │ timestamp without time zone │      8
 col_7    │ bigint                      │      8
 col_8    │ text                        │     -1
 col_9    │ bigint                      │      8
 col_10   │ bigint                      │      8
 col_11   │ bigint                      │      8
 col_12   │ bigint                      │      8
 col_13   │ text                        │     -1
 col_14   │ text                        │     -1
 col_15   │ text                        │     -1
 col_16   │ bigint                      │      8
 col_17   │ bigint                      │      8
 col_18   │ boolean                     │      1
 col_19   │ text                        │     -1
 col_20   │ text                        │     -1
 col_21   │ text                        │     -1
 col_22   │ text                        │     -1
 col_23   │ text                        │     -1
 col_24   │ text                        │     -1
 col_25   │ boolean                     │      1
 col_26   │ boolean                     │      1
 col_27   │ text                        │     -1
 col_28   │ text                        │     -1
 col_29   │ text                        │     -1
 col_30   │ text                        │     -1
 col_31   │ text                        │     -1
 col_32   │ bigint                      │      8
 col_33   │ bigint                      │      8
 col_34   │ text                        │     -1
 col_35   │ bigint                      │      8
 col_36   │ text                        │     -1
 col_37   │ text                        │     -1
 col_38   │ text                        │     -1
 col_39   │ text                        │     -1
 col_40   │ bigint                      │      8
 col_41   │ text                        │     -1
 col_42   │ bigint                      │      8
 col_43   │ bigint                      │      8
(43 rows)
The attnum > 0 condition simply hides the system columns. An attlen of -1 means that the data length is variable, depending on how much data there really is.
Estimating row size is complicated because of all the text columns, but let's assume that each of them takes just a single 8-byte block. Given how the columns are ordered, only columns 25 and 26 can fit in a single 8-byte block together. So the total row size (without system data) is 42 * 8 bytes = 336 bytes.
Now, if we changed every single int8 to int4, I could combine columns 9-12 into two 8-byte blocks, columns 16-17 into one, and the same for 32-33 and 42-43. Total: 296 bytes. This means I would save 40 bytes per row. And please note that I assumed that none of the 23 text columns uses more than 8 bytes. Guess how likely that is 🙂
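If you want a starting point for reordering, a common rule of thumb is: fixed-width columns first, widest alignment first, variable-length columns last. The query below is just a sketch of that heuristic against the same accounts table; pg_attribute and pg_type are real catalogs, but the ordering logic is my rough approximation, not a guarantee of the optimal layout:

SELECT a.attname, t.typname, t.typlen, t.typalign
FROM pg_attribute a
JOIN pg_type t ON t.oid = a.atttypid
WHERE a.attrelid = 'accounts'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
-- fixed-width before variable-width, then by alignment requirement (8, 4, 2, 1 bytes)
ORDER BY (t.typlen = -1),
         CASE t.typalign WHEN 'd' THEN 0 WHEN 'i' THEN 1 WHEN 's' THEN 2 ELSE 3 END,
         a.attnum;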
You can see how wide your rows really are (on average) by simply running EXPLAIN:
=> EXPLAIN SELECT * FROM accounts;
                             QUERY PLAN
═══════════════════════════════════════════════════════════════════
 Seq Scan on accounts  (cost=0.00..3979.23 rows=100323 width=1113)
(1 row)
Each row is over 1kB wide. Saving 40 bytes on that is more or less a rounding error 🙂
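If you'd rather see per-column numbers than the total width, the planner statistics expose the average width of each column (assuming the table has been analyzed):

SELECT attname, avg_width
FROM pg_stats
WHERE tablename = 'accounts'
ORDER BY avg_width DESC;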
So. There are situations where using int4/int2 is beneficial, and it is possible to save some disk space by using smaller datatypes. But the differences are not all that great, they require careful planning of column order, which is not always practical or even possible, and they can lead to problems in the future should the range of the smaller integer turn out to be too small.
Final note – while I was writing about int* columns, the same applies to float4/float8 (a.k.a. float/double).
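If you want to verify that for yourself, the same pg_column_size() sketch from earlier works for floats too; the first value should be larger because of the padding inserted between the float4 and the float8:

SELECT pg_column_size(ROW(1::float4, 1::float8)) AS float4_then_float8,
       pg_column_size(ROW(1::float8, 1::float4)) AS float8_then_float4;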
Oh… I solved a similar problem this week and found this horribly amazing script, which helps to find a better alignment for Postgres columns: https://github.com/NikolayS/postgres_dba/blob/master/sql/p1_alignment_padding.sql
In our case it helped us drop 7% of the data (a few timestamp and a few numeric columns), which might not sound like a big improvement, but for 100GB+ tables it's a pretty good fix (the one problem is that the order of columns becomes artificial, but disk space is expensive).
Is it time for an online calculator for CREATEs with hints? 🙂
@Artur:
If your takeaway from the blog post is “let's redo our tables to save some space”, then I clearly didn't underline my point enough.
How did you calculate the size of the table based on the number of rows?
Could you please elaborate?
@Kart:
Can you point me to the place in the blog post you're referring to? I re-skimmed the post, but I don't see myself calculating the size of a table anywhere.