Some time ago on Slack some person said:
varchar is better (storage efficiency), i recommend using it for less than 2048 chars, for the best : TEXT
A discussion followed, but the person who claimed this efficiency never backed up their claim, saying only:
VARChar takes much less ‘place' than TEXT … but have to face it to believe it
and when I asked for
Show me db fiddle with queries that show this difference and I'll believe
I got:
HA ha ! i let you dig it  i've some work ….
I reacted (overreacted?), and the thread died.
I know (knew?) that varchar/text are the same internally (I wrote about it), but perhaps there has been some recent change?
So. Since the person claiming that there is a difference didn't provide any proof, I decided to look into it myself.
Since the difference supposedly shows up for strings around (less than) 2kB long, I decided to test these string lengths:
- 10 characters
- 100 characters
- 500 characters
- 1000 characters
- 1500 characters
- 2000 characters
- 2500 characters
- 4000 characters
- 8000 characters
- 10000 characters
I made three tables for each length (a sketch of what the definitions might look like follows the list):
- vc_n_NUMBER – having a single column with the varchar datatype, no length limit
- vc_NUMBER_NUMBER – having a single column with a varchar(NUMBER + 2) length limit
- text_NUMBER – having a single column with the text datatype
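The post doesn't show the exact DDL, but given the naming scheme above, the tables for length 100 would look roughly like this (a sketch; the column name payload is my assumption, not taken from the post):

```sql
-- sketch of the three table variants for length 100
-- (column name "payload" is an assumption)
CREATE TABLE vc_n_100   (payload varchar);       -- varchar, no length limit
CREATE TABLE vc_100_100 (payload varchar(102));  -- varchar(NUMBER + 2)
CREATE TABLE text_100   (payload text);          -- text
```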
Then, into all three tables, I loaded the same set of 1 million values, where each value has NUMBER length, and each value is a random combination of 'a'..'z', 'A'..'Z', '0'..'9'.
To do this, I wrote this simple ruby script:
chars = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a [ 10, 100, 500, 1000, 1500, 2000, 2500, 4000, 8000, 10000 ].each do |len| File.open("strings-#{len}.data", "w") do |f| 1000000.times do f.write((1..len).map { chars.sample }.join('') + "\n") end end end
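The load step itself isn't shown in the post; presumably the generated files were loaded with COPY (or psql's \copy). A minimal sketch for one length, assuming the files are readable by the server and the column is named payload as in the DDL sketch above:

```sql
-- sketch of loading one generated file into the three tables
-- (file path and column name are assumptions)
COPY vc_n_100   (payload) FROM '/tmp/strings-100.data';
COPY vc_100_100 (payload) FROM '/tmp/strings-100.data';
COPY text_100   (payload) FROM '/tmp/strings-100.data';
```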
I loaded it all and got these results (table sizes in bytes):
| String length | vc_n_NUMBER | vc_NUMBER_NUMBER | text_NUMBER |
|---|---|---|---|
| 10 | 44,335,104 | 44,326,912 | 44,335,104 |
| 100 | 134,373,376 | 134,365,184 | 134,373,376 |
| 500 | 546,324,480 | 546,324,480 | 546,324,480 |
| 1,000 | 1,170,653,184 | 1,170,653,184 | 1,170,653,184 |
| 1,500 | 1,638,891,520 | 1,638,891,520 | 1,638,891,520 |
| 2,000 | 2,048,598,016 | 2,048,598,016 | 2,048,598,016 |
| 2,500 | 2,828,615,680 | 2,828,615,680 | 2,828,615,680 |
| 4,000 | 5,582,495,744 | 5,582,495,744 | 5,582,495,744 |
| 8,000 | 8,420,007,936 | 8,420,007,936 | 8,420,007,936 |
| 10,000 | 12,478,431,232 | 12,478,431,232 | 12,478,431,232 |
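The post doesn't show how these sizes were collected; a reasonable guess is something like pg_total_relation_size(), which counts the heap plus TOAST plus indexes:

```sql
-- sketch: one way the per-table sizes above could have been collected
-- (an assumption; the post doesn't show the measurement query)
SELECT relname, pg_total_relation_size(oid) AS total_bytes
FROM pg_class
WHERE relname ~ '^(vc|text)_'
  AND relkind = 'r'
ORDER BY relname;
```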
It might be hard to see, but there actually IS a difference.
In the case of lengths 10 and 100, the table with the length-limited varchar column (vc_NUMBER_NUMBER) is smaller.
In both cases the difference is exactly 8,192 bytes.
Upon closer examination, the difference stems from the fact that for these tables PostgreSQL didn't create TOAST tables:
    $ SELECT relname, reltoastrelid
      FROM pg_class
      WHERE relname ~ '^(text|vc).*_(10|100|500)$';
      relname   │ reltoastrelid
    ────────────┼───────────────
     vc_n_10    │        261293
     vc_10_10   │             0
     text_10    │        261301
     vc_n_100   │        261306
     vc_100_100 │             0
     text_100   │        261314
     vc_n_500   │        261319
     vc_500_500 │        261324
     text_500   │        261329
    (9 ROWS)
I verified with pg_relation_size() calls that the base tables are the same size. So the 8kB difference is simply due to the TOAST table that was created, even though it's empty. At least that's how it looks to me.
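A query along these lines shows the breakdown, comparing heap-only size with size including TOAST (a sketch; the exact calls used aren't shown in the post):

```sql
-- sketch: pg_relation_size() is the main heap fork only,
-- pg_table_size() additionally includes TOAST (and FSM/VM forks)
SELECT c.relname,
       pg_relation_size(c.oid) AS heap_bytes,
       pg_table_size(c.oid)    AS table_bytes
FROM pg_class c
WHERE c.relname IN ('vc_n_10', 'vc_10_10', 'text_10')
ORDER BY c.relname;
```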
I SO wish people would stop spreading misinformation, but apparently it's not going to happen. Luckily, I had some time to run tests and write this blog post, which shows that there is no REAL difference in storage size between text, varchar, and varchar(n).
Really interesting.
I would also expect vc_500_500 not to have any TOAST table associated with it, since I would think that 502 bytes per tuple can fit easily in an 8kB data page, even leaving room for TIDs and other bits. Apparently there is a threshold above which PostgreSQL decides to create a TOAST table anyway.
Good post!
@Luca Ferrari: In UTF-8 encoding, a character can have up to 4 bytes. 500 * 4 = 2000, which exceeds TOAST_TUPLE_TARGET.