Some time ago on Slack some person said:
varchar is better (storage efficiency), i recommend using it for less than 2048 chars, for the best : TEXT
A discussion followed, but the person who claimed this efficiency never backed up their claim, saying only:
VARChar takes much less ‘place' than TEXT … but have to face it to believe it
and when I asked for
Show me db fiddle with queries that show this difference and I'll believe
I got:
HA ha ! i let you dig it  i've some work ….
I reacted (overreacted?), and the thread died.
I know (knew?) that varchar/text are the same internally (I wrote about it), but perhaps there has been some recent change?
So. Since the person claiming that there is a difference didn't provide any proof, I decided to look into it myself.
Since the difference supposedly shows up for strings around (less than) 2kB long, I decided to test these string lengths:
- 10 characters
- 100 characters
- 500 characters
- 1000 characters
- 1500 characters
- 2000 characters
- 2500 characters
- 4000 characters
- 8000 characters
- 10000 characters
I made three tables for each length (a sketch of what the definitions might look like follows the list):
- vc_n_NUMBER – having a single column with the varchar datatype, no length limit
- vc_NUMBER_NUMBER – having a single column with a varchar(NUMBER + 2) length limit
- text_NUMBER – having a single column with the text datatype
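The post doesn't show the exact DDL, but given the naming scheme above, the tables for length 100 would look roughly like this (a sketch; the column name payload is my assumption, not taken from the post):

```sql
-- sketch of the three table variants for length 100
-- (column name "payload" is an assumption)
CREATE TABLE vc_n_100   (payload varchar);       -- varchar, no length limit
CREATE TABLE vc_100_100 (payload varchar(102));  -- varchar(NUMBER + 2)
CREATE TABLE text_100   (payload text);          -- text
```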
Then, into all three tables, I loaded the same set of 1 million values, where each value has NUMBER length, and each value is a random combination of 'a'..'z', 'A'..'Z', '0'..'9'.
To do this, I wrote this simple ruby script:
chars = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a [ 10, 100, 500, 1000, 1500, 2000, 2500, 4000, 8000, 10000 ].each do |len| File.open("strings-#{len}.data", "w") do |f| 1000000.times do f.write((1..len).map { chars.sample }.join('') + "\n") end end end
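The load step itself isn't shown in the post; presumably the generated files were loaded with COPY (or psql's \copy). A minimal sketch for one length, assuming the files are readable by the server and the column is named payload as in the DDL sketch above:

```sql
-- sketch of loading one generated file into the three tables
-- (file path and column name are assumptions)
COPY vc_n_100   (payload) FROM '/tmp/strings-100.data';
COPY vc_100_100 (payload) FROM '/tmp/strings-100.data';
COPY text_100   (payload) FROM '/tmp/strings-100.data';
```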
I loaded it all and got these results (table sizes in bytes):
| String length | vc_n_NUMBER | vc_NUMBER_NUMBER | text_NUMBER |
|---|---|---|---|
| 10 | 44,335,104 | 44,326,912 | 44,335,104 |
| 100 | 134,373,376 | 134,365,184 | 134,373,376 |
| 500 | 546,324,480 | 546,324,480 | 546,324,480 |
| 1,000 | 1,170,653,184 | 1,170,653,184 | 1,170,653,184 |
| 1,500 | 1,638,891,520 | 1,638,891,520 | 1,638,891,520 |
| 2,000 | 2,048,598,016 | 2,048,598,016 | 2,048,598,016 |
| 2,500 | 2,828,615,680 | 2,828,615,680 | 2,828,615,680 |
| 4,000 | 5,582,495,744 | 5,582,495,744 | 5,582,495,744 |
| 8,000 | 8,420,007,936 | 8,420,007,936 | 8,420,007,936 |
| 10,000 | 12,478,431,232 | 12,478,431,232 | 12,478,431,232 |
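The post doesn't show how these sizes were collected; a reasonable guess is something like pg_total_relation_size(), which counts the heap plus TOAST plus indexes:

```sql
-- sketch: one way the per-table sizes above could have been collected
-- (an assumption; the post doesn't show the measurement query)
SELECT relname, pg_total_relation_size(oid) AS total_bytes
FROM pg_class
WHERE relname ~ '^(vc|text)_'
  AND relkind = 'r'
ORDER BY relname;
```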
It might be hard to see, but there actually IS a difference.
In the case of lengths 10 and 100, the table with the length-limited varchar column (vc_NUMBER_NUMBER) is smaller.
In both cases the difference is exactly 8,192 bytes.
Upon closer examination, the difference stems from the fact that for these tables PostgreSQL didn't create TOAST tables:
    $ SELECT relname, reltoastrelid
      FROM pg_class
      WHERE relname ~ '^(text|vc).*_(10|100|500)$';
      relname   │ reltoastrelid
    ────────────┼───────────────
     vc_n_10    │        261293
     vc_10_10   │             0
     text_10    │        261301
     vc_n_100   │        261306
     vc_100_100 │             0
     text_100   │        261314
     vc_n_500   │        261319
     vc_500_500 │        261324
     text_500   │        261329
    (9 ROWS)
I verified with pg_relation_size() calls that the base tables are the same size. So the 8kB difference is simply due to the TOAST table that was created, even though it's empty. At least that's how it looks to me.
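A query along these lines shows the breakdown, comparing heap-only size with size including TOAST (a sketch; the exact calls used aren't shown in the post):

```sql
-- sketch: pg_relation_size() is the main heap fork only,
-- pg_table_size() additionally includes TOAST (and FSM/VM forks)
SELECT c.relname,
       pg_relation_size(c.oid) AS heap_bytes,
       pg_table_size(c.oid)    AS table_bytes
FROM pg_class c
WHERE c.relname IN ('vc_n_10', 'vc_10_10', 'text_10')
ORDER BY c.relname;
```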
I SO wish people would stop spreading misinformation, but apparently it's not going to happen. Luckily, I had some time to run tests and write this blog post, which shows that there is no REAL difference in storage size between text, varchar, and varchar(n).
Really interesting.
I would also expect vc_500_500 not to have any TOAST table associated with it, since I would think that 502 bytes per tuple can fit easily in an 8kB data page, even leaving room for TIDs and other bits. Apparently there is a threshold above which PostgreSQL decides to create a TOAST table anyway.
Good post!
@Luca Ferrari: In UTF-8 encoding, a character can have up to 4 bytes. 500 * 4 = 2000, which exceeds TOAST_TUPLE_TARGET.