On 3rd of April 2024, Tom Lane committed patch:
Invent SERIALIZE option for EXPLAIN.

EXPLAIN (ANALYZE, SERIALIZE) allows collection of statistics about
the volume of data emitted by a query, as well as the time taken
to convert the data to the on-the-wire format.  Previously there
was no way to investigate this without actually sending the data
to the client, in which case network transmission costs might
swamp what you wanted to see.  In particular this feature allows
investigating the costs of de-TOASTing compressed or out-of-line
data during formatting.

Stepan Rutz and Matthias van de Meent, reviewed by Tomas Vondra
and myself

Discussion: https://postgr.es/m/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1@gmx.de
This will be HUGE.
Let's look at a simple example: the database for explain.depesz.com. Let's see how long it would take to get all plans:
=$ EXPLAIN (analyze) SELECT * FROM simple;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on simple  (cost=0.00..210108.80 rows=4466880 width=267) (actual time=0.016..711.977 rows=1614584 loops=1)
 Planning Time: 0.198 ms
 Execution Time: 749.749 ms
(3 rows)
Pretty OK. But it's not the full picture. The thing is that while it did run most of the query, it didn't actually fetch the data from TOASTed columns (and potentially some other places) into memory.
Now, with this new SERIALIZE option I can:
=$ EXPLAIN (analyze, serialize) SELECT * FROM simple;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on simple  (cost=0.00..210108.80 rows=4466880 width=267) (actual time=0.011..399.249 rows=1614584 loops=1)
 Planning Time: 0.028 ms
 Serialization: time=19479.433 ms  output=28074610kB  format=text
 Execution Time: 19961.576 ms
(4 rows)
Please note that we got a new line, and a much higher total execution time. Getting the data from the table was quick: ~750 ms. But getting the data from TOAST, and generating the output to send to the client, took almost 20 seconds.
Also, please note that explain shows rows=1614584 with width=267 – which would be ~410 MB.
But serialization showed that the full output was output=28074610kB – almost 27 GB! Over 60 times more.
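A quick back-of-the-envelope check, using only the numbers reported in the plans above, shows where the "over 60 times" figure comes from (this is just arithmetic on the plan output; width=267 is the planner's average row width, which for TOASTed columns counts pointers rather than the actual payload):

```python
# Numbers taken from the EXPLAIN output above.
rows = 1_614_584                   # actual rows=1614584
width = 267                        # width=267 (average row width in bytes)
in_memory = rows * width           # size suggested by the plan's width

serialized_kb = 28_074_610         # output=28074610kB from the Serialization line
serialized = serialized_kb * 1024  # actual bytes produced after de-TOASTing

print(f"width-based estimate: {in_memory / 1024**2:.0f} MB")   # ~411 MB
print(f"serialized output:    {serialized / 1024**3:.1f} GiB") # ~26.8 GiB
print(f"ratio:                {serialized / in_memory:.0f}x")  # ~67x
```

The gap exists because the planner's width reflects what sits in the heap rows (including small TOAST pointers), while SERIALIZE measures the fully de-TOASTed, formatted output.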
This is great. Very good news for anyone working on query optimization. Thanks a lot to everyone who worked on this.