On 7th of February, Robert Haas committed patch:
Sometimes it may be useful to get actual row counts out of EXPLAIN (ANALYZE) without paying the cost of timing every node entry/exit. With this patch, you can say EXPLAIN (ANALYZE, TIMING OFF) to get that. Tomas Vondra, reviewed by Eric Theise, with minor doc changes by me.
The commit log says it all – we got a new option for EXPLAIN.
But how much does it actually help?
Let's check a very specific case:
$ EXPLAIN (ANALYZE ON) SELECT ( SELECT 1 WHERE -1 < z ) FROM generate_series(1,10000000) z;
                                                            QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Function Scan on generate_series z  (cost=0.00..22.50 rows=1000 width=4) (actual time=1091.010..6050.521 rows=10000000 loops=1)
   SubPlan 1
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=10000000)
           One-Time Filter: ((-1) < z.z)
 Total runtime: 6443.722 ms
(5 rows)
and the same query – which is a very tight loop – without timing:
$ EXPLAIN (ANALYZE ON, TIMING OFF) SELECT ( SELECT 1 WHERE -1 < z ) FROM generate_series(1,10000000) z;
                                                 QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────
 Function Scan on generate_series z  (cost=0.00..22.50 rows=1000 width=4) (actual rows=10000000 loops=1)
   SubPlan 1
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual rows=1 loops=10000000)
           One-Time Filter: ((-1) < z.z)
 Total runtime: 6083.863 ms
(5 rows)
The difference is not huge – around 360 ms, or roughly 6% – but that could be because I'm running on bare metal. I've heard that reading the clock on some virtualized systems is much more expensive, so an option that lets you skip the timing should be pretty helpful there.
Personally – I don't use virtualization, but I know that many people do, so this could be a big win in such cases. The only problem is that explain.depesz.com doesn't (yet) parse such plans. But I'll look into it.
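Side note – if you want to check how expensive reading the clock actually is on your box, PostgreSQL 9.2 also comes with a small pg_test_timing tool that measures clock-read overhead in a tight loop. A minimal run (if I read the docs right, the -d switch sets the test duration in seconds) would be:

pg_test_timing -d 3

It prints the average time per clock reading plus a histogram of observed deltas, which makes it easy to compare clock sources.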
Just a follow-on. There are other cases where timing information is expensive to get:
* Older multi-CPU servers didn’t keep the TSC (timestamp counter) synchronized between CPU sockets, so it was unstable for direct user space access
* Older CPUs also changed the TSC frequency with power management changes, making it unstable
* Linux bypasses timing syscalls and reads time directly in user space when TSC timing is stable, but other OSes (*BSDs for example) don’t have this optimization
If the TSC is unstable, kernels have to revert to slower hardware timing sources. And it makes a *huge* difference. You can try it for yourself on Linux with the command:
echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
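You can check which sources your kernel supports, and switch back to tsc afterwards, through the same sysfs interface (the exact list of available sources will of course vary by machine):

cat /sys/devices/system/clocksource/clocksource0/available_clocksource
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource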
TL;DR: on my computer, your example takes:
* 16 seconds, with timing off
* 18 seconds, timing with tsc clocksource
* 75 seconds, timing with hpet clocksource
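A back-of-the-envelope check of those numbers (my assumptions, not something I measured directly): EXPLAIN ANALYZE takes two clock readings per node invocation, and with two plan nodes each looping 10,000,000 times that is roughly 40 million readings. The hpet run spends about 75 − 16 = 59 extra seconds on timing, i.e. around 1.5 µs per reading, versus (18 − 16) s / 40M ≈ 50 ns for tsc – something like a 30× difference in clock-read cost.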
@Marti:
thanks for the very informative comment.