Waiting for PostgreSQL 18 – Add cost-based vacuum delay time to progress views.

On 11st of February 2025, Nathan Bossart committed patch:

Add cost-based vacuum delay time to progress views.
 
This commit adds the amount of time spent sleeping due to
cost-based delay to the pg_stat_progress_vacuum and
pg_stat_progress_analyze system views.  A new configuration
parameter named track_cost_delay_timing, which is off by default,
controls whether this information is gathered.  For vacuum, the
reported value includes the sleep time of any associated parallel
workers.  However, parallel workers only report their sleep time
once per second to avoid overloading the leader process.
 
Bumps catversion.
 
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal

Back in 2016 we got first version of simple reporting of vacuum progress – visible in pg_stat_progress_vacuum.

Since then we got number of improvements, more information, and more things that we can get progress info on.

This new change seems pretty cool, as it adds information on how much of the time it (so far) took vacuum to run was simple waiting because of one (or more) of:

Let's see how it works. I'd like the test to not take too long so I'll modify my settings so that even vacuum ran by hand will have some cost delays:

=$ SELECT name, setting FROM pg_settings  WHERE name  ~ 'vac.*cost';
             name             │ setting
──────────────────────────────┼─────────
 autovacuum_vacuum_cost_delay │ 2
 autovacuum_vacuum_cost_limit │ -1
 vacuum_cost_delay            │ 20
 vacuum_cost_limit            │ 200
 vacuum_cost_page_dirty       │ 20
 vacuum_cost_page_hit         │ 1
 vacuum_cost_page_miss        │ 2
(7 ROWS)

Please note the value for vacuum_cost_delay. Normally it's 0 – which means that when running vacuum manually (not via autovacuum) there is no delay.

Then, I started another psql (another from the one with changed vacuum_cost_delay), and ran in it:

=$ SELECT relid::regclass,phase,heap_blks_total,heap_blks_scanned,heap_blks_vacuumed,index_vacuum_count,max_dead_tuple_bytes,dead_tuple_bytes,num_dead_item_ids,indexes_total,indexes_processed,delay_time
FROM pg_stat_progress_vacuum WHERE pid = 309635 \watch 1

pid value was obtained using

=$ SELECT pg_backend_pid();

in the first psql.

For now, the output isn't really interesting:

 relid │ phase │ heap_blks_total │ heap_blks_scanned │ heap_blks_vacuumed │ index_vacuum_count │ max_dead_tuple_bytes │ dead_tuple_bytes │ num_dead_item_ids │ indexes_total │ indexes_processed │ delay_time
───────┼───────┼─────────────────┼───────────────────┼────────────────────┼────────────────────┼──────────────────────┼──────────────────┼───────────────────┼───────────────┼───────────────────┼────────────
(0 ROWS)

But that will change. So, back in first psql I made table so that vacuum has something to work on:

=$ CREATE TABLE test_it AS
SELECT i, repeat( 'depesz' || i::text, 200) AS payload
FROM generate_series(1,10000000) i;
 
=$ ALTER TABLE test_it ADD PRIMARY KEY (i);

This made ~ 900MB table, and 214MB index:

=$ \dt+ test_it
                                     List OF TABLES
 Schema │  Name   │ TYPE  │ Owner  │ Persistence │ Access method │  SIZE  │ Description
────────┼─────────┼───────┼────────┼─────────────┼───────────────┼────────┼─────────────
 public │ test_it │ TABLE │ depesz │ permanent   │ heap          │ 899 MB │ [NULL]
(1 ROW)
 
=$ \di+ test_it_pkey
                                            List OF indexes
 Schema │     Name     │ TYPE  │ Owner  │  TABLE  │ Persistence │ Access method │  SIZE  │ Description
────────┼──────────────┼───────┼────────┼─────────┼─────────────┼───────────────┼────────┼─────────────
 public │ test_it_pkey │ INDEX │ depesz │ test_it │ permanent   │ btree         │ 214 MB │ [NULL]
(1 ROW)

So, now. Let's give vacuum something to work on, and then immediately run vacuum:

=$ DELETE FROM test_it WHERE random() < 0.75;
DELETE 7499475
TIME: 5164.567 ms (00:05.165)
 
=$ vacuum test_it;
VACUUM
TIME: 582264.662 ms (09:42.265)

While the vacuum was running the second psql was showing progress info. I will not show the whole thing, just couple of interesting moments.

First time we got a row:

  relid  │     phase     │ heap_blks_total │ heap_blks_scanned │ heap_blks_vacuumed │ index_vacuum_count │ max_dead_tuple_bytes │ dead_tuple_bytes │ num_dead_item_ids │ indexes_total │ indexes_processed │ delay_time
─────────┼───────────────┼─────────────────┼───────────────────┼────────────────────┼────────────────────┼──────────────────────┼──────────────────┼───────────────────┼───────────────┼───────────────────┼────────────
 test_it │ scanning heap │          1150084300067108864204800128300915.843569
(1 ROW)

Two logs (with 1 second between them) that show that delay counter isn't zeroed on phase change:

  relid  │     phase     │ heap_blks_total │ heap_blks_scanned │ heap_blks_vacuumed │ index_vacuum_count │ max_dead_tuple_bytes │ dead_tuple_bytes │ num_dead_item_ids │ indexes_total │ indexes_processed │  delay_time
─────────┼───────────────┼─────────────────┼───────────────────┼────────────────────┼────────────────────┼──────────────────────┼──────────────────┼───────────────────┼───────────────┼───────────────────┼──────────────
 test_it │ scanning heap │          11500811490700671088649502720749491300251967.61019
(1 ROW)
 
  relid  │       phase       │ heap_blks_total │ heap_blks_scanned │ heap_blks_vacuumed │ index_vacuum_count │ max_dead_tuple_bytes │ dead_tuple_bytes │ num_dead_item_ids │ indexes_total │ indexes_processed │  delay_time
─────────┼───────────────────┼─────────────────┼───────────────────┼────────────────────┼────────────────────┼──────────────────────┼──────────────────┼───────────────────┼───────────────┼───────────────────┼──────────────
 test_it │ vacuuming indexes │          11500811500800671088649502720749947510252949.01199
(1 ROW)

You can also see here interesting thing. In these two runs above difference in delay time was 252949.01199 – 251967.61019 = 981.40180. Given that unit is millisecond, it means that during one second of time, 981 milliseconds vacuum spent sleeping because of cost.

Which shows that the settings for cost delays were causing vacuum to work only for ~ 2% of time.

Finally, at the end, the last logged line was:

  relid  │     phase      │ heap_blks_total │ heap_blks_scanned │ heap_blks_vacuumed │ index_vacuum_count │ max_dead_tuple_bytes │ dead_tuple_bytes │ num_dead_item_ids │ indexes_total │ indexes_processed │  delay_time
─────────┼────────────────┼─────────────────┼───────────────────┼────────────────────┼────────────────────┼──────────────────────┼──────────────────┼───────────────────┼───────────────┼───────────────────┼───────────────
 test_it │ vacuuming heap │          1150081150081148281671088649502720749947500575899.343048
(1 ROW)

So, in total, out of 582264.662 that vacuum took (that I know thanks to psql's \timing, 575899.343 ms was spent sleeping. 98.9%.

Anyway – it's great tool. Kinda makes me wish this value was also logged (at the end of vacuum) to pg logs…

But, as it is – it's great, and I'm grateful to everyone involved in making it happen. Thanks 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.