On 2nd of April 2019, Alvaro Herrera committed patch:
Report progress of CREATE INDEX operations This uses the progress reporting infrastructure added by , adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY. There are two pieces to this: one is index-AM-agnostic, and the other is AM-specific. The latter is fairly elaborate for btrees, including reportage for parallel index builds and the separate phases that btree index creation uses; other index AMs, which are much simpler in their building procedures, have simplistic reporting only, but that seems sufficient, at least for non-concurrent builds. The index-AM-agnostic part is fairly complete, providing insight into the CONCURRENTLY wait phases as well as block-based progress during the index validation table scan. (The index validation index scan requires patching each AM, which has not been included here.) Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
Well, explanation is rather irrelevant, so let's just make some test data, and see how it works:
=$ CREATE TABLE test ( id serial PRIMARY KEY, payload text ); CREATE TABLE =$ copy test (payload) FROM program 'ruby -e "10000000.times { puts (0...50).map { (97 + rand(26)).chr }.join}"' COPY 10000000
Now, in one session I'll run:
=$ CREATE INDEX q ON test (payload);
and in another:
=$ SELECT now()::TIME(0), a.query, p.phase, p.blocks_total, p.blocks_done, p.tuples_total, p.tuples_done FROM pg_stat_progress_create_index p JOIN pg_stat_activity a ON p.pid = a.pid \watch 2
Progress data visible:
21:29:10 │ create index q on test (payload); │ building index: scanning table │ 103093 │ 11808 │ 0 │ 0 21:29:12 │ create index q on test (payload); │ building index: scanning table │ 103093 │ 35696 │ 0 │ 0 21:29:14 │ create index q on test (payload); │ building index: scanning table │ 103093 │ 59613 │ 0 │ 0 21:29:16 │ create index q on test (payload); │ building index: scanning table │ 103093 │ 83361 │ 0 │ 0 21:29:18 │ create index q on test (payload); │ building index: scanning table │ 103093 │ 103093 │ 0 │ 0 21:29:20 │ create index q on test (payload); │ building index: scanning table │ 103093 │ 103093 │ 0 │ 0 21:29:22 │ create index q on test (payload); │ building index: loading tuples in tree │ 0 │ 0 │ 10000000 │ 4303927 21:29:24 │ create index q on test (payload); │ building index: loading tuples in tree │ 0 │ 0 │ 10000000 │ 10000000
Nice. There are more columns in the table, but were irrelevant for my example:
=$ SELECT * FROM pg_stat_progress_create_index; ─[ RECORD 1 ]──────┬─────────────────────────────── pid │ 14296 datid │ 17564 datname │ depesz relid │ 17681 index_relid │ 0 phase │ building INDEX: scanning TABLE lockers_total │ 0 lockers_done │ 0 current_locker_pid │ 0 blocks_total │ 103093 blocks_done │ 23989 tuples_total │ 0 tuples_done │ 0 partitions_total │ 0 partitions_done │ 0
These will be useful in more complicated examples, like with concurrent connections, or partitions.
This is AMAZING. Thanks a lot to all involved.
Thanks for highlighting this feature.
Most of the other columns are used by CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY.
In particular, the “lockers_done”, total, and current PID, I believe will be very useful for people with high-concurrency servers who sometimes wonder what is a C.I.C. waiting on, since it has to wait for concurrent transactions at various points in the process, and it’s not easy to figure out what, how many are done, how many to wait are still running, etc.
Also, we now have REINDEX CONCURRENTLY in pg12, which will wait at even more points during the process. REINDEX will also appear in the CREATE INDEX progress view, thanks to a later commit by Peter Eisentraut.
Thanks for demonstrating the advantage of the path with example!