On 23rd of January, Alvaro Herrera committed patch:
vacuumdb: enable parallel mode This mode allows vacuumdb to open several server connections to vacuum or analyze several tables simultaneously. Author: Dilip Kumar. Some reworking by Álvaro Herrera Reviewed by: Jeff Janes, Amit Kapila, Magnus Hagander, Andres Freund
This is great for multi-table vacuums/analyzes. Previously, when running “vacuumdb some_database" you got all tables vacuumed in some kind of order.
We can see it, by creating some test tables:
for a in $( pwgen -A0 6 20 ) do psql -c "create table test_$a as select generate_series(1, 1000000)::int4 as i" done
It did create for me these tables:
$ \d List OF relations Schema | Name | TYPE | Owner --------+-------------+-------+-------- public | test_aeshae | TABLE | depesz public | test_aijoox | TABLE | depesz public | test_eeyies | TABLE | depesz public | test_eicaeb | TABLE | depesz public | test_ieveov | TABLE | depesz public | test_isoexu | TABLE | depesz public | test_jaidaf | TABLE | depesz public | test_meijie | TABLE | depesz public | test_ohceec | TABLE | depesz public | test_okaeph | TABLE | depesz public | test_ooghap | TABLE | depesz public | test_oojuxo | TABLE | depesz public | test_sheeru | TABLE | depesz public | test_shofae | TABLE | depesz public | test_thaepu | TABLE | depesz public | test_uuyohl | TABLE | depesz public | test_uyuque | TABLE | depesz public | test_wainie | TABLE | depesz public | test_yiegah | TABLE | depesz public | test_zeecie | TABLE | depesz (20 ROWS)
If I'll run the vacuumdb now, normally, it will show me:
=$ vacuumdb -e vacuumdb: vacuuming DATABASE "depesz" VACUUM;
I.e. just “vacuum" was run, without any table names – it will run database-wide vacuum that will work in one backend, and do all tables sequentially.
I could have influenced order by doing something like:
vacuumdb -e -t test_zeecie -t test_yiegah -t test_wainie -t test_uyuque -t test_uuyohl -t test_thaepu -t test_shofae -t test_sheeru -t test_oojuxo -t test_ooghap -t test_okaeph -t test_ohceec -t test_meijie -t test_jaidaf -t test_isoexu -t test_ieveov -t test_eicaeb -t test_eeyies -t test_aijoox -t test_aeshae vacuumdb: vacuuming database "depesz" VACUUM test_zeecie; VACUUM test_yiegah; VACUUM test_wainie; VACUUM test_uyuque; VACUUM test_uuyohl; VACUUM test_thaepu; VACUUM test_shofae; VACUUM test_sheeru; VACUUM test_oojuxo; VACUUM test_ooghap; VACUUM test_okaeph; VACUUM test_ohceec; VACUUM test_meijie; VACUUM test_jaidaf; VACUUM test_isoexu; VACUUM test_ieveov; VACUUM test_eicaeb; VACUUM test_eeyies; VACUUM test_aijoox; VACUUM test_aeshae;
But, all tables were vacuumed in order anyway.
Now, I can add -j option, and it will parallelize processing:
$ vacuumdb -e -j8 SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns WHERE relkind IN ('r', 'm') AND c.relnamespace = ns.oid ORDER BY c.relpages DESC; VACUUM public.test_ohceec; VACUUM public.test_zeecie; VACUUM public.test_jaidaf; VACUUM public.test_aeshae; VACUUM public.test_ieveov; VACUUM public.test_yiegah; VACUUM public.test_shofae; VACUUM public.test_thaepu; VACUUM public.test_sheeru; VACUUM public.test_uuyohl; VACUUM public.test_eicaeb; VACUUM public.test_ooghap; VACUUM public.test_oojuxo; VACUUM public.test_okaeph; VACUUM public.test_meijie; VACUUM public.test_aijoox; VACUUM public.test_eeyies; VACUUM public.test_isoexu; VACUUM public.test_uyuque; VACUUM public.test_wainie; VACUUM pg_catalog.pg_proc; VACUUM pg_catalog.pg_depend; VACUUM pg_catalog.pg_attribute; VACUUM pg_catalog.pg_description; VACUUM pg_catalog.pg_statistic; VACUUM pg_catalog.pg_operator; VACUUM pg_catalog.pg_rewrite; VACUUM pg_catalog.pg_type; VACUUM pg_catalog.pg_class; VACUUM information_schema.sql_features; VACUUM pg_catalog.pg_amop; VACUUM pg_catalog.pg_amproc; VACUUM pg_catalog.pg_conversion; VACUUM pg_catalog.pg_opclass; VACUUM pg_catalog.pg_index; VACUUM pg_catalog.pg_aggregate; VACUUM pg_catalog.pg_collation; VACUUM pg_catalog.pg_opfamily; VACUUM pg_catalog.pg_cast; VACUUM pg_catalog.pg_ts_config_map; VACUUM pg_catalog.pg_language; VACUUM pg_catalog.pg_authid; VACUUM pg_catalog.pg_constraint; VACUUM pg_catalog.pg_am; VACUUM pg_catalog.pg_namespace; VACUUM pg_catalog.pg_database; VACUUM pg_catalog.pg_tablespace; VACUUM pg_catalog.pg_pltemplate; VACUUM pg_catalog.pg_shdepend; VACUUM pg_catalog.pg_shdescription; VACUUM pg_catalog.pg_ts_config; VACUUM pg_catalog.pg_ts_dict; VACUUM pg_catalog.pg_ts_parser; VACUUM pg_catalog.pg_ts_template; VACUUM pg_catalog.pg_extension; VACUUM pg_catalog.pg_range; VACUUM information_schema.sql_implementation_info; VACUUM information_schema.sql_languages; VACUUM information_schema.sql_parts; VACUUM information_schema.sql_sizing; VACUUM information_schema.sql_packages; VACUUM information_schema.sql_sizing_profiles; VACUUM pg_catalog.pg_seclabel; VACUUM pg_catalog.pg_default_acl; VACUUM pg_catalog.pg_policy; VACUUM pg_catalog.pg_foreign_table; VACUUM pg_catalog.pg_foreign_server; VACUUM pg_catalog.pg_foreign_data_wrapper; VACUUM pg_catalog.pg_auth_members; VACUUM pg_catalog.pg_db_role_setting; VACUUM pg_catalog.pg_event_trigger; VACUUM pg_catalog.pg_trigger; VACUUM pg_catalog.pg_largeobject_metadata; VACUUM pg_catalog.pg_inherits; VACUUM pg_catalog.pg_attrdef; VACUUM pg_catalog.pg_enum; VACUUM pg_catalog.pg_shseclabel; VACUUM pg_catalog.pg_user_mapping; VACUUM pg_catalog.pg_largeobject;
As we can see – it initially gets list of tables to vacuum, sorted by relpages – which is basically size of the table.
And then vacuums these in parallel, using many backends. In pg_stat_activity it looks like:
$ SELECT pid, query FROM pg_stat_activity WHERE pid <> pg_backend_pid() pid | query ------+---------------------------- 5981 | VACUUM public.test_ohceec; 5982 | VACUUM public.test_zeecie; 5983 | VACUUM public.test_jaidaf; 5984 | VACUUM public.test_aeshae; 5985 | VACUUM public.test_ieveov; 5986 | VACUUM public.test_yiegah; 5987 | VACUUM public.test_shofae; 5988 | VACUUM public.test_thaepu; (8 ROWS) Nice. Starting WITH largest TABLES FIRST IS great. Thanks guys.