Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes

On 3rd of March 2025, Tomas Vondra committed patch:

Allow parallel CREATE INDEX for GIN indexes
 
Allow using parallel workers to build a GIN index, similarly to BTREE
and BRIN. For large tables this may result in significant speedup when
the build is CPU-bound.
 
The work is divided so that each worker builds index entries on a subset
of the table, determined by the regular parallel scan used to read the
data. Each worker uses a local tuplesort to sort and merge the entries
for the same key. The TID lists do not overlap (for a given key), which
means the merge sort simply concatenates the two lists. The merged
entries are written into a shared tuplesort for the leader.
 
The leader needs to merge the sorted entries again, before writing them
into the index. But this way a significant part of the work happens in
the workers, and the leader is left with merging fewer large entries,
which is more efficient.
 
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for GIN indexes
(e.g. uniqueness checks).
 
Original patch by me, with reviews and substantial improvements by
Matthias van de Meent, certainly enough to make him a co-author.
 
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com

Continue reading Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes

Waiting for PostgreSQL 13 – Add leader_pid to pg_stat_activity

On 6th of February 2020, Michael Paquier committed patch:

Add leader_pid to pg_stat_activity
 
This new field tracks the PID of the group leader used with parallel
query.  For parallel workers and the leader, the value is set to the
PID of the group leader.  So, for the group leader, the value is the
same as its own PID.  Note that this reflects what PGPROC stores in
shared memory, so as leader_pid is NULL if a backend has never been
involved in parallel query.  If the backend is using parallel query or
has used it at least once, the value is set until the backend exits.
 
Author: Julien Rouhaud
Reviewed-by: Sergei Kornilov, Guillaume Lelarge, Michael Paquier, Tomas
Vondra
Discussion: https://postgr.es/m/CAOBaU_Yy5bt0vTPZ2_LUM6cUcGeqmYNoJ8-Rgto+c2+w3defYA@mail.gmail.com

Continue reading Waiting for PostgreSQL 13 – Add leader_pid to pg_stat_activity

Waiting for PostgreSQL 13 – Allow vacuum command to process indexes in parallel.

On 20th of January 2020, Amit Kapila committed patch:

Allow vacuum command to process indexes in parallel.
 
This feature allows the vacuum to leverage multiple CPUs in order to
process indexes.  This enables us to perform index vacuuming and index
cleanup with background workers.  This adds a PARALLEL option to VACUUM
command where the user can specify the number of workers that can be used
to perform the command which is limited by the number of indexes on a
table.  Specifying zero as a number of workers will disable parallelism.
This option can't be used with the FULL option.
 
Each index is processed by at most one vacuum process.  Therefore parallel
vacuum can be used when the table has at least two indexes.
 
The parallel degree is either specified by the user or determined based on
the number of indexes that the table has, and further limited by
max_parallel_maintenance_workers.  The index can participate in parallel
vacuum iff it's size is greater than min_parallel_index_scan_size.
 
Author: Masahiko Sawada and Amit Kapila
 
Mahendra Singh and Sergei Kornilov
 
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com

Continue reading Waiting for PostgreSQL 13 – Allow vacuum command to process indexes in parallel.

Changes on explain.depesz.com

Recently got two bug reports:

  • plans with “COSTS OFF" do not parse, and error out (bugreport by Marc Dean Jr)
  • WorkTable Scan is not properly parsed (bugreport by Ivan Vergiliev)

Additionally, I was kinda upset because plans that include trigger calls did not display properly.

All of this has been fixed today:

Continue reading Changes on explain.depesz.com

Waiting for PostgreSQL 11 – Support parallel btree index builds.

Support parallel btree index builds.
 
 
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.
 
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.
 
Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
 
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com

Continue reading Waiting for PostgreSQL 11 – Support parallel btree index builds.

Fix for parallel scans on explain.depesz.com

On Friday, Zr40 reported on irc that explain.depesz.com doesn't show table/index names for Parallel Seq Scan nodes.

Checked it and found couple of other omissions of the same kind with other Parallel* scans.

Fixed (I hope) all of them in:

The change is not really big, but just figured I'll let you know.

Waiting for 9.6 – Support parallel aggregation.

On 21st of March, Robert Haas committed patch:

Support parallel aggregation.
 
Parallel workers can now partially aggregate the data and pass the
transition values back to the leader, which can combine the partial
results to produce the final answer.
 
David Rowley, based on earlier work by Haribabu Kommi.  Reviewed by
Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.

Continue reading Waiting for 9.6 – Support parallel aggregation.

Waiting for 9.6 – Support parallel joins, and make related improvements.

On 20th of January, Robert Haas committed patch:

The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process.  Gathering a
partial path produces an ordinary (complete) path.  This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side.  This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loops and
hash joins paths are generated.
 
This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.
 
Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good.  So this
patch tries to make some modest improvements in that area.
 
There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.
 
Patch by me, reviewed by Dilip Kumar and Amit Kapila.

Continue reading Waiting for 9.6 – Support parallel joins, and make related improvements.