Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes

On 3rd of March 2025, Tomas Vondra committed patch:

Allow parallel CREATE INDEX for GIN indexes
 
Allow using parallel workers to build a GIN index, similarly to BTREE
and BRIN. For large tables this may result in significant speedup when
the build is CPU-bound.
 
The work is divided so that each worker builds index entries on a subset
of the table, determined by the regular parallel scan used to read the
data. Each worker uses a local tuplesort to sort and merge the entries
for the same key. The TID lists do not overlap (for a given key), which
means the merge sort simply concatenates the two lists. The merged
entries are written into a shared tuplesort for the leader.
 
The leader needs to merge the sorted entries again, before writing them
into the index. But this way a significant part of the work happens in
the workers, and the leader is left with merging fewer large entries,
which is more efficient.
 
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for GIN indexes
(e.g. uniqueness checks).
 
Original patch by me, with reviews and substantial improvements by
Matthias van de Meent, certainly enough to make him a co-author.
 
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com

GIN indexes are great, but they are rather slow to create. So this change has the potential to be a real game changer in some cases.

Let's try to test it. I will need some data that can be searched using GIN, so let's generate some arrays:

#!/usr/bin/env ruby
# frozen_string_literal: true
# Print 1 million lines, each being a PostgreSQL array literal
# with 50-149 random elements picked from 1..100,000.
arr = (1..100_000).to_a
1_000_000.times do
  puts "{#{arr.sample(rand(50..149)).join(',')}}"
end

This outputs 1 million random arrays, where each array has a random number of elements (from 50 to 149), and each element is a number from 1 to 100,000. I saved the output to /tmp/test.arrays.txt; it was ~ 560MB.

Now I can create the table and load the data:

=$ create table test_gin (
    the_array int4[]
);
=$ \copy test_gin from /tmp/test.arrays.txt

Afterwards the table is 444MB, as reported by \dt+ test_gin.
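
For the record, the same number can be obtained with plain SQL using pg_table_size() (which is, as far as I know, what \dt+ uses under the hood):

=$ select pg_size_pretty( pg_table_size( 'test_gin' ) );

This should report the same ~444MB.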

First, I made sure that Pg would not use parallelism, and built the index:

=$ set max_parallel_maintenance_workers = 0;
SET
 
=$ create index qq on test_gin using gin (the_array);
CREATE INDEX
Time: 77111.482 ms (01:17.111)

Of course, we should check that the index is actually used for queries:

=$ explain (analyze, buffers) select * from test_gin where the_array @> '{1,2}'::int4[];
                                                    QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Bitmap Heap Scan on test_gin  (cost=51.31..150.05 rows=25 width=421) (actual time=0.093..0.094 rows=1.00 loops=1)
   Recheck Cond: (the_array @> '{1,2}'::integer[])
   Heap Blocks: exact=1
   Buffers: shared hit=4 read=6
   I/O Timings: shared read=0.017
   ->  Bitmap Index Scan on qq  (cost=0.00..51.30 rows=25 width=0) (actual time=0.082..0.082 rows=1.00 loops=1)
         Index Cond: (the_array @> '{1,2}'::integer[])
         Buffers: shared hit=4 read=5
         I/O Timings: shared read=0.013
 Planning:
   Buffers: shared hit=12 read=6 dirtied=4
   I/O Timings: shared read=0.550
 Planning Time: 0.764 ms
 Execution Time: 0.112 ms
(14 rows)

I also checked, for my own sanity, its size: \di+ qq reported the index as 1099MB.

Now, let's see if I can build it in parallel:

=$ drop index qq;
DROP INDEX
 
=$ set max_parallel_maintenance_workers = 10;
SET
 
=$ create index qq on test_gin using gin (the_array);
CREATE INDEX
Time: 53602.822 ms (00:53.603)

While index creation was ongoing, ps output showed:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
…
pgdba     361901 10.2  0.3 282400 209020 ?       Rs   12:41   2:41      \_ postgres: depesz depesz [local] CREATE INDEX
pgdba     391026 99.4  0.0 270036 60428 ?        Rs   13:07   0:29      \_ postgres: parallel worker for PID 361901

This shows that there were only two processes working on it: the main backend and a single parallel worker (more on that below). Time decreased by around 30%, which isn't bad. The weird part is this:

=$ \di+ qq
                                        List of indexes
 Schema │ Name │ Type  │ Owner  │  Table   │ Persistence │ Access method │  Size  │ Description
────────┼──────┼───────┼────────┼──────────┼─────────────┼───────────────┼────────┼─────────────
 public │ qq   │ index │ depesz │ test_gin │ permanent   │ gin           │ 769 MB │ [null]
(1 row)

The index is smaller?! I have no idea how that happened, and it makes me somewhat anxious, but I did verify that the index works OK:

=$ explain (analyze, buffers) select * from test_gin where the_array @> '{1,2}'::int4[];
                                                    QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Bitmap Heap Scan on test_gin  (cost=34.31..133.05 rows=25 width=421) (actual time=0.152..0.153 rows=1.00 loops=1)
   Recheck Cond: (the_array @> '{1,2}'::integer[])
   Heap Blocks: exact=1
   Buffers: shared hit=4 read=6
   I/O Timings: shared read=0.026
   ->  Bitmap Index Scan on qq  (cost=0.00..34.30 rows=25 width=0) (actual time=0.143..0.144 rows=1.00 loops=1)
         Index Cond: (the_array @> '{1,2}'::integer[])
         Buffers: shared hit=4 read=5
         I/O Timings: shared read=0.023
 Planning:
   Buffers: shared hit=5 read=2 dirtied=1
   I/O Timings: shared read=0.015
 Planning Time: 0.200 ms
 Execution Time: 0.176 ms
(14 rows)
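
As one more sanity check, we can compare results with and without the index. GIN indexes can only be used via bitmap scans, so disabling those should force a sequential scan here, and the counts should obviously match:

=$ select count(*) from test_gin where the_array @> '{1,2}'::int4[];

=$ set enable_bitmapscan = off;
SET

=$ select count(*) from test_gin where the_array @> '{1,2}'::int4[];

=$ reset enable_bitmapscan;
RESET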

Maybe someone smarter will be able to tell me why the index made with no parallelism was larger, but in any case – it looks great.
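
One more note on the worker count. If I read the code correctly, GIN reuses the same worker-planning logic as BTREE, where the number of workers is capped not only by max_parallel_maintenance_workers and table size, but also by maintenance_work_mem: each participant has to get at least 32MB of it, so with the default 64MB there is room only for the leader plus one worker. Presumably, raising it before rerunning CREATE INDEX would allow more workers:

=$ set maintenance_work_mem = '1GB';
SET

I haven't retested with this setting, so treat it as an educated guess. Also, instead of staring at ps output, the build can be watched through the pg_stat_progress_create_index view:

=$ select pid, phase, blocks_done, blocks_total from pg_stat_progress_create_index;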

Thanks to everyone who made it possible.

2 thoughts on “Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes”

  1. > why the index made with no parallelism was larger

    Parallel GIN index builds use sorting to pre-arrange the data in a way that’s efficient to insert into the index (similar to, but not exactly like, btree), whereas non-parallel index builds insert data directly into the index (from the index’s point of view not very different from CREATE TABLE; CREATE INDEX; INSERT INTO TABLE).

    Because GIN stores its data in order, inserting the data in that same order will improve performance (the index insertions are sequential, not random, thus reducing random buffer IO) and will allow the GIN build process to pack tuples more compactly (the insertions generally only happen on the rightmost page, allowing for more efficient page splits).

    It’s not yet optimal (we still descend the tree for every value we insert), but it’s already much better than in the sequential case.
