Waiting for PostgreSQL 17 – MERGE / SPLIT partitions

I thought about it for quite some time, whether I should write about it, and how. That's why there is delay since:

On 6th of April 2024, Alexander Korotkov committed patch:

Implement ALTER TABLE ... MERGE PARTITIONS ... command
 
This new DDL command merges several partitions into the one partition of the
target table.  The target partition is created using new
createPartitionTable() function with parent partition as the template.
 
This commit comprises quite naive implementation which works in single process
and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the
operations including the tuple routing.  This is why this new DDL command
can't be recommended for large partitioned tables under a high load.  However,
this implementation come in handy in certain cases even as is.
Also, it could be used as a foundation for future implementations with lesser
locking and possibly parallel.
 
Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires

and, also on 6th of April 2024, Alexander Korotkov committed patch:

Implement ALTER TABLE ... SPLIT PARTITION ... command
 
This new DDL command splits a single partition into several parititions.
Just like ALTER TABLE ... MERGE PARTITIONS ... command, new patitions are
created using createPartitionTable() function with parent partition as the
template.
 
This commit comprises quite naive implementation which works in single process
and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the
operations including the tuple routing.  This is why this new DDL command
can't be recommended for large partitioned tables under a high load.  However,
this implementation come in handy in certain cases even as is.
Also, it could be used as a foundation for future implementations with lesser
locking and possibly parallel.
 
Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires

These functionalities are great. It's basically single-command modification of partitions for a table.

I thought if/how to write about it for very simple reason: as they are now, they can't really be used in “interesting" cases. Full rewrite with full lock makes it unusable for large partitioned tables.

On the other hand, the syntax is great, and there is a chance that in future we will get concurrent versions of this.

So, let's see what we can do. Starting with what we cannot do. We cannot use this on hash-based partitions.

So, if you do stuff like:

CREATE TABLE whatever (
    id int8 generated always as identity,
    …
) PARTITION BY HASH (id);
CREATE TABLE whatever_0 partition OF whatever FOR VALUES WITH (MODULUS 50, REMAINDER 0);
CREATE TABLE whatever_1 partition OF whatever FOR VALUES WITH (MODULUS 50, REMAINDER 1);
…

then you can't split/merge using this syntax.

This leaves us with list or range.

Let's see it in action:

=$ CREATE TABLE whatever_range (
    id int8 generated always as identity,
    partkey text not null,
    payload int8
) PARTITION BY list (partkey);
CREATE TABLE
 
=$ CREATE TABLE whatever_range_abc partition OF whatever_range FOR VALUES IN ('a', 'b', 'c');
CREATE TABLE
 
=$ CREATE TABLE whatever_range_de partition OF whatever_range FOR VALUES IN ( 'd', 'e' );
CREATE TABLE

OK. Sanity check before changes:

=$ \d+ whatever_range
                                          Partitioned table "public.whatever_range"
 Column  |  Type  | Collation | Nullable |           Default            | Storage  | Compression | Stats target | Description 
---------+--------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
 id      | bigint |           | not null | generated always as identity | plain    |             |              | 
 partkey | text   |           | not null |                              | extended |             |              | 
 payload | bigint |           |          |                              | plain    |             |              | 
Partition key: LIST (partkey)
Partitions: whatever_range_abc FOR VALUES IN ('a', 'b', 'c'),
            whatever_range_de FOR VALUES IN ('d', 'e')

Sweet, now, let's split value c from whatever_range_abc to separate partition:

=$ alter table whatever_range SPLIT partition whatever_range_abc into
    (
        partition whatever_range_ab FOR VALUES IN ('a', 'b'),
        partition whatever_range_c FOR VALUES IN ('c')
    );
ALTER TABLE

And layout looks like:

=$ \d+ whatever_range
                                          Partitioned table "public.whatever_range"
 Column  |  Type  | Collation | Nullable |           Default            | Storage  | Compression | Stats target | Description 
---------+--------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
 id      | bigint |           | not null | generated always as identity | plain    |             |              | 
 partkey | text   |           | not null |                              | extended |             |              | 
 payload | bigint |           |          |                              | plain    |             |              | 
Partition key: LIST (partkey)
Partitions: whatever_range_ab FOR VALUES IN ('a', 'b'),
            whatever_range_c FOR VALUES IN ('c'),
            whatever_range_de FOR VALUES IN ('d', 'e')

Looks pretty cool. So let's try to merge now – add the _c to _de:

=$ alter table whatever_range merge partitions
    ( whatever_range_c, whatever_range_de )
    into whatever_range_cde;
ALTER TABLE
 
=$ \d+ whatever_range
                                          Partitioned table "public.whatever_range"
 Column  |  Type  | Collation | Nullable |           Default            | Storage  | Compression | Stats target | Description 
---------+--------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
 id      | bigint |           | not null | generated always as identity | plain    |             |              | 
 partkey | text   |           | not null |                              | extended |             |              | 
 payload | bigint |           |          |                              | plain    |             |              | 
Partition key: LIST (partkey)
Partitions: whatever_range_ab FOR VALUES IN ('a', 'b'),
            whatever_range_cde FOR VALUES IN ('c', 'd', 'e')

Looks simple enough. So, let's try something more complicated – how about splitting these 2 partitions into 5?

With range based partitions it will work the same way, with one small limitations, which is kinda obvious.

All this is great. The problem is locking, and duration.

For example, let's add some data to my test case:

=$ insert into whatever_range (partkey, payload)
    select
        case when random() < .5 then 'a' else 'b' end,
        (random() * 1000000000)::int8
    from generate_series(1,10000000);
INSERT 0 10000000
Time: 12234.080 ms (00:12.234)

OK. It took 12 seconds to insert 10 million rows, and all of them went to whatever_range_ab:

=$ select tableoid::regclass, count(*) from whatever_range group by 1 order by 1;
     tableoid      │  count
───────────────────┼──────────
 whatever_range_ab │ 10000000
(1 row)

For sanity, let's also see relfilenodes, so we will be able to tell what has changed, in case of rewrite:

=$ select relname, oid, relfilenode, relkind, pg_relation_size(oid) from pg_class  where relname ~ '^whatever_range' and relkind in ('p', 'r');
      relname       │  oid  │ relfilenode │ relkind │ pg_relation_size
────────────────────┼───────┼─────────────┼─────────┼──────────────────
 whatever_range     │ 16438 │           0 │ p       │                0
 whatever_range_ab  │ 16451 │       16451 │ r       │        521789440
 whatever_range_cde │ 16461 │       16461 │ r       │                0
(3 rows)

So, now I'll split it into _a and _b, in a transaction, and will check locks:

=$ begin;
BEGIN
 
=$ alter table whatever_range SPLIT partition whatever_range_ab into
    (
        partition whatever_range_a FOR VALUES IN ('a'),
        partition whatever_range_b FOR VALUES IN ('b')
    );
ALTER TABLE
Time: 5490.501 ms (00:05.491)
 
=$ select l.relation, c.relname, c.relkind from pg_locks l left join pg_class c on l.relation = c.oid where l.pid = 78058 and l.mode ='AccessExclusiveLock' and l.locktype = 'relation' order by 2;
 relation │       relname        │ relkind
──────────┼──────────────────────┼─────────
    16480 │ pg_toast_16476_index │ i
    16485 │ pg_toast_16481_index │ i
    16438 │ whatever_range       │ p
    16476 │ whatever_range_a     │ r
    16481 │ whatever_range_b     │ r
    16454 │ [null]               │ [null]
    16451 │ [null]               │ [null]
    16455 │ [null]               │ [null]
(8 rows)

The [null] relations are original whatever_range_ab, and its toast relations for table, and index.

Afterwards pg_class data looks like:

=$ select relname, oid, relfilenode, relkind, pg_relation_size(oid) from pg_class  where relname ~ '^whatever_range' and relkind in ('p', 'r');
      relname       │  oid  │ relfilenode │ relkind │ pg_relation_size
────────────────────┼───────┼─────────────┼─────────┼──────────────────
 whatever_range     │ 16438 │           0 │ p       │                0
 whatever_range_a   │ 16476 │       16476 │ r       │        261095424
 whatever_range_b   │ 16481 │       16481 │ r       │        261095424
 whatever_range_cde │ 16461 │       16461 │ r       │                0
(4 rows)

So, as you can see _a and _b, are totally new relations, and the _ab is gone.

On larger databases, where partitions will be larger than 500MB, this process will take longer, and while it is happening – all accesses to main table (whatever_range) are blocked.

All said, it's great addition, and if/when we'll get version that works CONCURRENTLY then it will be truly revolutionary, but even without this, it will make lives of many dbas way simpler.

Thanks a lot, to everyone involved.

2 thoughts on “Waiting for PostgreSQL 17 – MERGE / SPLIT partitions”

Krystophe says:

2024-10-17 at 10:06

This feature didn’t make it in PG17 for security reasons, see https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=84f594da358861cceeaeb7a97bb58f3765eeb284

I hope it will be part of PG18.
depesz says:

2024-10-17 at 10:14

@Krystophe:
thanks for info.

Comments are closed.