On 7th of April 2018, Alvaro Herrera committed patch:
Support partition pruning at execution time Existing partition pruning is only able to work at plan time, for query quals that appear in the parsed query. This is good but limiting, as there can be parameters that appear later that can be usefully used to further prune partitions. This commit adds support for pruning subnodes of Append which cannot possibly contain any matching tuples, during execution, by evaluating Params to determine the minimum set of subnodes that can possibly match. We support more than just simple Params in WHERE clauses. Support additionally includes: 1. Parameterized Nested Loop Joins: The parameter from the outer side of the join can be used to determine the minimum set of inner side partitions to scan. 2. Initplans: Once an initplan has been executed we can then determine which partitions match the value from the initplan. Partition pruning is performed in two ways. When Params external to the plan are found to match the partition key we attempt to prune away unneeded Append subplans during the initialization of the executor. This allows us to bypass the initialization of non-matching subplans meaning they won't appear in the EXPLAIN or EXPLAIN ANALYZE output. For parameters whose value is only known during the actual execution then the pruning of these subplans must wait. Subplans which are eliminated during this stage of pruning are still visible in the EXPLAIN output. In order to determine if pruning has actually taken place, the EXPLAIN ANALYZE must be viewed. If a certain Append subplan was never executed due to the elimination of the partition then the execution timing area will state "(never executed)". Whereas, if, for example in the case of parameterized nested loops, the number of loops stated in the EXPLAIN ANALYZE output for certain subplans may appear lower than others due to the subplan having been scanned fewer times. This is due to the list of matching subnodes having to be evaluated whenever a parameter which was found to match the partition key changes. This commit required some additional infrastructure that permits the building of a data structure which is able to perform the translation of the matching partition IDs, as returned by get_matching_partitions, into the list index of a subpaths list, as exist in node types such as Append, MergeAppend and ModifyTable. This allows us to translate a list of clauses into a Bitmapset of all the subpath indexes which must be included to satisfy the clause list. Author: David Rowley, based on an earlier effort by Beena Emerson Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi, Jesper Pedersen Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
Long commit message. Long and pretty good in explaining what is going on, so let's just see the result for ourselves.
First, sample data:
CREATE TABLE users ( id serial NOT NULL, username text NOT NULL, password text ) PARTITION BY RANGE ( id ); CREATE TABLE users_0 partition OF users (id, PRIMARY KEY (id), UNIQUE (username)) FOR VALUES FROM (minvalue) TO (10); CREATE TABLE users_1 partition OF users (id, PRIMARY KEY (id), UNIQUE (username)) FOR VALUES FROM (10) TO (20); CREATE TABLE users_2 partition OF users (id, PRIMARY KEY (id), UNIQUE (username)) FOR VALUES FROM (20) TO (30); CREATE TABLE users_3 partition OF users (id, PRIMARY KEY (id), UNIQUE (username)) FOR VALUES FROM (30) TO (maxvalue); INSERT INTO users (username) SELECT 'u:' || i::text FROM generate_series(1,25) i; CREATE TABLE x (u_id INT4); INSERT INTO x (u_id) VALUES (13); ANALYZE;
Pretty straightforward. Now, let's see execution plan with hard-coded id:
=$ EXPLAIN analyze SELECT * FROM users WHERE id = 13; QUERY PLAN -------------------------------------------------------------------------------------------------------- Append (cost=0.00..1.12 ROWS=1 width=41) (actual TIME=0.012..0.017 ROWS=1 loops=1) -> Seq Scan ON users_1 (cost=0.00..1.12 ROWS=1 width=41) (actual TIME=0.009..0.011 ROWS=1 loops=1) FILTER: (id = 13) ROWS Removed BY FILTER: 9 Planning TIME: 0.549 ms Execution TIME: 0.040 ms (6 ROWS)
As we can see only one partition was scanned. All good.
Now, let's see what happens if id is taken from subselect:
=$ EXPLAIN analyze SELECT * FROM users WHERE id = (SELECT u_id FROM x); QUERY PLAN -------------------------------------------------------------------------------------------------------- Append (cost=1.01..12.51 ROWS=4 width=48) (actual TIME=0.007..0.007 ROWS=1 loops=1) InitPlan 1 (RETURNS $0) -> Seq Scan ON x (cost=0.00..1.01 ROWS=1 width=4) (actual TIME=0.002..0.002 ROWS=1 loops=1) -> Seq Scan ON users_0 (cost=0.00..1.11 ROWS=1 width=40) (never executed) FILTER: (id = $0) -> Seq Scan ON users_1 (cost=0.00..1.12 ROWS=1 width=41) (actual TIME=0.002..0.002 ROWS=1 loops=1) FILTER: (id = $0) ROWS Removed BY FILTER: 9 -> Seq Scan ON users_2 (cost=0.00..1.07 ROWS=1 width=41) (never executed) FILTER: (id = $0) -> INDEX Scan USING users_3_pkey ON users_3 (cost=0.15..8.17 ROWS=1 width=68) (never executed) INDEX Cond: (id = $0) Planning TIME: 0.104 ms Execution TIME: 0.021 ms (14 ROWS)
for comparison, in Pg 10, explain analyze for query above looked like:
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Append (cost=1.01..12.49 ROWS=4 width=48) (actual TIME=0.025..0.058 ROWS=1 loops=1) InitPlan 1 (RETURNS $0) -> Seq Scan ON x (cost=0.00..1.01 ROWS=1 width=4) (actual TIME=0.003..0.005 ROWS=1 loops=1) -> Seq Scan ON users_0 (cost=0.00..1.11 ROWS=1 width=40) (actual TIME=0.017..0.017 ROWS=0 loops=1) FILTER: (id = $0) ROWS Removed BY FILTER: 9 -> Seq Scan ON users_1 (cost=0.00..1.12 ROWS=1 width=41) (actual TIME=0.003..0.005 ROWS=1 loops=1) FILTER: (id = $0) ROWS Removed BY FILTER: 9 -> Seq Scan ON users_2 (cost=0.00..1.07 ROWS=1 width=41) (actual TIME=0.003..0.003 ROWS=0 loops=1) FILTER: (id = $0) ROWS Removed BY FILTER: 6 -> INDEX Scan USING users_3_pkey ON users_3 (cost=0.15..8.17 ROWS=1 width=68) (actual TIME=0.023..0.023 ROWS=0 loops=1) INDEX Cond: (id = $0) Planning TIME: 0.238 ms Execution TIME: 0.090 ms (16 ROWS)
The important part is that when old Pg (well, 10.3 is not so old, but it's not the git HEAD) had loops=1 on all partitions, in Pg 11 only one partition – the correct one, of course, was actually scanned. In case of all other partitions we can see “never executed", which is great sign that Pg didn't waste any cycles on work that it could skip.
Great stuff, thanks a lot to all involved.