Waiting for PostgreSQL 12 – Allow user control of CTE materialization, and change the default behavior.

On 16th of February 2019, Tom Lane committed patch:

Allow user control of CTE materialization, and change the default behavior.
 
Historically we've always materialized the full output of a CTE query,
treating WITH as an optimization fence (so that, for example, restrictions
from the outer query cannot be pushed into it).  This is appropriate when
the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE
query is non-recursive and side-effect-free, there's no hazard of changing
the query results by pushing restrictions down.
 
Another argument for materialization is that it can avoid duplicate
computation of an expensive WITH query --- but that only applies if
the WITH query is called more than once in the outer query.  Even then
it could still be a net loss, if each call has restrictions that
would allow just a small part of the WITH query to be computed.
 
Hence, let's change the behavior for WITH queries that are non-recursive
and side-effect-free.  By default, we will inline them into the outer
query (removing the optimization fence) if they are called just once.
If they are called more than once, we will keep the old behavior by
default, but the user can override this and force inlining by specifying
NOT MATERIALIZED.  Lastly, the user can force the old behavior by
specifying MATERIALIZED; this would mainly be useful when the query had
deliberately been employing WITH as an optimization fence to prevent a
poor choice of plan.
 
Andreas Karlsson, Andrew Gierth, David Fetter
 
Discussion: https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk

This is pretty important change, as it has the chance to cause performance issues in some cases.

As you perhaps know, PostgreSQL has so called CTEs – basically virtual tables/views defined in a query using WITH syntax.

These, up until now, were optimization fences, which is fancy way of saying – PostgreSQL treated whatever is inside as black magic and didn't try to do it's own planner things with regards to it.

This had the benefit that it allowed us (dbas) to write queries that were forcing planner to do its own in specific way. The drawback was that it could lead to sub optimal performance.

Let's consider this simple example:

  1. =$ EXPLAIN analyze
  2. WITH x AS (
  3.     SELECT relkind, COUNT(*) FROM pg_class GROUP BY relkind
  4. )
  5. SELECT * FROM x WHERE relkind = 'r';
  6.                                                      QUERY PLAN
  7. ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  8.  CTE Scan ON x  (cost=16.17..16.31 ROWS=1 width=9) (actual TIME=0.922..0.937 ROWS=1 loops=1)
  9.    FILTER: (relkind = 'r'::"char")
  10.    ROWS Removed BY FILTER: 3
  11.    CTE x
  12.      ->  HashAggregate  (cost=16.11..16.17 ROWS=6 width=9) (actual TIME=0.916..0.922 ROWS=4 loops=1)
  13.            GROUP KEY: pg_class.relkind
  14.            ->  Seq Scan ON pg_class  (cost=0.00..14.41 ROWS=341 width=1) (actual TIME=0.006..0.442 ROWS=341 loops=1)
  15.  Planning TIME: 0.118 ms
  16.  Execution TIME: 0.999 ms
  17. (9 ROWS)

Please note that in line 12 we can see that HashAggregate returned 4 rows (actual…rows=4).

This means that PostgreSQL scanned whole pg_class, did grouping, and count, and only then applied external condition of relkind = ‘r'.

While pg_class is trivially small, this example shows that if we could somehow push the relkind condition inside CTE, it could be faster.

Now, in PostgreSQL 12, the same query generates this explain:

  1. =$ EXPLAIN analyze
  2. WITH x AS (
  3.     SELECT relkind, COUNT(*) FROM pg_class GROUP BY relkind
  4. )
  5. SELECT * FROM x WHERE relkind = 'r';
  6.                                                 QUERY PLAN                                                 
  7. -----------------------------------------------------------------------------------------------------------
  8.  GroupAggregate  (cost=0.00..18.21 ROWS=4 width=9) (actual TIME=0.255..0.255 ROWS=1 loops=1)
  9.    GROUP KEY: pg_class.relkind
  10.    ->  Seq Scan ON pg_class  (cost=0.00..17.82 ROWS=69 width=1) (actual TIME=0.021..0.227 ROWS=70 loops=1)
  11.          FILTER: (relkind = 'r'::"char")
  12.          ROWS Removed BY FILTER: 317
  13.  Planning TIME: 0.989 ms
  14.  Execution TIME: 0.548 ms
  15. (7 ROWS)

Which is basically the same as for:

  1. =$ EXPLAIN analyze
  2. SELECT relkind, COUNT(*) FROM pg_class WHERE relkind = 'r' GROUP BY relkind
  3.                                                 QUERY PLAN                                                 
  4. -----------------------------------------------------------------------------------------------------------
  5.  GroupAggregate  (cost=0.00..18.21 ROWS=4 width=9) (actual TIME=0.275..0.275 ROWS=1 loops=1)
  6.    GROUP KEY: relkind
  7.    ->  Seq Scan ON pg_class  (cost=0.00..17.82 ROWS=69 width=1) (actual TIME=0.022..0.248 ROWS=70 loops=1)
  8.          FILTER: (relkind = 'r'::"char")
  9.          ROWS Removed BY FILTER: 317
  10.  Planning TIME: 0.700 ms
  11.  Execution TIME: 0.540 ms
  12. (7 ROWS)

Please note that in here, in lines 10-11 in the example with CTE, and lines 7-8 without CTE, condition is applied directly to Scan on table.

In case you'd like to get previous behavior, you can use new MATERIALIZED keyword for CTE definition, like this:

  1. =$ EXPLAIN analyze
  2. WITH x AS MATERIALIZED (
  3.     SELECT relkind, COUNT(*) FROM pg_class GROUP BY relkind
  4. )
  5. SELECT * FROM x WHERE relkind = 'r';
  6.                                                      QUERY PLAN                                                      
  7. ---------------------------------------------------------------------------------------------------------------------
  8.  CTE Scan ON x  (cost=18.83..18.92 ROWS=1 width=9) (actual TIME=0.452..0.458 ROWS=1 loops=1)
  9.    FILTER: (relkind = 'r'::"char")
  10.    ROWS Removed BY FILTER: 3
  11.    CTE x
  12.      ->  HashAggregate  (cost=18.79..18.83 ROWS=4 width=9) (actual TIME=0.446..0.449 ROWS=4 loops=1)
  13.            GROUP KEY: pg_class.relkind
  14.            ->  Seq Scan ON pg_class  (cost=0.00..16.86 ROWS=386 width=1) (actual TIME=0.018..0.124 ROWS=387 loops=1)
  15.  Planning TIME: 0.954 ms
  16.  Execution TIME: 0.769 ms
  17. (9 ROWS)

All of this is great. I'm kinda uneasy about the fact that by default CTEs will be run in NOT MATERIALIZED way, and if you want to preserve older way of working, you have to modify your queries. But – it's definitely a progress, so I can't really complain.

Thanks to all involved, great work.

6 thoughts on “Waiting for PostgreSQL 12 – Allow user control of CTE materialization, and change the default behavior.”

  1. Just to be clear: this only influences query planning, it shouldn’t change any behavior except performance ? So… it’s a query planner hint ?

  2. Yes it does not change the results, but in some cases for large datasets, carefully using WITH clauses as optimization fences can mean the difference between a query that runs 10 seconds or 10 hours… (no I don’t have an example at hand)

    The fact that the results are the same is not really a consolation when the time it takes explodes because of bad planning…

    On other cases, inlining might accelerate existing queries though… there is no rule fits all here…

  3. Not all CTEs will be run without materialization. Commit message says:
    > … By default, we will inline them into the outer
    > query (removing the optimization fence) if they are called just once.
    > If they are called more than once, we will keep the old behavior by
    > default.

    Also, this applies only to the “side-effect-free” queries, I read this as read-only, but perhaps scope is slightly different.

  4. Do anyone have an example of a query this will help that isn’t just a poorly written query? I’m really struggling to find the good in this change considering that the optimization wall is the entire point of most of my CTE use. I suppose I can just throw the hint on all our existing queries if we don’t have time to performance test each one when switching to 12, but I’d like to understand the situation where this change is good.

    It seems like there must be some major upside since Tom bought into adding what are essentially query planner hints, but I’m failing to see it. I keep thinking that any query that can correctly optimize away a CTE materialization for a speed win could have just been written better without the CTE in the first place. Am I wrong about that? I probably am, but it’s not obvious to me what that query would be.

  5. Dan I think you have misunderstood.
    This change is great news.
    CTEs are most certainly not used (by most) for materialization.
    They are used because they are much easier to construct and to read and to understand by others and your future self – than complex nested sub queries.
    I am baffled as to why anyone could object to keeping this nice syntax AND getting the better performance of sub queries And with the option of keeping the old behaviour for edge cases where the query planner makes a mistake.
    So well done pg hackers, all we need now is incremental auto refresh of mat view then Postgres will be nigh on perfect.

  6. Unfortunately, even though this feature added more user control, the change forced our company to update hundreds of read-only queries that all had multiple CTEs, and were being run against a vendor’s pg instance. Most of them were “fixed” by just adding MATERIALIZED to them, but some needed more in-depth rewrites. (No comment on the quality of the original queries, they pre-dated me).

Comments are closed.