In this, hopefully 2nd to last, post in the series, I will cover the rest of usually happening operations that you can see in your explain outputs.
Unique
Name seems to be clear about what's going on here – it removes duplicate data.
This might happen, for example, when you're doing:
SELECT DISTINCT FIELD FROM TABLE
But in newer Pgs this query will usually be done using HashAggregate.
The problem with Unique is that is requires data to be sorted. Not because it needs data in any particular order – but it needs it so that all rows with the same value will be “together".
This makes it really cool (when possible to use) because it doesn't use virtually any memory. It just checks if value in previous row is the same as in current, and if yes – discards it. That's all.
So, we can force usage of it, by pre-sorting data:
$ EXPLAIN SELECT DISTINCT relkind FROM (SELECT relkind FROM pg_class ORDER BY relkind) AS x; QUERY PLAN ----------------------------------------------------------------------- UNIQUE (cost=22.88..27.26 ROWS=4 width=1) -> Sort (cost=22.88..23.61 ROWS=292 width=1) Sort KEY: pg_class.relkind -> Seq Scan ON pg_class (cost=0.00..10.92 ROWS=292 width=1) (4 ROWS)
Append
This plan simply runs multiple sub-operations, and returns all the rows that were returned as one resultset.
This is used by UNION/UNION ALL queries:
$ EXPLAIN SELECT oid FROM pg_class UNION ALL SELECT oid FROM pg_proc UNION ALL SELECT oid FROM pg_database; QUERY PLAN ----------------------------------------------------------------- Append (cost=0.00..104.43 ROWS=2943 width=4) -> Seq Scan ON pg_class (cost=0.00..10.92 ROWS=292 width=4) -> Seq Scan ON pg_proc (cost=0.00..92.49 ROWS=2649 width=4) -> Seq Scan ON pg_database (cost=0.00..1.02 ROWS=2 width=4) (4 ROWS)
In here you can see append running three scans on three tables and returning all the rows together.
Please note that I used UNION ALL. If I'd used UNION, we would get:
$ EXPLAIN SELECT oid FROM pg_class UNION SELECT oid FROM pg_proc UNION SELECT oid FROM pg_database; QUERY PLAN ----------------------------------------------------------------------- HashAggregate (cost=141.22..170.65 ROWS=2943 width=4) -> Append (cost=0.00..133.86 ROWS=2943 width=4) -> Seq Scan ON pg_class (cost=0.00..10.92 ROWS=292 width=4) -> Seq Scan ON pg_proc (cost=0.00..92.49 ROWS=2649 width=4) -> Seq Scan ON pg_database (cost=0.00..1.02 ROWS=2 width=4) (5 ROWS)
This is because UNION removes duplicate rows – which is, in this case, done using HashAggregate operation.
Result
This happens mostly in very simple test queries. This operation is used when your query selects some constant value (or values):
$ EXPLAIN SELECT 1, 2; QUERY PLAN ------------------------------------------ RESULT (cost=0.00..0.01 ROWS=1 width=0) (1 ROW)
Aside from test queries it can be sometimes seen in queries that do “insert, but don't if it would be duplicate" kind of thing:
$ EXPLAIN INSERT INTO t (i) SELECT 1 WHERE NOT EXISTS (SELECT * FROM t WHERE i = 1); QUERY PLAN --------------------------------------------------------------------- INSERT ON t (cost=3.33..3.35 ROWS=1 width=4) -> RESULT (cost=3.33..3.34 ROWS=1 width=0) One-TIME FILTER: (NOT $0) InitPlan 1 (RETURNS $0) -> Seq Scan ON t t_1 (cost=0.00..40.00 ROWS=12 width=0) FILTER: (i = 1) (6 ROWS)
Values Scan
Just like Result above, Values Scan is for returning simple, entered in query, data, but this time – it can be whole recordset, based on VALUES() functionality.
In case you don't know, you can select multiple rows with multiple columns, without any table, just by using VALUES syntax, like here:
$ SELECT * FROM ( VALUES (1, 'hubert'), (2, 'depesz'), (3, 'lubaczewski') ) AS t (a,b); a | b ---+------------- 1 | hubert 2 | depesz 3 | lubaczewski (3 ROWS)
Such query plan looks like:
QUERY PLAN -------------------------------------------------------------- VALUES Scan ON "*VALUES*" (cost=0.00..0.04 ROWS=3 width=36) (1 ROW)
It is also most commonly used in INSERTs, but it has other uses too, like custom sorting.
GroupAggregate
This is similar to previously described HashAggregate.
The difference is that for GroupAggregate to work data has to be sorted using whatever column(s) you used for your GROUP BY clause.
Just like Unique – GroupAggregate uses very little memory, but forces ordering of data.
Example:
$ EXPLAIN SELECT relkind, COUNT(*) FROM (SELECT relkind FROM pg_class ORDER BY relkind) x GROUP BY relkind; QUERY PLAN ----------------------------------------------------------------------- GroupAggregate (cost=22.88..28.03 ROWS=4 width=1) -> Sort (cost=22.88..23.61 ROWS=292 width=1) Sort KEY: pg_class.relkind -> Seq Scan ON pg_class (cost=0.00..10.92 ROWS=292 width=1) (4 ROWS)
HashSetOp
This operation is used by INTERSECT/EXCEPT operations (with optional “ALL" modifier).
It works by running sub-operation of Append for a pair of sub-queries, and then, based on result and optional ALL modifier, it figures which rows should be returned. I haven't digged in the source code so I can't tell you exactly how it works, but given the name and the operation, it looks like a simple counter-based solution.
In here we can see that unlike UNION, these operations work on two sources of data:
$ EXPLAIN SELECT * FROM (SELECT oid FROM pg_Class ORDER BY oid) x INTERSECT ALL SELECT * FROM (SELECT oid FROM pg_proc ORDER BY oid) y; QUERY PLAN ------------------------------------------------------------------------------------------------------------- HashSetOp INTERSECT ALL (cost=0.15..170.72 ROWS=292 width=4) -> Append (cost=0.15..163.36 ROWS=2941 width=4) -> Subquery Scan ON "*SELECT* 1" (cost=0.15..18.37 ROWS=292 width=4) -> INDEX ONLY Scan USING pg_class_oid_index ON pg_class (cost=0.15..12.53 ROWS=292 width=4) -> Subquery Scan ON "*SELECT* 2" (cost=0.28..145.00 ROWS=2649 width=4) -> INDEX ONLY Scan USING pg_proc_oid_index ON pg_proc (cost=0.28..92.02 ROWS=2649 width=4) (6 ROWS)
but with three sources, we get more complex tree:
$ EXPLAIN SELECT * FROM (SELECT oid FROM pg_Class ORDER BY oid) x INTERSECT ALL SELECT * FROM (SELECT oid FROM pg_proc ORDER BY oid) y INTERSECT ALL SELECT * FROM (SELECT oid FROM pg_database ORDER BY oid) AS w; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- HashSetOp INTERSECT ALL (cost=1.03..172.53 ROWS=2 width=4) -> Append (cost=1.03..171.79 ROWS=294 width=4) -> Subquery Scan ON "*SELECT* 3" (cost=1.03..1.07 ROWS=2 width=4) -> Sort (cost=1.03..1.03 ROWS=2 width=4) Sort KEY: pg_database.oid -> Seq Scan ON pg_database (cost=0.00..1.02 ROWS=2 width=4) -> RESULT (cost=0.15..170.72 ROWS=292 width=4) -> HashSetOp INTERSECT ALL (cost=0.15..170.72 ROWS=292 width=4) -> Append (cost=0.15..163.36 ROWS=2941 width=4) -> Subquery Scan ON "*SELECT* 1" (cost=0.15..18.37 ROWS=292 width=4) -> INDEX ONLY Scan USING pg_class_oid_index ON pg_class (cost=0.15..12.53 ROWS=292 width=4) -> Subquery Scan ON "*SELECT* 2" (cost=0.28..145.00 ROWS=2649 width=4) -> INDEX ONLY Scan USING pg_proc_oid_index ON pg_proc (cost=0.28..92.02 ROWS=2649 width=4) (13 ROWS)
CTE Scan
This is similar to previously mentioned Materialized operation. It runs a part of a query, and stores output so that it can be used by other part (or parts) of the query.
Example:
$ EXPLAIN analyze WITH x AS (SELECT relname, relkind FROM pg_class) SELECT relkind, COUNT(*), (SELECT COUNT(*) FROM x) FROM x GROUP BY relkind;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
HashAggregate (cost=24.80..26.80 ROWS=200 width=1) (actual TIME=0.466..0.468 ROWS=6 loops=1)
CTE x
-> Seq Scan ON pg_class (cost=0.00..10.92 ROWS=292 width=65) (actual TIME=0.009..0.127 ROWS=295 loops=1)
InitPlan 2 (RETURNS $1)
-> Aggregate (cost=6.57..6.58 ROWS=1 width=0) (actual TIME=0.085..0.085 ROWS=1 loops=1)
-> CTE Scan ON x x_1 (cost=0.00..5.84 ROWS=292 width=0) (actual TIME=0.000..0.055 ROWS=295 loops=1)
-> CTE Scan ON x (cost=0.00..5.84 ROWS=292 width=1) (actual TIME=0.012..0.277 ROWS=295 loops=1)
Total runtime: 0.524 ms
(8 ROWS)
Please note that pg_class is scanned only once – line #6. But it's results are stored in “x", and then scanned twice – inside aggregate (line #9) and HashAggregate (10).
How is it different from Materialize? To answer fully, one would need to jump into sources, but I would say that the difference stems from simple fact that CTE's are user defined. While Materialize is helper operation that Pg chooses to use when (it thinks) it makes sense.
The very important thing is that CTEs are ran just as specified. So they can be used to circumvent some not-so-good optimizations that planner normally can do.
InitPlan
This plan happens whenever there is a part of your query that can (or have to) be calculated before anything else, and it doesn't depend on anything in the rest of your query.
For example, let's assume you'd want such query:
$ EXPLAIN SELECT * FROM pg_class WHERE relkind = (SELECT relkind FROM pg_class ORDER BY random() LIMIT 1); QUERY PLAN ------------------------------------------------------------------------------------------ Seq Scan ON pg_class (cost=13.11..24.76 ROWS=73 width=203) FILTER: (relkind = $0) InitPlan 1 (RETURNS $0) -> LIMIT (cost=13.11..13.11 ROWS=1 width=1) -> Sort (cost=13.11..13.84 ROWS=292 width=1) Sort KEY: (random()) -> Seq Scan ON pg_class pg_class_1 (cost=0.00..11.65 ROWS=292 width=1) (7 ROWS)
In this case – getting the limit/sort/seq-scan is needed to run before normal seq scan on pg_class – because Pg will have to compare relkind value with the value returned by subquery.
On the other hand, if I'd write:
$ EXPLAIN SELECT *, (SELECT LENGTH('depesz')) FROM pg_class; QUERY PLAN ------------------------------------------------------------- Seq Scan ON pg_class (cost=0.01..10.93 ROWS=292 width=203) InitPlan 1 (RETURNS $0) -> RESULT (cost=0.00..0.01 ROWS=1 width=0) (3 ROWS)
Pg correctly sees that the subselect column does not depend on any data from pg_class table, so it can be run just once, and doesn't have to redo the length-calculation for every row.
Of course you can have many init plans, like in here:
$ EXPLAIN SELECT *, (SELECT LENGTH('depesz')) FROM pg_class WHERE relkind = (SELECT relkind FROM pg_class ORDER BY random() LIMIT 1); QUERY PLAN ------------------------------------------------------------------------------------------ Seq Scan ON pg_class (cost=13.12..24.77 ROWS=73 width=203) FILTER: (relkind = $1) InitPlan 1 (RETURNS $0) -> RESULT (cost=0.00..0.01 ROWS=1 width=0) InitPlan 2 (RETURNS $1) -> LIMIT (cost=13.11..13.11 ROWS=1 width=1) -> Sort (cost=13.11..13.84 ROWS=292 width=1) Sort KEY: (random()) -> Seq Scan ON pg_class pg_class_1 (cost=0.00..11.65 ROWS=292 width=1) (9 ROWS)
There is one important thing, though – numbering of init plans within single query is “global", and not “per operation".
SubPlan
SubPlans are a bit similar to NestedLoop. In this way that these can be called many times.
SubPlan is called to calculate data from a subquery, that actually does depends on current row.
For example:
$ EXPLAIN analyze SELECT c.relname, c.relkind, (SELECT COUNT(*) FROM pg_Class x WHERE c.relkind = x.relkind) FROM pg_Class c; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Seq Scan ON pg_class c (cost=0.00..3468.93 ROWS=292 width=65) (actual TIME=0.135..26.717 ROWS=295 loops=1) SubPlan 1 -> Aggregate (cost=11.83..11.84 ROWS=1 width=0) (actual TIME=0.090..0.090 ROWS=1 loops=295) -> Seq Scan ON pg_class x (cost=0.00..11.65 ROWS=73 width=0) (actual TIME=0.010..0.081 ROWS=93 loops=295) FILTER: (c.relkind = relkind) ROWS Removed BY FILTER: 202 Total runtime: 26.783 ms (7 ROWS)
For every row that is returned by scan on “pg_class as c", Pg has to run SubPlan, which checks how many rows in pg_class have the same (as currently processed row) value in relkind column.
Please note “loops=295" in the “Seq Scan on pg_class x" line, and matching “rows=295" in the earlier “Seq Scan on pg_class c" node.
Other ?
Yes. There are other operations. Some of them are too rare to care about (especially since you do have the great source of knowledge: sources), some are (I suspect) older versions of newer nodes.
If you have a plan with operation I did not cover, and you don't understand it – please let me know in comments – link to explain output on explain.depesz.com, operation name, and Pg version that you have it in. Will comment on such cases with whatever information I might find.
This makes it really cool (when possible to use) because it doesn’t use virtually any memory.
What if work_mem is enough to make quicksort run in memory?
@kim:
sorry, but I don’t understand what you’re asking about. The blogpost is rather long, and described many different execution nodes, but none of these are sorts, so I just don’t get the context you’re asking in.
I’m not sure if this is the right place to ask. It’s related to the source code.
I’m studying queries that involve initplan and subplan. Nodes such as hash joins, seq scan are executed by the function ExecProcNode(). But I can’t find which function executes a subplan or initplan .
@Yang Liu:
sorry, can’t answer. but for source-code level questions, I can suggest that you ask on pgsql-hackers mailing list.
I know it’s too late, but @kim is talking about a UNIQUE operation.
What if work_mem is enough to make quicksort run in memory?
It does.
Depesz pre-sorted data just to get UNIQUE in the plan.