On 7th of February 2018, Tom Lane committed patch:
Support all SQL:2011 options for window frame clauses. This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING" frame boundaries in window functions. We'd punted on that back in the original patch to add window functions, because it was not clear how to do it in a reasonably data-type-extensible fashion. That problem is resolved here by adding the ability for btree operator classes to provide an "in_range" support function that defines how to add or subtract the RANGE offset value. Factoring it this way also allows the operator class to avoid overflow problems near the ends of the datatype's range, if it wishes to expend effort on that. (In the committed patch, the integer opclasses handle that issue, but it did not seem worth the trouble to avoid overflow failures for datetime types.) The patch includes in_range support for the integer_ops opfamily (int2/int4/int8) as well as the standard datetime types. Support for other numeric types has been requested, but that seems like suitable material for a follow-on patch. In addition, the patch adds GROUPS mode which counts the offset in ORDER-BY peer groups rather than rows, and it adds the frame_exclusion options specified by SQL:2011. As far as I can see, we are now fully up to spec on window framing options. Existing behaviors remain unchanged, except that I changed the errcode for a couple of existing error reports to meet the SQL spec's expectation that negative "offset" values should be reported as SQLSTATE 22013. Internally and in relevant parts of the documentation, we now consistently use the terminology "offset PRECEDING/FOLLOWING" rather than "value PRECEDING/FOLLOWING", since the term "value" is confusingly vague. Oliver Ford, reviewed and whacked around some by me Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com
Window functions has been around since PostgreSQL 8.4. I do use them, but to be honest, I rarely use complex window frame clauses. Usually just some order by, maybe partition by.
So first, lets create some sample data, and see what was available before:
=$ CREATE TABLE test ( id serial PRIMARY KEY, for_group text NOT NULL, some_val int4 NOT NULL ); CREATE TABLE =$ INSERT INTO test (for_group, some_val) SELECT 'group 1', generate_series(1,10,1) i; INSERT 0 10 =$ INSERT INTO test (for_group, some_val) SELECT 'group 2', generate_series(2,20,2) i; INSERT 0 10
Data looks like this:
=$ SELECT * FROM test ORDER BY for_group, some_val; id | for_group | some_val ----+-----------+---------- 1 | GROUP 1 | 1 2 | GROUP 1 | 2 3 | GROUP 1 | 3 4 | GROUP 1 | 4 5 | GROUP 1 | 5 6 | GROUP 1 | 6 7 | GROUP 1 | 7 8 | GROUP 1 | 8 9 | GROUP 1 | 9 10 | GROUP 1 | 10 11 | GROUP 2 | 2 12 | GROUP 2 | 4 13 | GROUP 2 | 6 14 | GROUP 2 | 8 15 | GROUP 2 | 10 16 | GROUP 2 | 12 17 | GROUP 2 | 14 18 | GROUP 2 | 16 19 | GROUP 2 | 18 20 | GROUP 2 | 20 (20 ROWS)
In older Pgs we could use following criteria as window frames:
- { RANGE | ROWS } frame_start
- { RANGE | ROWS } BETWEEN frame_start AND frame_end
where frame_start and frame_end can be one of:
- UNBOUNDED PRECEDING
- value PRECEDING
- CURRENT ROW
- value FOLLOWING
- UNBOUNDED FOLLOWING
with two separate defaults:
- default frame_end is CURRENT ROW
- default window frame definition is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Which basically means that this select will generate results that match column names:
=$ SELECT id, for_group, some_val, SUM(some_val) OVER (partition BY for_group ORDER BY id) AS sum_so_far_in_group, SUM(some_val) OVER (partition BY for_group) AS sum_in_group, SUM(some_val) OVER (partition BY for_group ORDER BY id range 3 PRECEDING) AS sum_current_and_3_preceeding, SUM(some_val) OVER (partition BY for_group ORDER BY id range BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS sum_current_and_3_preceeding_and_3_following, SUM(some_val) OVER (partition BY for_group ORDER BY id range BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS sum_current_and_all_following FROM test ORDER BY for_group, id; id | for_group | some_val | sum_so_far_in_group | sum_in_group | sum_current_and_3_preceeding | sum_current_and_3_preceeding_and_3_following | sum_current_and_all_following ----+-----------+----------+---------------------+--------------+------------------------------+----------------------------------------------+------------------------------- 1 | 0 | 1 | 1 | 5 | 1 | 3 | 5 2 | 0 | 2 | 3 | 5 | 3 | 3 | 4 11 | 0 | 2 | 5 | 5 | 2 | 2 | 2 3 | 1 | 3 | 3 | 16 | 3 | 12 | 16 4 | 1 | 4 | 7 | 16 | 7 | 12 | 13 5 | 1 | 5 | 12 | 16 | 12 | 12 | 9 12 | 1 | 4 | 16 | 16 | 4 | 4 | 4 6 | 2 | 6 | 6 | 35 | 6 | 21 | 35 7 | 2 | 7 | 13 | 35 | 13 | 21 | 29 8 | 2 | 8 | 21 | 35 | 21 | 21 | 22 13 | 2 | 6 | 27 | 35 | 6 | 14 | 14 14 | 2 | 8 | 35 | 35 | 14 | 14 | 8 9 | 3 | 9 | 9 | 29 | 9 | 19 | 29 10 | 3 | 10 | 19 | 29 | 19 | 19 | 20 15 | 3 | 10 | 29 | 29 | 10 | 10 | 10 16 | 4 | 12 | 12 | 26 | 12 | 26 | 26 17 | 4 | 14 | 26 | 26 | 26 | 26 | 14 18 | 5 | 16 | 16 | 16 | 16 | 16 | 16 19 | 6 | 18 | 18 | 38 | 18 | 38 | 38 20 | 6 | 20 | 38 | 38 | 38 | 38 | 20 (20 ROWS)
There is some difference between frames in RANGE and
OK. So this covers basically what we had before PostgreSQL 11. What's new then?
New window frames can be:
- { RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
- { RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]
where frame_start and frame_end are the same as before, but we got new part frame_exclusion, which can be:
- EXCLUDE CURRENT ROW
- EXCLUDE GROUP
- EXCLUDE TIES
- EXCLUDE NO OTHERS
Now, to explain what are these, I will need to modify my table, to contain one more column that will be used for so called groups. And, since we have for_group column, I'll rename it to avoid confusion:
=$ ALTER TABLE test RENAME COLUMN for_group TO for_paritition; =$ ALTER TABLE test ADD COLUMN for_group int4;
We also need to understand concept of peers. Rows are considered to be peers if they belong to the same partition, and according to order by window clause – they have the same position.
Now, let's make sure that the values in for_group are usable for my needs:
=$ UPDATE test SET for_group = some_val / 3;
This makes our table contain this data:
=$ SELECT * FROM test ORDER BY for_paritition, id; id | for_paritition | some_val | for_group ----+----------------+----------+----------- 1 | GROUP 1 | 1 | 0 2 | GROUP 1 | 2 | 0 3 | GROUP 1 | 3 | 1 4 | GROUP 1 | 4 | 1 5 | GROUP 1 | 5 | 1 6 | GROUP 1 | 6 | 2 7 | GROUP 1 | 7 | 2 8 | GROUP 1 | 8 | 2 9 | GROUP 1 | 9 | 3 10 | GROUP 1 | 10 | 3 11 | GROUP 2 | 2 | 0 12 | GROUP 2 | 4 | 1 13 | GROUP 2 | 6 | 2 14 | GROUP 2 | 8 | 2 15 | GROUP 2 | 10 | 3 16 | GROUP 2 | 12 | 4 17 | GROUP 2 | 14 | 4 18 | GROUP 2 | 16 | 5 19 | GROUP 2 | 18 | 6 20 | GROUP 2 | 20 | 6 (20 ROWS)
If I'd order by for_group, then rows with the same for_group would be considered peers.
Now, we can try new frame clauses:
=$ SELECT id, for_paritition, for_group, some_val, array_agg(id) OVER ( partition BY for_paritition ORDER BY for_group RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS all_ids_in_partition, array_agg(id) OVER ( partition BY for_paritition ORDER BY for_group RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING EXCLUDE CURRENT ROW ) AS exclude_current_row, array_agg(id) OVER ( partition BY for_paritition ORDER BY for_group RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING EXCLUDE GROUP ) AS exclude_group, array_agg(id) OVER ( partition BY for_paritition ORDER BY for_group RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING EXCLUDE TIES ) AS exclude_ties, array_agg(id) OVER ( partition BY for_paritition ORDER BY for_group RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING EXCLUDE NO OTHERS ) AS exclude_no_others FROM test ORDER BY for_paritition, for_group, id; id | for_paritition | for_group | some_val | all_ids_in_partition | exclude_current_row | exclude_group | exclude_ties | exclude_no_others ----+----------------+-----------+----------+---------------------------------+------------------------------+------------------------------+---------------------------------+--------------------------------- 1 | GROUP 1 | 0 | 1 | {1,2,3,4,5,6,7,8,9,10} | {2,3,4,5,6,7,8,9,10} | {3,4,5,6,7,8,9,10} | {1,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9,10} 2 | GROUP 1 | 0 | 2 | {1,2,3,4,5,6,7,8,9,10} | {1,3,4,5,6,7,8,9,10} | {3,4,5,6,7,8,9,10} | {2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9,10} 3 | GROUP 1 | 1 | 3 | {1,2,3,4,5,6,7,8,9,10} | {1,2,4,5,6,7,8,9,10} | {1,2,6,7,8,9,10} | {1,2,3,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9,10} 4 | GROUP 1 | 1 | 4 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,5,6,7,8,9,10} | {1,2,6,7,8,9,10} | {1,2,4,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9,10} 5 | GROUP 1 | 1 | 5 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,6,7,8,9,10} | {1,2,6,7,8,9,10} | {1,2,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9,10} 6 | GROUP 1 | 2 | 6 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,7,8,9,10} | {1,2,3,4,5,9,10} | {1,2,3,4,5,6,9,10} | {1,2,3,4,5,6,7,8,9,10} 7 | GROUP 1 | 2 | 7 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,8,9,10} | {1,2,3,4,5,9,10} | {1,2,3,4,5,7,9,10} | {1,2,3,4,5,6,7,8,9,10} 8 | GROUP 1 | 2 | 8 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,9,10} | {1,2,3,4,5,9,10} | {1,2,3,4,5,8,9,10} | {1,2,3,4,5,6,7,8,9,10} 9 | GROUP 1 | 3 | 9 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8,10} | {1,2,3,4,5,6,7,8} | {1,2,3,4,5,6,7,8,9} | {1,2,3,4,5,6,7,8,9,10} 10 | GROUP 1 | 3 | 10 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9} | {1,2,3,4,5,6,7,8} | {1,2,3,4,5,6,7,8,10} | {1,2,3,4,5,6,7,8,9,10} 11 | GROUP 2 | 0 | 2 | {11,12,13,14,15,16,17,18,19,20} | {12,13,14,15,16,17,18,19,20} | {12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} 12 | GROUP 2 | 1 | 4 | {11,12,13,14,15,16,17,18,19,20} | {11,13,14,15,16,17,18,19,20} | {11,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} 13 | GROUP 2 | 2 | 6 | {11,12,13,14,15,16,17,18,19,20} | {11,12,14,15,16,17,18,19,20} | {11,12,15,16,17,18,19,20} | {11,12,13,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} 14 | GROUP 2 | 2 | 8 | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,15,16,17,18,19,20} | {11,12,15,16,17,18,19,20} | {11,12,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} 15 | GROUP 2 | 3 | 10 | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,16,17,18,19,20} | {11,12,13,14,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} 16 | GROUP 2 | 4 | 12 | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,17,18,19,20} | {11,12,13,14,15,18,19,20} | {11,12,13,14,15,16,18,19,20} | {11,12,13,14,15,16,17,18,19,20} 17 | GROUP 2 | 4 | 14 | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,18,19,20} | {11,12,13,14,15,18,19,20} | {11,12,13,14,15,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} 18 | GROUP 2 | 5 | 16 | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,19,20} | {11,12,13,14,15,16,17,19,20} | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19,20} 19 | GROUP 2 | 6 | 18 | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,20} | {11,12,13,14,15,16,17,18} | {11,12,13,14,15,16,17,18,19} | {11,12,13,14,15,16,17,18,19,20} 20 | GROUP 2 | 6 | 20 | {11,12,13,14,15,16,17,18,19,20} | {11,12,13,14,15,16,17,18,19} | {11,12,13,14,15,16,17,18} | {11,12,13,14,15,16,17,18,20} | {11,12,13,14,15,16,17,18,19,20} (20 ROWS)
As you can see:
- exclude_current_row has all ids in a partition, with the exception of current row
- exclude_group – excludes all rows that are peers of current row
- exclude_ties – excludes all rows that are peers of current row, but leaves current row in the window frame
- exclude_no_others basically means no exclusion, and it's the default clause if you didn't provide one
Now, with that in place, what is the difference between RANGE, ROWS, and GROUPS frame types?
=$ SELECT id, for_paritition, for_group, some_val, array_agg(id) OVER ( partition BY for_paritition ORDER BY for_group RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS window_range, array_agg(id) OVER ( partition BY for_paritition ORDER BY some_val RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS window_range_unique, array_agg(id) OVER ( partition BY for_paritition ORDER BY for_group ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS window_rows, array_agg(id) OVER ( partition BY for_paritition ORDER BY some_val ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS window_rows_unique, array_agg(id) OVER ( partition BY for_paritition ORDER BY for_group GROUPS BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS window_groups, array_agg(id) OVER ( partition BY for_paritition ORDER BY some_val GROUPS BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS window_groups_unique FROM test ORDER BY for_paritition, for_group, id; id | for_paritition | for_group | some_val | window_range | window_range_unique | window_rows | window_rows_unique | window_groups | window_groups_unique ----+----------------+-----------+----------+---------------+----------------------+---------------+--------------------+------------------------+---------------------- 1 | GROUP 1 | 0 | 1 | | | {1,2,3,4} | {1,2,3,4} | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4} 2 | GROUP 1 | 0 | 2 | {1,2} | {2,3,4,5,6,7,8,9,10} | {2,3,4,5} | {2,3,4,5} | {1,2,3,4,5,6,7,8,9,10} | {2,3,4,5} 3 | GROUP 1 | 1 | 3 | | {3,4,5,6,7,8,9,10} | {3,4,5,6} | {3,4,5,6} | {3,4,5,6,7,8,9,10} | {3,4,5,6} 4 | GROUP 1 | 1 | 4 | | {4,5,6,7,8,9,10} | {4,5,6,7} | {4,5,6,7} | {3,4,5,6,7,8,9,10} | {4,5,6,7} 5 | GROUP 1 | 1 | 5 | {3,4,5} | {5,6,7,8,9,10} | {5,6,7,8} | {5,6,7,8} | {3,4,5,6,7,8,9,10} | {5,6,7,8} 6 | GROUP 1 | 2 | 6 | | {6,7,8,9,10} | {6,7,8,9} | {6,7,8,9} | {6,7,8,9,10} | {6,7,8,9} 7 | GROUP 1 | 2 | 7 | | {7,8,9,10} | {7,8,9,10} | {7,8,9,10} | {6,7,8,9,10} | {7,8,9,10} 8 | GROUP 1 | 2 | 8 | | {8,9,10} | {8,9,10} | {8,9,10} | {6,7,8,9,10} | {8,9,10} 9 | GROUP 1 | 3 | 9 | | {9,10} | {9,10} | {9,10} | {9,10} | {9,10} 10 | GROUP 1 | 3 | 10 | {9,10} | {10} | {10} | {10} | {9,10} | {10} 11 | GROUP 2 | 0 | 2 | {11} | | {11,12,13,14} | {11,12,13,14} | {11,12,13,14,15} | {11,12,13,14} 12 | GROUP 2 | 1 | 4 | | | {12,13,14,15} | {12,13,14,15} | {12,13,14,15,16,17} | {12,13,14,15} 13 | GROUP 2 | 2 | 6 | {13,14,15,16} | | {13,14,15,16} | {13,14,15,16} | {13,14,15,16,17,18} | {13,14,15,16} 14 | GROUP 2 | 2 | 8 | {13,14,15,16} | {14} | {14,15,16,17} | {14,15,16,17} | {13,14,15,16,17,18} | {14,15,16,17} 15 | GROUP 2 | 3 | 10 | {15,16} | | {15,16,17,18} | {15,16,17,18} | {15,16,17,18,19,20} | {15,16,17,18} 16 | GROUP 2 | 4 | 12 | {16} | | {16,17,18,19} | {16,17,18,19} | {16,17,18,19,20} | {16,17,18,19} 17 | GROUP 2 | 4 | 14 | {16} | | {17,18,19,20} | {17,18,19,20} | {16,17,18,19,20} | {17,18,19,20} 18 | GROUP 2 | 5 | 16 | | | {18,19,20} | {18,19,20} | {18,19,20} | {18,19,20} 19 | GROUP 2 | 6 | 18 | | | {19,20} | {19,20} | {19,20} | {19,20} 20 | GROUP 2 | 6 | 20 | | | {20} | {20} | {19,20} | {20} (20 ROWS)
To be perfectly honest, I'm not entirely sure why RANGE results are the way they are – maybe someone can enlighten me in comments?
As for other clauses:
- rows seems to be easy – it starts with current row, and continues for 5 rows, so aside from some final rows in a partition, we'll always get 6 rows in window frame
- groups is a bit different. When I was sorting using unique key (well, some_val) – it behaves like rows. But when the order by clause is not unique, it works like this: it gets all peers of current row, and 3 next such groups of peers.
All in all, it's a great addition, and I am very grateful for it, though would like to see some explanation for the RANGE frame type. Especially since, based on docs this clause should allow cool things like interval based frames. But currently – I can't get a grip on how to use it.
As always – thanks to all involved.
I’m in trouble with the RANGE keyword
These two queries must return the same result but it’s not the case
SELECT
id,
for_paritition,
for_group,
some_val,
array_agg(id) over (
— partition by for_paritition
order by for_group
RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING
) as window_range
FROM
test
WHERE for_paritition=’group 1′
ORDER BY
for_paritition, for_group, id;
SELECT
id,
for_paritition,
for_group,
some_val,
array_agg(id) over (
partition by for_paritition
order by for_group
RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING
) as window_range
FROM
test
WHERE for_paritition=’group 1′
ORDER BY
for_paritition, for_group, id;
I think there’s a problem with the two keyword PARTITION and RANGE…
Best regards
Bruno
In your first query with window functions you had only integers in `for_group` column and it seems that you updated this table after executing this query.
Nice overview, thanks!
The result for the first query :
id | for_paritition | for_group | some_val | window_range
—-+—————-+———–+———-+————————
1 | group 1 | 0 | 1 | {1,2,3,4,5,6,7,8,9,10}
2 | group 1 | 0 | 2 | {1,2,3,4,5,6,7,8,9,10}
3 | group 1 | 1 | 3 | {3,4,5,6,7,8,9,10}
4 | group 1 | 1 | 4 | {3,4,5,6,7,8,9,10}
5 | group 1 | 1 | 5 | {3,4,5,6,7,8,9,10}
6 | group 1 | 2 | 6 | {6,7,8,9,10}
7 | group 1 | 2 | 7 | {6,7,8,9,10}
8 | group 1 | 2 | 8 | {6,7,8,9,10}
9 | group 1 | 3 | 9 | {9,10}
10 | group 1 | 3 | 10 | {9,10}
(10 rows)
The result of de second query :
id | for_paritition | for_group | some_val | window_range
—-+—————-+———–+———-+————–
1 | group 1 | 0 | 1 | NULL
2 | group 1 | 0 | 2 | {1,2}
3 | group 1 | 1 | 3 | NULL
4 | group 1 | 1 | 4 | NULL
5 | group 1 | 1 | 5 | NULL
6 | group 1 | 2 | 6 | {6}
7 | group 1 | 2 | 7 | {6}
8 | group 1 | 2 | 8 | {6}
9 | group 1 | 3 | 9 | NULL
10 | group 1 | 3 | 10 | NULL
(10 rows)
I don’t understamp why these results are different…
Best regards
Bruno
> To be perfectly honest, I’m not entirely sure why RANGE results are the way they are – maybe someone can enlighten me in comments?
offset PRECEDING|FOLLOWING in RANGE mode is only supported since PG 11. So it’s maybe a bug worth reporting
I take it this does not include support for IGNORE NULLS / RESPECT NULLS?
Window functions has been around since PostgreSQL 9.4. should be PostgreSQL 8.4. The link points to the correct version though. Thanks for http://www.depesz.com. Love it.
@Marius:
thanks, fixed.