On 12th of February Tom Lane committed patch by Hitoshi Harada:
Log Message: ----------- Extend the set of frame options supported for window functions. This patch allows the frame to start from CURRENT ROW (in either RANGE or ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING start and end points. (RANGE value PRECEDING/FOLLOWING isn't there yet --- the grammar works, but that's all.) Hitoshi Harada, reviewed by Pavel Stehule
The description is pretty clear, but let's look at what exactly it is.
First, let's create simple test table:
CREATE TABLE test ( id serial PRIMARY KEY, some_grouping int4, some_value int4 );
Now, let's insert some random data:
INSERT INTO test (some_grouping, some_value) SELECT i, random() * 100 + 50 AS j FROM generate_series(1,2) i, generate_series(1,10) k;
The data in it doesn't look spectacular, but it serves the purpose:
SELECT * FROM test ORDER BY id ASC; id | some_grouping | some_value ----+---------------+------------ 1 | 1 | 94 2 | 1 | 86 3 | 1 | 77 4 | 1 | 86 5 | 1 | 146 6 | 1 | 145 7 | 1 | 68 8 | 1 | 71 9 | 1 | 84 10 | 1 | 105 11 | 2 | 126 12 | 2 | 88 13 | 2 | 90 14 | 2 | 50 15 | 2 | 104 16 | 2 | 68 17 | 2 | 61 18 | 2 | 107 19 | 2 | 127 20 | 2 | 112 (20 ROWS)
Now. If you're not familiar with window functions, and specyfically with framing, let me show you simple example:
SELECT id, some_grouping, some_value, SUM(some_value) OVER ( PARTITION BY some_grouping RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS sum_per_group, SUM(some_value) OVER ( PARTITION BY some_grouping ORDER BY id RANGE UNBOUNDED PRECEDING ) AS cumulative_sum_per_group FROM test ORDER BY some_grouping, id; id | some_grouping | some_value | sum_per_group | cumulative_sum_per_group ----+---------------+------------+---------------+-------------------------- 1 | 1 | 94 | 962 | 94 2 | 1 | 86 | 962 | 180 3 | 1 | 77 | 962 | 257 4 | 1 | 86 | 962 | 343 5 | 1 | 146 | 962 | 489 6 | 1 | 145 | 962 | 634 7 | 1 | 68 | 962 | 702 8 | 1 | 71 | 962 | 773 9 | 1 | 84 | 962 | 857 10 | 1 | 105 | 962 | 962 11 | 2 | 126 | 933 | 126 12 | 2 | 88 | 933 | 214 13 | 2 | 90 | 933 | 304 14 | 2 | 50 | 933 | 354 15 | 2 | 104 | 933 | 458 16 | 2 | 68 | 933 | 526 17 | 2 | 61 | 933 | 587 18 | 2 | 107 | 933 | 694 19 | 2 | 127 | 933 | 821 20 | 2 | 112 | 933 | 933 (20 ROWS)
Now. Previously it would be difficult to calculate rolling average. For example something like this:
SELECT id, some_grouping, some_value, ( lag(some_value, 1) OVER (GROUPING) + some_value + lead(some_value, 1) OVER (GROUPING) ) / 3 AS rolling_average FROM test WINDOW GROUPING AS ( PARTITION BY some_grouping ORDER BY id ) ORDER BY some_grouping, id;
It returns, more or less as expected:
id | some_grouping | some_value | rolling_average ----+---------------+------------+----------------- 1 | 1 | 94 | [null] 2 | 1 | 86 | 85 3 | 1 | 77 | 83 4 | 1 | 86 | 103 5 | 1 | 146 | 125 6 | 1 | 145 | 119 7 | 1 | 68 | 94 8 | 1 | 71 | 74 9 | 1 | 84 | 86 10 | 1 | 105 | [null] 11 | 2 | 126 | [null] 12 | 2 | 88 | 101 13 | 2 | 90 | 76 14 | 2 | 50 | 81 15 | 2 | 104 | 74 16 | 2 | 68 | 77 17 | 2 | 61 | 78 18 | 2 | 107 | 98 19 | 2 | 127 | 115 20 | 2 | 112 | [null] (20 rows)
As you can see first and last row for any given window don't have rolling average – i.e. they have nulls. That's because for first row lag(some_value) is null, and for last row – lead(some_value) is null.
Now, let's see how it works with this new patch by Hitoshi Harada:
SELECT id, some_grouping, some_value, CAST( avg(some_value) OVER ( GROUPING ROWS BETWEEN 1 preceding AND 1 following ) AS int4 ) AS rolling_average FROM test WINDOW GROUPING AS ( PARTITION BY some_grouping ORDER BY id ) ORDER BY some_grouping, id;
and result is:
id | some_grouping | some_value | rolling_average ----+---------------+------------+----------------- 1 | 1 | 94 | 90 2 | 1 | 86 | 86 3 | 1 | 77 | 83 4 | 1 | 86 | 103 5 | 1 | 146 | 126 6 | 1 | 145 | 120 7 | 1 | 68 | 95 8 | 1 | 71 | 74 9 | 1 | 84 | 87 10 | 1 | 105 | 95 11 | 2 | 126 | 107 12 | 2 | 88 | 101 13 | 2 | 90 | 76 14 | 2 | 50 | 81 15 | 2 | 104 | 74 16 | 2 | 68 | 78 17 | 2 | 61 | 79 18 | 2 | 107 | 98 19 | 2 | 127 | 115 20 | 2 | 112 | 120 (20 rows)
This is very cool.
While I can write query without using this new frames that will calculate correct values for first and last rows:
SELECT id, some_grouping, some_value, ( COALESCE( lag(some_value, 1) OVER (GROUPING), 0 ) + some_value + COALESCE( lead(some_value, 1) OVER (GROUPING), 0 ) ) / ( 1 + CASE WHEN lag(some_value, 1) OVER (GROUPING) IS NULL THEN 0 ELSE 1 END + CASE WHEN lead(some_value, 1) OVER (GROUPING) IS NULL THEN 0 ELSE 1 END ) AS rolling_average FROM test WINDOW GROUPING AS ( PARTITION BY some_grouping ORDER BY id ) ORDER BY some_grouping, id;
It definitely isn't nice, and would very quickly become nightmarish for ranges larger than 3 rows.
Of course the patch also added other options besides “BETWEEN n PRECEDING AND n FOLLOWING" – you can check full list of possible framings in dev docs.