On 15th of December Tom Lane committed patch by Andrew Gierth (aka RhodiumToad), which adds interesting capability:
Log Message: ----------- Support ORDER BY within aggregate function calls, at long last providing a non-kluge method for controlling the order in which values are fed to an aggregate function. At the same time eliminate the old implementation restriction that DISTINCT was only supported for single-argument aggregates. Possibly release-notable behavioral change: formerly, agg(DISTINCT x) dropped null values of x unconditionally. Now, it does so only if the agg transition function is strict; otherwise nulls are treated as DISTINCT normally would, ie, you get one copy. Andrew Gierth, reviewed by Hitoshi Harada
Description in log message is pretty clear, so let's just test it.
Let's assume we have orders table with following structure:
# \d orders Table "public.orders" Column | Type | Modifiers ---------+---------+----------------------------------------------------- id | integer | not null default nextval('orders_id_seq'::regclass) buyer | text | not null ordered | date | not null total | integer | Indexes: "orders_pkey" PRIMARY KEY, btree (id)
Within these table we have these data:
# SELECT * FROM orders; id | buyer | ordered | total ----+-------+------------+------- 1 | Alice | 2009-05-12 | 57 2 | Eve | 2009-07-07 | 131 3 | Alice | 2009-08-15 | 177 4 | Carol | 2009-04-28 | 232 ... 38 | Eve | 2009-02-01 | 225 (38 ROWS)
Data was generated using this query:
INSERT INTO orders (buyer, ordered, total) SELECT n[j], now() - '1 year'::INTERVAL * random(), 50 + random() * 200 FROM generate_series(1,10) i, generate_series(1,5) j, (SELECT '{Alice,Bob,Carol,Dave,Eve}'::text[] AS n) x WHERE random() < 0.6;
Now. We'd like to get, for each buyer, sum of order values, and list of dates when they ordered. This is simple:
# SELECT buyer, SUM(total), array_agg( ordered) FROM orders GROUP BY buyer ORDER BY buyer; buyer | SUM | array_agg -------+------+------------------------------------------------------------------------------------------------------ Alice | 1057 | {2009-05-12,2009-08-15,2009-11-22,2009-05-08,2009-02-14,2009-03-25,2009-03-06,2009-08-16} Bob | 905 | {2009-01-29,2009-08-22,2009-05-28,2009-05-12,2009-02-10,2009-08-17} Carol | 1118 | {2009-04-28,2009-03-12,2009-01-10,2009-03-30,2009-06-27,2009-09-19,2009-12-14,2009-09-06,2009-09-01} Dave | 1239 | {2009-07-15,2009-07-27,2009-02-07,2009-12-19,2009-01-13,2009-05-28,2009-08-27} Eve | 1222 | {2009-07-07,2009-04-07,2009-10-02,2009-02-05,2009-09-26,2009-04-18,2009-08-14,2009-02-01} (5 ROWS)
The problem in here is that dates in array_agg are not ordered. Previously, we had to do stuff like this:
# SELECT buyer, SUM(total), array_agg( ordered) FROM ( SELECT * FROM orders ORDER BY buyer, ordered ) x GROUP BY buyer ORDER BY buyer; buyer | SUM | array_agg -------+------+------------------------------------------------------------------------------------------------------ Alice | 1057 | {2009-02-14,2009-03-06,2009-03-25,2009-05-08,2009-05-12,2009-08-15,2009-08-16,2009-11-22} Bob | 905 | {2009-01-29,2009-02-10,2009-05-12,2009-05-28,2009-08-17,2009-08-22} Carol | 1118 | {2009-01-10,2009-03-12,2009-03-30,2009-04-28,2009-06-27,2009-09-01,2009-09-06,2009-09-19,2009-12-14} Dave | 1239 | {2009-01-13,2009-02-07,2009-05-28,2009-07-15,2009-07-27,2009-08-27,2009-12-19} Eve | 1222 | {2009-02-01,2009-02-05,2009-04-07,2009-04-18,2009-07-07,2009-08-14,2009-09-26,2009-10-02} (5 ROWS)
Which works, but the problem is that the order is not guaranteed – it is ordered because that's how it works with this plan. Should PostgreSQL choose another plan to run the query – it might become unordered again.
Now, thanks to Andrew, we can force the ordering:
# SELECT buyer, SUM(total), array_agg( ordered ORDER BY ordered ) FROM orders GROUP BY buyer; buyer | SUM | array_agg -------+------+------------------------------------------------------------------------------------------------------ Alice | 1057 | {2009-02-14,2009-03-06,2009-03-25,2009-05-08,2009-05-12,2009-08-15,2009-08-16,2009-11-22} Bob | 905 | {2009-01-29,2009-02-10,2009-05-12,2009-05-28,2009-08-17,2009-08-22} Carol | 1118 | {2009-01-10,2009-03-12,2009-03-30,2009-04-28,2009-06-27,2009-09-01,2009-09-06,2009-09-19,2009-12-14} Dave | 1239 | {2009-01-13,2009-02-07,2009-05-28,2009-07-15,2009-07-27,2009-08-27,2009-12-19} Eve | 1222 | {2009-02-01,2009-02-05,2009-04-07,2009-04-18,2009-07-07,2009-08-14,2009-09-26,2009-10-02} (5 ROWS)
What's more – we can sort by value in some other field – for example – sort orders by their value:
# SELECT buyer, SUM(total), array_agg( ordered ORDER BY total DESC ) FROM orders GROUP BY buyer; buyer | SUM | array_agg -------+------+------------------------------------------------------------------------------------------------------ Alice | 1057 | {2009-05-08,2009-08-15,2009-03-25,2009-08-16,2009-02-14,2009-11-22,2009-03-06,2009-05-12} Bob | 905 | {2009-02-10,2009-01-29,2009-08-17,2009-05-12,2009-08-22,2009-05-28} Carol | 1118 | {2009-04-28,2009-09-01,2009-03-30,2009-06-27,2009-12-14,2009-09-19,2009-03-12,2009-01-10,2009-09-06} Dave | 1239 | {2009-05-28,2009-07-27,2009-02-07,2009-07-15,2009-08-27,2009-01-13,2009-12-19} Eve | 1222 | {2009-02-01,2009-08-14,2009-09-26,2009-04-07,2009-07-07,2009-02-05,2009-04-18,2009-10-02} (5 ROWS)
So, we can check:
# SELECT * FROM orders WHERE buyer = 'Bob' ORDER BY total DESC; id | buyer | ordered | total ----+-------+------------+------- 24 | Bob | 2009-02-10 | 249 11 | Bob | 2009-01-29 | 215 26 | Bob | 2009-08-17 | 156 19 | Bob | 2009-05-12 | 105 12 | Bob | 2009-08-22 | 95 17 | Bob | 2009-05-28 | 85 (6 ROWS)
Great. Works as described 🙂 Of course ordering doesn't make sense for all aggregates – count(*) for example 🙂 But the sole fact that we have it, means we can calculate median in plain SQL:
# WITH aggregated AS ( SELECT buyer, array_agg(total ORDER BY total) AS list, COUNT(*), COUNT(*) / 2 AS half FROM orders GROUP BY buyer ) SELECT buyer, list, CASE WHEN COUNT % 2 = 1 THEN list[ half + 1 ] ELSE ( list[ half ] + list[ half + 1 ] ) / 2 END AS median FROM aggregated; buyer | list | median -------+-----------------------------------+-------- Alice | {57,70,83,119,156,170,177,225} | 137 Bob | {85,95,105,156,215,249} | 130 Carol | {52,77,86,99,105,107,159,201,232} | 105 Dave | {98,131,154,155,219,241,241} | 155 Eve | {66,85,121,131,182,202,210,225} | 156 (5 ROWS)
How sweet it that?
Note that one psql block in your post is not monospaced:
select buyer, sum(total), array_agg( ordered order by total desc ) from orders group by buyer …
@Marti: thanks, fixed.
Great! Then ordered aggregates is another feature of Muldis D that will now be implementable over Postgres without much circumlocution. The PostgreSQL developers continue to make my job of implementing Muldis D easier with every release.