Grouping data into array of sums – fun with custom aggregates

I was recently asked about optimizing an interesting case. There was a table like this:

=$ CREATE TABLE input_data (
    category_id        INT8,
    object_id          INT8,
    interaction_ts     timestamptz,
    interaction_type   TEXT,
    interaction_count  INT4
);

And there was code that grouped it all, sum()ing interaction_count per category, object, interaction_type, and timestamp truncated to the hour.

Basically, it stored somewhere the result of:

=$ SELECT
    category_id,
    object_id,
    date_trunc( 'hour', interaction_ts ) AS ts,
    SUM(interaction_count) FILTER (WHERE interaction_type = 'a') AS a_count,
    SUM(interaction_count) FILTER (WHERE interaction_type = 'b') AS b_count
FROM
    input_data
GROUP BY 1, 2, 3;

While we were talking about optimizations, one idea that came up was to store a whole day of counts in a single row, as an array. So the resulting table would be:

=$ CREATE TABLE results (
    category_id      int8,
    object_id        int8,
    interaction_day  DATE,
    a_counts         int4[],
    b_counts         int4[]
);

Here, a_counts and b_counts would always have 24 elements, one for each hour.

Now, how to roll it up like this?
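One possible answer, sketched here under assumptions: a custom aggregate whose state is a 24-element int4[] that starts as all zeros, with each row adding its interaction_count into the slot for its hour. The names hourly_sums and hourly_sums_step are hypothetical, the transition function is written in PL/pgSQL for readability rather than speed, and this is not necessarily the approach taken in the full post.

```sql
-- Hypothetical transition function: adds one row's count into the slot for its hour (0..23).
CREATE FUNCTION hourly_sums_step(acc int4[], hour_idx int4, cnt int4)
RETURNS int4[]
LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    -- Arrays are 1-based, so hour 0 goes to acc[1] and hour 23 to acc[24].
    -- Rows with a NULL hour or NULL count are ignored.
    IF hour_idx IS NOT NULL AND cnt IS NOT NULL THEN
        acc[hour_idx + 1] := acc[hour_idx + 1] + cnt;
    END IF;
    RETURN acc;
END;
$$;

-- Hypothetical aggregate: the state starts as 24 zeros, so the result always has 24 elements.
CREATE AGGREGATE hourly_sums(int4, int4) (
    SFUNC    = hourly_sums_step,
    STYPE    = int4[],
    INITCOND = '{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}'
);

-- Roll-up in the shape of the results table. The CTE only stands in for the
-- real input_data table; drop it to run against the table itself.
WITH input_data (category_id, object_id, interaction_ts, interaction_type, interaction_count) AS (
    VALUES (1::int8, 1::int8, '2024-01-01 00:15'::timestamptz, 'a', 5::int4),
           (1, 1, '2024-01-01 00:45', 'a', 2),
           (1, 1, '2024-01-01 23:10', 'b', 1)
)
SELECT
    category_id,
    object_id,
    interaction_ts::date AS interaction_day,
    hourly_sums(extract(hour FROM interaction_ts)::int4, interaction_count)
        FILTER (WHERE interaction_type = 'a') AS a_counts,
    hourly_sums(extract(hour FROM interaction_ts)::int4, interaction_count)
        FILTER (WHERE interaction_type = 'b') AS b_counts
FROM input_data
GROUP BY 1, 2, 3;
```

Because the state starts as 24 zeros, a day with no rows of a given type still gets a full array of zeros through FILTER. Like date_trunc() in the original query, the cast to date and the extracted hour follow the session's TimeZone setting.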


Fix for displaying aggregates on explain.depesz.com

A couple of days ago, RhodiumToad reported a bug in explain.depesz.com on IRC.

Specifically: if the EXPLAIN output was in JSON, XML, or YAML format and the node type was Aggregate, the site didn't extract the full information.


Getting first and last values per group

Every so often someone needs a way to get the first (or the first couple of) values of a given column per group. Or the last.

In some cases (when there aren't many groups) you can use recursive queries, but that's not always the best choice.

Let's try to implement first() and last() aggregates, so that anybody can use them easily.
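A minimal sketch of one common way to do this, assuming PostgreSQL's polymorphic anyelement type: a STRICT transition function that either keeps the stored value or replaces it with the new one. The helper names first_agg and last_agg are illustrative, and the full post may define the aggregates differently.

```sql
-- Transition functions: keep the current state ($1) or take the new value ($2).
-- STRICT with no INITCOND means the first non-NULL input becomes the state and
-- later NULL inputs are skipped, so both aggregates ignore NULLs.
CREATE FUNCTION first_agg(anyelement, anyelement) RETURNS anyelement
    LANGUAGE sql IMMUTABLE STRICT AS $$ SELECT $1 $$;

CREATE FUNCTION last_agg(anyelement, anyelement) RETURNS anyelement
    LANGUAGE sql IMMUTABLE STRICT AS $$ SELECT $2 $$;

CREATE AGGREGATE first(anyelement) (SFUNC = first_agg, STYPE = anyelement);
CREATE AGGREGATE last(anyelement)  (SFUNC = last_agg,  STYPE = anyelement);

-- "First" and "last" only mean something with an explicit ORDER BY inside the call.
SELECT grp,
       first(val ORDER BY ts) AS first_val,   -- 100 for grp 1, 200 for grp 2
       last(val ORDER BY ts)  AS last_val     -- 120 for grp 1, 210 for grp 2
FROM (VALUES (1, 1, 100), (1, 2, 110), (1, 3, 120),
             (2, 1, 200), (2, 2, 210)) AS t(grp, ts, val)
GROUP BY grp
ORDER BY grp;
```

Because the aggregates are polymorphic, the same two definitions work for integers, text, timestamps, and any other type.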


Waiting for PostgreSQL 10 – postgres_fdw: Push down aggregates to remote servers.

On 21st of October, Robert Haas committed a patch:

postgres_fdw: Push down aggregates to remote servers.
 
Now that the upper planner uses paths, and now that we have proper hooks
to inject paths into the upper planning process, it's possible for
foreign data wrappers to arrange to push aggregates to the remote side
instead of fetching all of the rows and aggregating them locally.  This
figures to be a massive win for performance, so teach postgres_fdw to
do it.
 
Jeevan Chalke and Ashutosh Bapat.  Reviewed by Ashutosh Bapat with
additional testing by Prabhat Sahu.  Various mostly cosmetic changes
by me.


Getting list of unique elements in table, per group

Today, on IRC, someone asked an interesting question.

Basically, she ran a query like:

SELECT a, b, c, d, e, f FROM some_table ORDER BY a;  -- some_table: placeholder for the real table name

Then she processed the results to get, for each a, an array of unique values of b, c, d, e, and f, and inserted that back into the database, into some other table.

It was a problem, because the table had many rows (millions, I would assume), and the whole process was slow.

So, how to make it faster?
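The usual set-based alternative is to let the database build the arrays itself with array_agg(DISTINCT ...) and write them to the target table in a single INSERT ... SELECT, so no rows have to travel to the client and back. The sketch below assumes hypothetical table names (source_rows, unique_per_a), shows only columns b and c (d, e, and f work the same way), and is not necessarily the fastest variant discussed in the full post.

```sql
-- Hypothetical stand-ins for the tables from the question.
CREATE TEMP TABLE source_rows (a int, b int, c text);
INSERT INTO source_rows VALUES
    (1, 10, 'x'), (1, 10, 'y'), (1, 20, 'x'), (2, 30, 'z');

CREATE TEMP TABLE unique_per_a (a int, bs int[], cs text[]);

-- One statement instead of a client-side loop:
-- array_agg(DISTINCT ...) de-duplicates per group of a.
INSERT INTO unique_per_a (a, bs, cs)
SELECT a,
       array_agg(DISTINCT b ORDER BY b),   -- {10,20} for a = 1
       array_agg(DISTINCT c ORDER BY c)    -- {x,y} for a = 1
FROM source_rows
GROUP BY a;
```

The ORDER BY inside each array_agg() makes the element order deterministic; without it, PostgreSQL does not guarantee the order of aggregate input.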


Waiting for 9.6 – Support parallel aggregation.

On 21st of March, Robert Haas committed a patch:

Support parallel aggregation.
 
Parallel workers can now partially aggregate the data and pass the
transition values back to the leader, which can combine the partial
results to produce the final answer.
 
David Rowley, based on earlier work by Haribabu Kommi.  Reviewed by
Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.
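For user-defined aggregates, the same mechanism is exposed through the COMBINEFUNC and PARALLEL options of CREATE AGGREGATE (both new in 9.6). A minimal sketch, assuming a hypothetical aggregate my_sum and a hypothetical demo table parallel_demo: each worker sums its share of rows with the transition function, and the combine function merges the partial totals.

```sql
-- Hypothetical aggregate name; int8pl is the built-in function behind bigint "+".
CREATE AGGREGATE my_sum(int8) (
    SFUNC       = int8pl,  -- each worker adds its rows into its own partial total
    STYPE       = int8,
    INITCOND    = '0',     -- note: returns 0, not NULL, for empty input
    COMBINEFUNC = int8pl,  -- the leader adds the workers' partial totals together
    PARALLEL    = SAFE
);

-- Hypothetical demo table. It must be a regular table: parallel workers
-- cannot scan temporary tables.
CREATE TABLE parallel_demo AS
    SELECT g::int8 AS v FROM generate_series(1, 1000000) AS g;
ANALYZE parallel_demo;

SET max_parallel_workers_per_gather = 2;

-- If the planner picks a parallel plan, look for Partial Aggregate below
-- Gather and Finalize Aggregate above it.
EXPLAIN SELECT my_sum(v) FROM parallel_demo;
SELECT my_sum(v) FROM parallel_demo;
```

Whether the planner actually chooses a parallel plan depends on table size and cost settings, so the EXPLAIN output may differ between setups; the final sum is the same either way.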


Filling in the blanks

Some time ago someone on IRC asked an interesting question, one that I couldn't answer then (I didn't have an immediate idea, and didn't have time to look into it).

Now I have some more time, and even though the person who had this problem no longer cares about it (he found a solution himself, if I recall correctly), I decided to look into it.


Waiting for 9.4 – Provide moving-aggregate support for a bunch of aggregates.

On 13th of April, Tom Lane committed a patch:

Provide moving-aggregate support for a bunch of numerical aggregates.
 
First installment of the promised moving-aggregate support in built-in
aggregates: count(), sum(), avg(), stddev() and variance() for
assorted datatypes, though not for float4/float8.
 
In passing, remove a 2001-vintage kluge in interval_accum(): interval
array elements have been properly aligned since around 2003, but
nobody remembered to take out this workaround.  Also, fix a thinko
in the opr_sanity tests for moving-aggregate catalog entries.
 
David Rowley and Florian Pflug, reviewed by Dean Rasheed

On the same day he also committed:

Provide moving-aggregate support for boolean aggregates.
 
David Rowley and Florian Pflug, reviewed by Dean Rasheed
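User-defined aggregates can use the same machinery through the MSFUNC, MINVFUNC, MSTYPE, and MINITCOND options of CREATE AGGREGATE. A minimal sketch of a sliding sum, assuming a hypothetical aggregate named msum built on the built-in bigint functions: the inverse function removes the value of a row that leaves the window frame, so the sum does not have to be recomputed from scratch for every row.

```sql
-- Hypothetical aggregate name. int8pl / int8mi are the built-in functions
-- behind bigint "+" and "-".
CREATE AGGREGATE msum(int8) (
    SFUNC     = int8pl,  -- plain mode: GROUP BY and frames that only grow
    STYPE     = int8,
    INITCOND  = '0',
    MSFUNC    = int8pl,  -- moving mode: add the row entering the frame
    MINVFUNC  = int8mi,  -- moving mode: subtract the row leaving the frame
    MSTYPE    = int8,
    MINITCOND = '0'
);

-- Sum of the current row and the two before it: 1, 3, 6, 9, 12.
SELECT x,
       msum(x::int8) OVER (ORDER BY x ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS last3
FROM generate_series(1, 5) AS g(x);
```

With a frame like ROWS BETWEEN 2 PRECEDING AND CURRENT ROW, each step can be one addition and one subtraction instead of re-summing three rows, which matters more as the frame gets wider.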


Waiting for 9.4 – Support ordered-set (WITHIN GROUP) aggregates.

On 23rd of December, Tom Lane committed a patch:

Support ordered-set (WITHIN GROUP) aggregates.
 
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()).  We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
 
Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions.  To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need.  There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.
 
In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.
 
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
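A short illustration of the syntax, using a few inline values rather than a real table: the sort order goes inside WITHIN GROUP (ORDER BY ...), and the expected results in the comments were worked out by hand for the sample values 1, 2, 2, 3, 4.

```sql
-- Sample data: 1, 2, 2, 3, 4 (inline VALUES instead of a real table).
SELECT
    -- continuous percentile (interpolates between rows when needed): 2
    percentile_cont(0.5) WITHIN GROUP (ORDER BY x) AS median,
    -- several discrete percentiles in one pass over the data: {2,3}
    percentile_disc(ARRAY[0.25, 0.75]::float8[]) WITHIN GROUP (ORDER BY x) AS quartiles,
    -- most frequent value (mode() is not in the spec): 2
    mode() WITHIN GROUP (ORDER BY x) AS most_common,
    -- hypothetical-set aggregate: the rank a value of 3 would get if added: 4
    rank(3) WITHIN GROUP (ORDER BY x) AS rank_of_3
FROM (VALUES (1), (2), (2), (3), (4)) AS t(x);
```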
