Was asked recently about optimization of interesting case. There was table like:
=$ CREATE TABLE input_data ( category_id INT8, object_id INT8, interaction_ts timestamptz, interaction_type TEXT, interaction_count INT4 );
And there was a code that was grouping it all by sum()ing interaction_count per category, object, interaction_type, and timestamp truncated to hour.
Basically, storing somewhere result of:
=$ SELECT category_id, object_id, date_trunc( 'hour', interaction_ts ) AS ts, SUM(interaction_count) FILTER (WHERE interaction_type = 'a') AS a_count, SUM(interaction_count) FILTER (WHERE interaction_type = 'b') AS b_count FROM input_data GROUP BY 1, 2, 3;
While talking about optimizations, one idea that came was to store whole day of counts in single row, as array. So the resulting count table would be:
=$ CREATE TABLE results ( category_id int8, object_id int8 interaction_day DATE, a_counts int4[], b_counts int4[] );
Where a_counts, and b_counts would always have 24 elements, one for each hour.
Now, how to roll it up like this?
Continue reading Grouping data into array of sums – fun with custom aggregates