Was asked recently about optimization of interesting case. There was table like:
=$ CREATE TABLE input_data ( category_id INT8, object_id INT8, interaction_ts timestamptz, interaction_type TEXT, interaction_count INT4 );
And there was a code that was grouping it all by sum()ing interaction_count per category, object, interaction_type, and timestamp truncated to hour.
Basically, storing somewhere result of:
=$ select category_id, object_id, date_trunc( 'hour', interaction_ts ) as ts, sum(interaction_count) filter (where interaction_type = 'a') as a_count, sum(interaction_count) filter (where interaction_type = 'b') as b_count from input_data group by 1, 2, 3;
While talking about optimizations, one idea that came was to store whole day of counts in single row, as array. So the resulting count table would be:
=$ create table results ( category_id int8, object_id int8 interaction_day date, a_counts int4[], b_counts int4[] );
Where a_counts, and b_counts would always have 24 elements, one for each hour.
Now, how to roll it up like this?
Continue reading Grouping data into array of sums – fun with custom aggregates