Grouping data into array of sums – fun with custom aggregates

Was asked recently about optimization of interesting case. There was table like:

=$ CREATE TABLE input_data (
    category_id        INT8,
    object_id          INT8,
    interaction_ts     timestamptz,
    interaction_type    TEXT,
    interaction_count  INT4
);

And there was a code that was grouping it all by sum()ing interaction_count per category, object, interaction_type, and timestamp truncated to hour.

Basically, storing somewhere result of:

=$ SELECT
    category_id,
    object_id,
    date_trunc( 'hour', interaction_ts ) AS ts,
    SUM(interaction_count) FILTER (WHERE interaction_type = 'a') AS a_count,
    SUM(interaction_count) FILTER (WHERE interaction_type = 'b') AS b_count
FROM
    input_data
GROUP BY 1, 2, 3;

While talking about optimizations, one idea that came was to store whole day of counts in single row, as array. So the resulting count table would be:

=$ CREATE TABLE results (
    category_id      int8,
    object_id        int8
    interaction_day  DATE,
    a_counts         int4[],
    b_counts         int4[]
);

Where a_counts, and b_counts would always have 24 elements, one for each hour.

Now, how to roll it up like this?

Continue reading Grouping data into array of sums – fun with custom aggregates

Waiting for PostgreSQL 18 – Add SQL function array_reverse()

On 1st of November 2024, Michael Paquier committed patch:

Add SQL function array_reverse()
 
This function takes in input an array, and reverses the position of all
its elements.  This operation only affects the first dimension of the
array, like array_shuffle().
 
The implementation structure is inspired by array_shuffle(), with a
subroutine called array_reverse_n() that may come in handy in the
future, should more functions able to reverse portions of arrays be
introduced.
 
Bump catalog version.
 
Author: Aleksander Alekseev
Reviewed-by: Ashutosh Bapat, Tom Lane, Vladlen Popolitov
Discussion: https://postgr.es/m/CAJ7c6TMpeO_ke+QGOaAx9xdJuxa7r=49-anMh3G5476e3CX1CA@mail.gmail.com

Continue reading Waiting for PostgreSQL 18 – Add SQL function array_reverse()

New way to search PostgreSQL documentation

PostgreSQL documentation is, generally speaking, great. But it isn't the easiest thing to search in. Over the years I memorized urls to certain docs, but there is a limit to it.

What's more, there are certain inconsistencies. For example – most pages that describe program have name that starts with app-. But not all. Some programs have names that start with pg_, and some don't have this _ thing. Their docs usually get rid of _ in file name, but, again, not always, sometimes it's changed to .

I've been aware of all these things, and even tried to look for a way to fix it in docs, but never could get around to learn docbook.

So, figured I can provide a tool that will search the way I think it should…

Continue reading New way to search PostgreSQL documentation

SQL/JSON is here! (kinda “Waiting for Pg 17”)

Amazing. Awesome. Well, but what is it? We could store json data in Pg since PostgreSQL 9.2 – so it's been there for over 12 years now. How is the new shiny thing different? What does it allow you to do?

Let's see if I can shed some light on it…

Continue reading SQL/JSON is here! (kinda “Waiting for Pg 17")

Waiting for PostgreSQL 18 – Add temporal FOREIGN KEY contraints

On 17th of September 2024, Peter Eisentraut committed patch:

Add temporal FOREIGN KEY contraints
 
Add PERIOD clause to foreign key constraint definitions.  This is
supported for range and multirange types.  Temporal foreign keys check
for range containment instead of equality.
 
This feature matches the behavior of the SQL standard temporal foreign
keys, but it works on PostgreSQL's native ranges instead of SQL's
"periods", which don't exist in PostgreSQL (yet).
 
Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT}
are not supported yet.
 
(previously committed as 34768ee3616, reverted by 8aee330af55; this is
essentially unchanged from those)
 
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com

Continue reading Waiting for PostgreSQL 18 – Add temporal FOREIGN KEY contraints

Waiting for PostgreSQL 18 – Add temporal PRIMARY KEY and UNIQUE constraints

On 17th of September 2024, Peter Eisentraut committed patch:

Add temporal PRIMARY KEY and UNIQUE constraints
 
Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints.
These are backed by GiST indexes instead of B-tree indexes, since they
are essentially exclusion constraints with = for the scalar parts of
the key and && for the temporal part.
 
(previously committed as 46a0cd4cefb, reverted by 46a0cd4cefb; the new
part is this:)
 
Because 'empty' && 'empty' is false, the temporal PK/UQ constraint
allowed duplicates, which is confusing to users and breaks internal
expectations.  For instance, when GROUP BY checks functional
dependencies on the PK, it allows selecting other columns from the
table, but in the presence of duplicate keys you could get the value
from any of their rows.  So we need to forbid empties.
 
This all means that at the moment we can only support ranges and
multiranges for temporal PK/UQs, unlike the original patch (above).
Documentation and tests for this are added.  But this could
conceivably be extended by introducing some more general support for
the notion of "empty" for other types.
 
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com

Continue reading Waiting for PostgreSQL 18 – Add temporal PRIMARY KEY and UNIQUE constraints

What tables were touched within given range of wal LSN?

We had a case recently where one of our DBs failed over to a new primary. To this server (old primary, and then new primary) we had connection from some kind of CDC tool (Debezium, I think).

The thing is that while there was failover, this tool read (using logical decoding) changes on old primary to location 1F1F/4ADC3348, and on new db, it connected from location 1F1F/4ADC4038. Which means that there was some WAL that was not decoded and sent to Debezium.

This is fixable, but we need to know what tables were touched in the WAL stream, that Debezium didn't see, if any. Perhaps we could also get ids/location of the rows that were there?

Continue reading What tables were touched within given range of wal LSN?

How much speed you’re leaving at the table if you use default locale?

I've been to PGConf.dev recently, and one of the talks was about collations.

The whole talk was interesting (to put it mildly), but the thing that stuck with me is that we really shouldn't be using default collation provider (libc with locale based collation), unless it's really needed, because we're wasting performance. But how much of a hit is it?

Continue reading How much speed you're leaving at the table if you use default locale?