Waiting for PostgreSQL 18 – Support LIKE with nondeterministic collations

On 27th of November 2024, Peter Eisentraut committed patch:

Support LIKE with nondeterministic collations
 
This allows for example using LIKE with case-insensitive collations.
There was previously no internal implementation of this, so it was met
with a not-supported error.  This adds the internal implementation and
removes the error.  The implementation follows the specification of
the SQL standard for this.
 
Unlike with deterministic collations, the LIKE matching cannot go
character by character but has to go substring by substring.  For
example, if we are matching against LIKE 'foo%bar', we can't start by
looking for an 'f', then an 'o', but instead with have to find
something that matches 'foo'.  This is because the collation could
consider substrings of different lengths to be equal.  This is all
internal to MatchText() in like_match.c.
 
The changes in GenericMatchText() in like.c just pass through the
locale information to MatchText(), which was previously not needed.
This matches exactly Generic_Text_IC_like() below.
 
ILIKE is not affected.  (It's unclear whether ILIKE makes sense under
nondeterministic collations.)
 
This also updates match_pattern_prefix() in like_support.c to support
optimizing the case of an exact pattern with nondeterministic
collations.  This was already alluded to in the previous code.
 
(includes documentation examples from Daniel Vérité and test cases
from Paul A Jungwirth)
 
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/700d2e86-bf75-4607-9cf2-f5b7802f6e88@eisentraut.org

Continue reading Waiting for PostgreSQL 18 – Support LIKE with nondeterministic collations

Waiting for PostgreSQL 18 – psql: Add more information about service name

On 18th of December 2024, Michael Paquier committed patch:

psql: Add more information about service name
 
This commit adds support for the following items in psql, able to show a
service name, when available:
- Variable SERVICE.
- Substitution %s in PROMPT{1,2,3}.
 
This relies on 4b99fed7541e, that has made the service name available in
PGconn for libpq.
 
Author: Michael Banck
Reviewed-by: Greg Sabino Mullane
Discussion: https://postgr.es/m/6723c612.050a0220.1567f4.b94a@mx.google.com

Continue reading Waiting for PostgreSQL 18 – psql: Add more information about service name

Waiting for PostgreSQL 18 – Add UUID version 7 generation function.

On 11st of December 2024, Masahiko Sawada committed patch:

Add UUID version 7 generation function.
 
This commit introduces the uuidv7() SQL function, which generates UUID
version 7 as specified in RFC 9652. UUIDv7 combines a Unix timestamp
in milliseconds and random bits, offering both uniqueness and
sortability.
 
In our implementation, the 12-bit sub-millisecond timestamp fraction
is stored immediately after the timestamp, in the space referred to as
"rand_a" in the RFC. This ensures additional monotonicity within a
millisecond. The rand_a bits also function as a counter. We select a
sub-millisecond timestamp so that it monotonically increases for
generated UUIDs within the same backend, even when the system clock
goes backward or when generating UUIDs at very high
frequency. Therefore, the monotonicity of generated UUIDs is ensured
within the same backend.
 
This commit also expands the uuid_extract_timestamp() function to
support UUID version 7.
 
Additionally, an alias uuidv4() is added for the existing
gen_random_uuid() SQL function to maintain consistency.
 
Bump catalog version.
 
Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Przemysław Sztoch, Nikolay Samokhvalov
Reviewed-by: Peter Eisentraut, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Masahiko Sawada, Lukas Fittl, Michael Paquier, Japin Li
Reviewed-by: Marcos Pegoraro, Junwang Zhao, Stepan Neretin
Reviewed-by: Daniel Vérité
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com

Continue reading Waiting for PostgreSQL 18 – Add UUID version 7 generation function.

SQL best practices – don’t compare count(*) with 0

Every now and then I see something like this:

SELECT u.* FROM users u
WHERE 0 = (SELECT COUNT(*) FROM addresses a WHERE a.user_id = u.id);

and it kinda pains me. So figured, I'll write about what is the problem with it, and how to avoid such constructs.

Continue reading SQL best practices – don't compare count(*) with 0

How can I send mail or HTTP request from database?

This question happens every now and then in one of PostgreSQL support places.

Whenever it happens, I just suggest to not try, as interacting with outside world from database can be problematic, and instead use LISTEN/NOTIFY.

But it occurred to me, that while I know how to do it, I don't think I actually did it. It being: handle listen/notify in real life code. So let's try. I will not be writing actual email sending or http requesting, but will make sure that my program will get, from database, information when NOTIFY happens. How hard could it be?

Continue reading How can I send mail or HTTP request from database?

Changes on pgdoc.link

Based on checking logs, and my own personal needs I added more categories of keywords to pgdoc.link:

This brought the total number of known keyword to 2410.

As a side note – while I generally like PostgreSQL docs, state of contrib module documentation is …, well, challenging. Every doc has its own approach to listing stuff. That was “fun" to work on, and it's the main reason why I'm sure not all functions/modules are handled. If you will notice something that isn't – please let me know, I'll do my best to add it.

Grouping data into array of sums – fun with custom aggregates

Was asked recently about optimization of interesting case. There was table like:

=$ CREATE TABLE input_data (
    category_id        INT8,
    object_id          INT8,
    interaction_ts     timestamptz,
    interaction_type    TEXT,
    interaction_count  INT4
);

And there was a code that was grouping it all by sum()ing interaction_count per category, object, interaction_type, and timestamp truncated to hour.

Basically, storing somewhere result of:

=$ SELECT
    category_id,
    object_id,
    date_trunc( 'hour', interaction_ts ) AS ts,
    SUM(interaction_count) FILTER (WHERE interaction_type = 'a') AS a_count,
    SUM(interaction_count) FILTER (WHERE interaction_type = 'b') AS b_count
FROM
    input_data
GROUP BY 1, 2, 3;

While talking about optimizations, one idea that came was to store whole day of counts in single row, as array. So the resulting count table would be:

=$ CREATE TABLE results (
    category_id      int8,
    object_id        int8
    interaction_day  DATE,
    a_counts         int4[],
    b_counts         int4[]
);

Where a_counts, and b_counts would always have 24 elements, one for each hour.

Now, how to roll it up like this?

Continue reading Grouping data into array of sums – fun with custom aggregates

Waiting for PostgreSQL 18 – Add SQL function array_reverse()

On 1st of November 2024, Michael Paquier committed patch:

Add SQL function array_reverse()
 
This function takes in input an array, and reverses the position of all
its elements.  This operation only affects the first dimension of the
array, like array_shuffle().
 
The implementation structure is inspired by array_shuffle(), with a
subroutine called array_reverse_n() that may come in handy in the
future, should more functions able to reverse portions of arrays be
introduced.
 
Bump catalog version.
 
Author: Aleksander Alekseev
Reviewed-by: Ashutosh Bapat, Tom Lane, Vladlen Popolitov
Discussion: https://postgr.es/m/CAJ7c6TMpeO_ke+QGOaAx9xdJuxa7r=49-anMh3G5476e3CX1CA@mail.gmail.com

Continue reading Waiting for PostgreSQL 18 – Add SQL function array_reverse()