Waiting for PostgreSQL 18 – Support LIKE with nondeterministic collations

On 27th of November 2024, Peter Eisentraut committed patch:

Support LIKE with nondeterministic collations
 
This allows for example using LIKE with case-insensitive collations.
There was previously no internal implementation of this, so it was met
with a not-supported error.  This adds the internal implementation and
removes the error.  The implementation follows the specification of
the SQL standard for this.
 
Unlike with deterministic collations, the LIKE matching cannot go
character by character but has to go substring by substring.  For
example, if we are matching against LIKE 'foo%bar', we can't start by
looking for an 'f', then an 'o', but instead with have to find
something that matches 'foo'.  This is because the collation could
consider substrings of different lengths to be equal.  This is all
internal to MatchText() in like_match.c.
 
The changes in GenericMatchText() in like.c just pass through the
locale information to MatchText(), which was previously not needed.
This matches exactly Generic_Text_IC_like() below.
 
ILIKE is not affected.  (It's unclear whether ILIKE makes sense under
nondeterministic collations.)
 
This also updates match_pattern_prefix() in like_support.c to support
optimizing the case of an exact pattern with nondeterministic
collations.  This was already alluded to in the previous code.
 
(includes documentation examples from Daniel Vérité and test cases
from Paul A Jungwirth)
 
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/700d2e86-bf75-4607-9cf2-f5b7802f6e88@eisentraut.org

Continue reading Waiting for PostgreSQL 18 – Support LIKE with nondeterministic collations

How much speed you’re leaving at the table if you use default locale?

I've been to PGConf.dev recently, and one of the talks was about collations.

The whole talk was interesting (to put it mildly), but the thing that stuck with me is that we really shouldn't be using default collation provider (libc with locale based collation), unless it's really needed, because we're wasting performance. But how much of a hit is it?

Continue reading How much speed you're leaving at the table if you use default locale?

Why is “depesz” between “de luca” and “de vil”?

Every so often someone asks why sorting behaves irrational. Like here:

$ SELECT string FROM test ORDER BY string;
  string
----------
 dean
 deer
 de luca
 depesz
 de vil
 dyslexia
(6 ROWS)

Why aren't “de luca" and “de vil" together?

Continue reading Why is “depesz" between “de luca" and “de vil"?

Waiting for 9.1 – Per-column collation support

On 8th of February, Peter Eisentraut committed patch:

Per-column collation support
 
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
 
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
 
Branch
------
master

Continue reading Waiting for 9.1 – Per-column collation support