On 29th of March 2021, Peter Eisentraut committed patch:
Add unistr function

This allows decoding a string with Unicode escape sequences. It is similar to Unicode escape strings, but offers some more flexibility.

Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Asif Rehman <asifr.rehman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com

For a very long time we have had the ability to write Unicode string literals without typing the Unicode characters themselves.
For example – instead of writing:
=$ SELECT 'żółw';
I could write:
=$ SELECT U&'\017C\00F3\0142\0077';
Or even:
=$ SELECT U&'\017C\00F3\0142w';
And even if I needed higher code points, like emoji, I could:
$ SELECT U&'\+01F603';
 ?column?
──────────
 😃
(1 row)
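To double-check which code points those escapes stand for, here is a tiny Python sketch. It only restates the hex values from the literals above. The variable names are mine, and nothing here touches PostgreSQL.

```python
# Code points copied from the U& literals above (hex values only).
turtle = "".join(chr(cp) for cp in (0x017C, 0x00F3, 0x0142, 0x0077))

# U+1F603 does not fit in 4 hex digits, hence the 6-digit \+XXXXXX form.
smiley = chr(0x1F603)

print(turtle == "żółw")     # prints True
print(ord(smiley) > 0xFFFF) # prints True
```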
U& literals are great. But now we can also use the unistr function, which provides even more flexibility and supports up to 8 hex digits in the code point number.
Specifically, unistr handles the following forms (X is a single hexadecimal digit):
- \XXXX (just like U&'…')
- \+XXXXXX (just like U&'…')
- \uXXXX
- \UXXXXXXXX
Both new additions (\u and \U) seem to be the same as in Python (and possibly other languages), so they shouldn't be hard to grasp.
While I don't foresee a need for 32-bit (8 hex digits) code points right now, it's good to know that I can take a Python string and pass it through unistr to decode it:
$ SELECT unistr('I \U0001f60d PostgreSQL\u203C');
     unistr
─────────────────
 I 😍 PostgreSQL‼
(1 row)
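To make the four escape forms concrete, here is a rough Python emulation of them. It is a sketch under my own assumptions, not PostgreSQL's implementation: the name unistr_sketch is made up, it does not combine UTF-16 surrogate pairs, and it leaves unrecognized backslash sequences untouched where the real function would complain. It also treats a doubled backslash as a literal backslash, which is how the PostgreSQL documentation describes unistr.

```python
import re

# Hypothetical helper, not part of PostgreSQL or any library.
# One alternative per escape form; the doubled backslash is tried first
# so that an escaped backslash is never read as the start of an escape.
_ESCAPE = re.compile(
    r"\\(?:"
    r"(?P<bs>\\)"                  # \\          literal backslash
    r"|U(?P<u8>[0-9A-Fa-f]{8})"    # \UXXXXXXXX
    r"|u(?P<u4>[0-9A-Fa-f]{4})"    # \uXXXX
    r"|\+(?P<p6>[0-9A-Fa-f]{6})"   # \+XXXXXX
    r"|(?P<x4>[0-9A-Fa-f]{4})"     # \XXXX
    r")"
)


def unistr_sketch(text: str) -> str:
    """Decode the escape forms listed above; everything else is kept as-is."""

    def decode(match: re.Match) -> str:
        if match.group("bs"):
            return "\\"
        digits = (
            match.group("u8")
            or match.group("u4")
            or match.group("p6")
            or match.group("x4")
        )
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(digits, 16))

    return _ESCAPE.sub(decode, text)


# The raw string holds the escapes verbatim; the second literal lets
# Python decode the same \U and \u escapes itself.
raw = r"I \U0001f60d PostgreSQL\u203C"
print(unistr_sketch(raw) == "I \U0001f60d PostgreSQL\u203C")  # prints True
```

The raw string and the ordinary literal ending up equal is the point: the \u and \U forms carry over from Python unchanged.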
Sweet, thanks to all involved 🙂
unistr has almost the same capabilities as an escaped string literal in Postgres. But unescaping of a string literal is static: it works only for string literal constants. With unistr it is dynamic: you can use it on any data.
Hi, I want to compile the Postgres source code from GitHub. How can I do this? Is there any article about it?
@tnglm:
https://www.depesz.com/2019/05/15/how-to-play-with-upcoming-unreleased-postgresql/