Picking random element, with weights

Whenever I'm doing some testing I need sample data. Easiest way to do it is to generate data using some random/generate_series queries.

But what if I need specific frequencies?

For example, I need to generate 10,000,000 rows, where there will be 10% of ‘a', 20% of ‘b', and the rest will be split equally between ‘c' and ‘d'?

Continue reading Picking random element, with weights

Waiting for PostgreSQL 16 – Add support for regexps on database and user entries in pg_hba.conf

On 24th of October 2022, Michael Paquier committed patch:

Add support for regexps on database and user entries in pg_hba.conf
 
As of this commit, any database or user entry beginning with a slash (/)
is considered as a regular expression.  This is particularly useful for
users, as now there is no clean way to match pattern on multiple HBA
lines.  For example, a user name mapping with a regular expression needs
first to match with a HBA line, and we would skip the follow-up HBA
entries if the ident regexp does *not* match with what has matched in
the HBA line.
 
pg_hba.conf is able to handle multiple databases and roles with a
comma-separated list of these, hence individual regular expressions that
include commas need to be double-quoted.
 
At authentication time, user and database names are now checked in the
following order:
- Arbitrary keywords (like "all", the ones beginning by '+' for
membership check), that we know will never have a regexp.  A fancy case
is for physical WAL senders, we *have* to only match "replication" for
the database.
- Regular expression matching.
- Exact match.
The previous logic did the same, but without the regexp step.
 
We have discussed as well the possibility to support regexp pattern
matching for host names, but these happen to lead to tricky issues based
on what I understand, particularly with host entries that have CIDRs.
 
This commit relies heavily on the refactoring done in a903971 and
fc579e1, so as the amount of code required to compile and execute
regular expressions is now minimal.  When parsing pg_hba.conf, all the
computed regexps needs to explicitely free()'d, same as pg_ident.conf.
 
Documentation and TAP tests are added to cover this feature, including
cases where the regexps use commas (for clarity in the docs, coverage
for the parsing logic in the tests).
 
Note that this introduces a breakage with older versions, where a
database or user name beginning with a slash are treated as something to
check for an equal match.  Per discussion, we have discarded this as
being much of an issue in practice as it would require a cluster to
have database and/or role names that begin with a slash, as well as HBA
entries using these.  Hence, the consistency gained with regexps in
pg_ident.conf is more appealing in the long term.
 
**This compatibility change should be mentioned in the release notes.**
 
Author: Bertrand Drouvot
Reviewed-by: Jacob Champion, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/fff0d7c1-8ad4-76a1-9db3-0ab6ec338bf7@amazon.com

Continue reading Waiting for PostgreSQL 16 – Add support for regexps on database and user entries in pg_hba.conf

What is LATERAL, what is it for, and how can one use it?

Lately in couple of places I recommended people that they can solve their problem with queries using LATERAL. In some cases recipient of such suggestion indicated that they had no idea what LATERAL is. Which made me think that it might be good idea to write more about them (lateral queries)…

Also – I know that some of the examples I shown in here can be done differently, I just wanted to show how one can use LATERAL, and am terrible with coming up with better usecases.

Continue reading What is LATERAL, what is it for, and how can one use it?

SQL/JSON is postponed

Back in March and April I wrote couple of blogposts about upcoming new great feature – full SQL/JSON.

Unfortunately, as of today SQL/JSON has been reverted (removed from sources), both from development version 16, but also from “almost ready" version 15.

There is lengthy discussion about it which you can definitely read if you're so inclined (side note, I didn't catch it immediately, so for future me – RMT is Release Management Team.

Obviously the feature will make its comeback. Whether it will be in 16 or later – can't tell now, but it will definitely reappear.

New SQL pretty printer – based on parsing, and not regexps

For a long time I was looking for SQL pretty printer.

Some queries that I had to deal with, over the years, were just insane to read, like this:

Continue reading New SQL pretty printer – based on parsing, and not regexps

Better scrolling of long explain plans on explain.depesz.com

Thanks to some discussion on Slack, and work by Alexandre Felipe viewing of large explains will be now a bit easier.

You can see it, for example, in this explain.

First of all, when you scroll down, the column headers stay in place. Plus – you can always see the horizontal scrollbar to see the rest for really long node descriptions.

Thanks a lot, I appreciate the work.