Using recursive queries to get distinct elements from table

I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.

Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:

  • station – 170 distinct values
  • channel – generally 1-3 channels per station

And then we want to run:

SELECT
    station,
    array_agg(DISTINCT (channel)) AS channels
FROM
    DATA
GROUP BY
    station;

Which, on Israel's (original poster) machine took ~ 5 minutes.

And this is with index on on data (station, channel).

Can we do better?

Continue reading Using recursive queries to get distinct elements from table

Waiting for PostgreSQL 15 – Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.

On 10th of September 2021, Noah Misch committed patch:

Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.
 
This switches the default ACL to what the documentation has recommended
since CVE-2018-1058.  Upgrades will carry forward any old ownership and
ACL.  Sites that declined the 2018 recommendation should take a fresh
look.  Recipes for commissioning a new database cluster from scratch may
need to create a schema, grant more privileges, etc.  Out-of-tree test
suites may require such updates.
 
Reviewed by Peter Eisentraut.
 
Discussion: https://postgr.es/m/20201031163518.GB4039133@rfd.leadboat.com

Continue reading Waiting for PostgreSQL 15 – Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.

How to get advisory lock in shell?

Recently we had interesting problem. There are some maintenance tools, that do stuff that touch database. One can be calling repack, another can be specific dump, and yet another can be applying migrations from application.

The problem is that they can step over each other toes, and cause issues.

So we needed to add some way to prevent them from running at the same time…

Continue reading How to get advisory lock in shell?

New changes on explain.depesz.com – fixed calculations of exclusive times

When you see plan on explain.depesz.com, one of the columns visible is exclusive.

In principle it's simple – take time of current node, subtract times of all sub nodes, and you get how much time was spent in this node alone.

Reality is not that simple. Two things especially make the calculations complicated. CTEs, and InitPlans.

Now, with the just pushed change to Pg::Explain (plan parsing library), and site itself, some of the bad calculations are gone.

Let's see some examples:

Continue reading New changes on explain.depesz.com – fixed calculations of exclusive times

Display “settings” from plans on explain.depesz.com

Some time ago I wrote about new options for explains – one that prints settings that were modified from default.

This looks like this:

Aggregate  (cost=35.36..35.37 rows=1 width=8)
  ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.27..34.29 rows=429 width=0)
Settings: enable_seqscan = 'off'

Finally, today, I pushed a change that displays them on explain.depesz.com. To make it more usable, it will also provide links to documentation about each such options.

For example, check this plan.

Hope you'll find it useful.

How to get list of elements from multiranges?

So, some time ago, Pg devs added multi ranges – that is datatype that can be used to store multiple ranges in single column.

The thing is that it wasn't really simple how to get list of ranges from within such multirange. There was no operator, no way to split it.

A month ago Alexander Korotkov committed patch that added unnest() over multiranges, but it got some problems, and was reverted

It will eventually made it's way into sources, I assume, but in the mean time – a friend of mine asked how to get list of elements from multiranges. So I looked into it.

Continue reading How to get list of elements from multiranges?

Buffers I/O information on explain.depesz.com

I just released first version of change to explain.depesz.com that displays buffer I/O information, as described recently.

You can see it in here.

There are two new columns in there, showing how much data given node read from disk, and how much it wrote.

It's definitely not 100% OK now, as you can see, speed and time is not always there, but generally it should work.

There is also summarized info in stats page of the explain.

If you'd notice anything wrong, please let me know, either by mail, or just make an issue on GitLab.

A tale of making company-wide standard psqlrc

At a company we have literally thousands of Pg servers. The layout is also kinda non-obvious. Each database is named the same, but contains different data. And in front of it all, we have pgbouncers.

After some talk, it was suggested that perhaps we could make psql prompt show which database it is connected to. And perhaps some more information, like backend pid. I thought it will be simple…

Continue reading A tale of making company-wide standard psqlrc

Many changes on explain.depesz.com

Some time ago Eugen Konkov mailed me that he'd like to have some changes on explain.depesz.com.

One of the changes was actual bug, but the rest were improvements to functionality.

I kinda didn't want to do it, but when I looked closer it appeared to me that there are some subtle bugs, and when I'll be fixing them, I can add some of the things Eugen requested:

Continue reading Many changes on explain.depesz.com