Getting top-N rows per group

Yesterday on IRC someone asked:

Hi, how do I get top 5 values from a column group by another column??

From further discussion, I learned that:

total rows in table is 2 million. It'll have unique words of less than 1 million.. (approx count)

I didn't have time yesterday, but I decided to write a solution or two to the problem.
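
One common way to approach this kind of question is a window function: number the rows within each group and keep only the first N. Below is a minimal sketch of that idea, not necessarily the solution from the full post. It uses Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is when window functions arrived) so that it runs without a server. The table `words`, its columns `word` and `score`, and the helper `top_n_per_group` are all made up for illustration. The same `row_number()` query shape should also work in PostgreSQL.

```python
import sqlite3


def top_n_per_group(conn, n):
    """Return the n highest scores for every word, best first."""
    # Number rows inside each word group by descending score,
    # then keep only the rows whose number is at most n.
    query = """
        SELECT word, score
        FROM (
            SELECT word,
                   score,
                   row_number() OVER (
                       PARTITION BY word
                       ORDER BY score DESC
                   ) AS rn
            FROM words
        ) AS ranked
        WHERE rn <= ?
        ORDER BY word, score DESC
    """
    return conn.execute(query, (n,)).fetchall()


# Hypothetical sample data, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO words (word, score) VALUES (?, ?)",
    [("a", 1), ("a", 5), ("a", 3), ("b", 2), ("b", 9), ("c", 4)],
)
rows = top_n_per_group(conn, 2)
print(rows)
```

With 2 million rows, whether this is fast enough depends heavily on indexes and on how many groups there are, so treat it as a starting point rather than a tuned answer.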

Continue reading Getting top-N rows per group

Concurrent REINDEX of all indexes in database

The recent release of new PostgreSQL versions suggests that you reindex all indexes. But this will take a while, and since we don't actually have a 'REINDEX CONCURRENTLY' command, it's a bit tricky.

So, since I will be doing this on several databases, I decided to write a script that will handle the work for me.

Continue reading Concurrent REINDEX of all indexes in database

Filling the gaps with window functions

A couple of days ago I had a problem that I couldn't solve after about 2 hours, so I decided to ask on IRC. Almost immediately after asking, I figured out the solution, but David asked me to write about it, even though it's now (for me) completely obvious.

The problem was like this:

I had two tables with a very simple structure (event_when timestamptz, event_count int4) and wanted to show them as a single recordset with columns event_when, event_count_a, event_count_b. The problem was that the event_when values in the two tables usually didn't match. Here is an example:

Continue reading Filling the gaps with window functions

Waiting for 9.3 – Implement SQL-standard LATERAL subqueries.

On 7th of August, Tom Lane committed patch:

Implement SQL-standard LATERAL subqueries.
 
This patch implements the standard syntax of LATERAL attached to a
sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM,
since set-returning function calls are expected to be one of the principal
use-cases.
 
The main change here is a rewrite of the mechanism for keeping track of
which relations are visible for column references while the FROM clause is
being scanned.  The parser "namespace" lists are no longer lists of bare
RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE
pointer as well as some visibility-controlling flags.  Aside from
supporting LATERAL correctly, this lets us get rid of the ancient hacks
that required rechecking subqueries and JOIN/ON and function-in-FROM
expressions for invalid references after they were initially parsed.
Invalid column references are now always correctly detected on sight.
 
In passing, remove assorted parser error checks that are now dead code by
virtue of our having gotten rid of add_missing_from, as well as some
comments that are obsolete for the same reason.  (It was mainly
add_missing_from that caused so much fudging here in the first place.)
 
The planner support for this feature is very minimal, and will be improved
in future patches.  It works well enough for testing purposes, though.
 
catversion bump forced due to new field in RangeTblEntry.
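
As a rough analogy for what LATERAL allows: a lateral FROM item can refer to columns of the items that precede it, so conceptually it is re-evaluated for every row on its left. The Python sketch below mimics that with a nested loop. The data and the names `lateral_join`, `customers`, `orders` and `largest_order` are invented for illustration, and the sketch says nothing about how the planner actually executes such queries.

```python
def lateral_join(left_rows, right_for):
    """Pair each left row with the rows produced by right_for(left_row)."""
    for left in left_rows:
        # The right-hand side is computed per left row, like a LATERAL subquery.
        for right in right_for(left):
            yield left, right


# Hypothetical sample data.
customers = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [
    {"customer_id": 1, "total": 10},
    {"customer_id": 1, "total": 30},
    {"customer_id": 2, "total": 20},
]


def largest_order(customer):
    """Stand-in for a subquery with ORDER BY total DESC LIMIT 1."""
    mine = [o for o in orders if o["customer_id"] == customer["id"]]
    return sorted(mine, key=lambda o: o["total"], reverse=True)[:1]


result = [(c["name"], o["total"]) for c, o in lateral_join(customers, largest_order)]
print(result)
```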

Continue reading Waiting for 9.3 – Implement SQL-standard LATERAL subqueries.

“PostgreSQL: Up and Running” by Regina Obe and Leo Hsu

Around a week ago, I got a mail saying that the authors wanted me to have their book. For free. That's relatively important, because getting things for free means you should pay for them in some other way (that's my position; it was not suggested in any way in the mails). Plus, I generally never read technical books (really, I just don't), so there was no chance I would have gotten it myself if it hadn't been gifted.

After some mails I got a mobi version of PostgreSQL: Up and Running.

Continue reading “PostgreSQL: Up and Running” by Regina Obe and Leo Hsu

Waiting for 9.3 – Event triggers

On 20th of July, Robert Haas committed patch:

Make new event trigger facility actually do something.
 
Commit 3855968f328918b6cd1401dd11d109d471a54d40 added syntax, pg_dump,
psql support, and documentation, but the triggers didn't actually fire.
With this commit, they now do.  This is still a pretty basic facility
overall because event triggers do not get a whole lot of information
about what the user is trying to do unless you write them in C; and
there's still no option to fire them anywhere except at the very
beginning of the execution sequence, but it's better than nothing,
and a good building block for future work.
 
Along the way, add a regression test for ALTER LARGE OBJECT, since
testing of event triggers reveals that we haven't got one.
 
Dimitri Fontaine and Robert Haas

This was preceded (two days earlier) by a commit, also by Robert Haas, which stated:

Syntax support and documentation for event triggers.
 
They don't actually do anything yet; that will get fixed in a
follow-on commit.  But this gets the basic infrastructure in place,
including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
pg_dump and psql support; and documentation for the anticipated
initial feature set.
 
Dimitri Fontaine, with review and a bunch of additional hacking by me.
Thom Brown extensively reviewed earlier versions of this patch set,
but there's not a whole lot of that code left in this commit, as it
turns out.

Continue reading Waiting for 9.3 – Event triggers

Waiting for 9.3 – Add array_remove() and array_replace() functions.

On 11th of July, Tom Lane committed patch:

Add array_remove() and array_replace() functions.
 
These functions support removing or replacing array element value(s)
matching a given search value.  Although intended mainly to support a
future array-foreign-key feature, they seem useful in their own right.
 
Marco Nenciarini and Gabriele Bartolini, reviewed by Alex Hunsaker
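
To make the behaviour described in the commit message concrete, here is a rough Python model of the two functions on one-dimensional arrays: every element equal to the search value is dropped or swapped. The Python names `array_remove` and `array_replace` merely mirror the SQL functions. This is an illustration of the semantics, not PostgreSQL's implementation.

```python
def array_remove(values, search):
    """Return a copy of values without any element equal to search."""
    return [v for v in values if v != search]


def array_replace(values, search, replacement):
    """Return a copy of values with each element equal to search replaced."""
    return [replacement if v == search else v for v in values]


print(array_remove([1, 2, 3, 2], 2))      # [1, 3]
print(array_replace([1, 2, 5, 4], 5, 3))  # [1, 2, 3, 4]
```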

Continue reading Waiting for 9.3 – Add array_remove() and array_replace() functions.

Waiting for 9.3 – Dramatically reduce System V shared memory consumption.

On 28th of June, Robert Haas committed patch:

Dramatically reduce System V shared memory consumption.
 
Except when compiling with EXEC_BACKEND, we'll now allocate only a tiny
amount of System V shared memory (as an interlock to protect the data
directory) and allocate the rest as anonymous shared memory via mmap.
This will hopefully spare most users the hassle of adjusting operating
system parameters before being able to start PostgreSQL with a
reasonable value for shared_buffers.
 
There are a bunch of documentation updates needed here, and we might
need to adjust some of the HINT messages related to shared memory as
well.  But it's not 100% clear how portable this is, so before we
write the documentation, let's give it a spin on the buildfarm and
see what turns red.

Continue reading Waiting for 9.3 – Dramatically reduce System V shared memory consumption.

Changes on explain.depesz.com

One of the features that is actually disliked is anonymization. But regardless of the dislike, it has some users. One of those users mailed me about a bug: foreign table file names were not anonymized.

So I wrote a patch and tests, and released a new version of the underlying parsing library.

Continue reading Changes on explain.depesz.com

OmniPITR 1.0.0 released

Finally, after all these years, version 1.0.0 of OmniPITR got released.

The reason I went to 1.0.0, and not 0.8.0, is very simple: finally, all programs in bin/ actually work 🙂

By that I mean: since the beginning there was “omnipitr-monitor”, which simply didn't work, because work on it was always postponed. But now it does. Its functionality is not all that great yet, but it works, checks some basic data about replication, and can be used in production.

There is still a todo, but the things on it are less important.

I have to say that writing and maintaining OmniPITR taught me a lot about PostgreSQL: how it works, and what WAL really is. It was really cool.