My personal PostgreSQL Wish List

During my time with PostgreSQL I found some things that I love. I found some things that I hate (very little of them). And some things that I would like to get, but, so far, I don't have them.

Of course, PostgreSQL being Open Source, I could add it myself, if only I would know some (real) C and had necessary skills. Which I don't. But anyway, decided to write my personal wishlist for PostgreSQL – maybe someone will say “hey, that would be cool, and I know how to write it" 🙂

Continue reading My personal PostgreSQL Wish List

Waiting for 9.5 – Support arrays as input to array_agg() and ARRAY(SELECT …).

On 25th of November, Tom Lane committed patch:

Support arrays as input to array_agg() and ARRAY(SELECT ...).
 
These cases formerly failed with errors about "could not find array type
for data type".  Now they yield arrays of the same element type and one
higher dimension.
 
The implementation involves creating functions with API similar to the
existing accumArrayResult() family.  I (tgl) also extended the base family
by adding an initArrayResult() function, which allows callers to avoid
special-casing the zero-inputs case if they just want an empty array as
result.  (Not all do, so the previous calling convention remains valid.)
This allowed simplifying some existing code in xml.c and plperl.c.
 
Ali Akbar, reviewed by Pavel Stehule, significantly modified by me

Continue reading Waiting for 9.5 – Support arrays as input to array_agg() and ARRAY(SELECT …).

Waiting for 9.5 – BRIN: Block Range Indexes.

On 7th of November, Alvaro Herrera committed patch:

BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.
 
As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.
 
For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.
 
Catalog version bumped due to new builtin catalog entries.
 
There's more that could be done here, but this is a good step forwards.
 
Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.
 
Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.
 
PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.

Continue reading Waiting for 9.5 – BRIN: Block Range Indexes.

How to install your own copy of explain.depesz.com

There are some cases where you might want to get your own copy of explain.depesz.com. You might not trust me with your explains. You might want to use it without internet access. Or you just want to play with it, and have total control over the site.

Installing, while obvious to me, and recently described by John Poole, is not always 100% clear. So, I decided to write about how to set it up, from scratch.

Continue reading How to install your own copy of explain.depesz.com

Changes on explain.depesz.com

Uploaded new version to the server – straight from GitHub. There are two changes – one visible, and one not really.

The invisible change, first, is one for people hosting explain.depesz.com on their own. As you perhaps know you can get sources of explain.depesz.com and install it on any box you want (as log as you can get there PostgreSQL, Perl, and some perl modules). While working on it on my own, I figured I could use a way to tell which version of module-xxx the site is running right now. So I build /info page (which is inaccessible to everyone, but manually-marked admins), which lists versions and interesting paths.

The second change – the one visible to users, is that I made explain.depesz.com commify numbers. Sometimes it can be hard to read value like 12325563, but now it will be displayed as 12,325,563 making is simpler to grasp.

This second change was suggested by Jacek Wielemborek, so if you hate it – blame him. Of course if you love the change – it's all on me 🙂

Hope you'll find it helpful.

PostgreSQL + Perl + Unicode == confusion. Why?

Yesterday I had an interesting discussion on irc.

A guy wanted to know why Perl script is causing problems when dealing with Pg
and unicode characters.

The discussion went sideways, I got (a bit) upset, and had to leave anyway, so
I didn't finish it. But it did bother me, as for me the reasons of the problem
seem obvious, yet the person I talked with was very adamant that I have the
whole thing wrong.

So, I figured I'll use my blog to elaborate a bit…

Continue reading PostgreSQL + Perl + Unicode == confusion. Why?

What logging has least overhead?

When working with PostgreSQL you generally want to get information about slow queries. The usual approach is to set log_min_duration_statement to some low(ish) value, run your app, and then analyze logs.

But you can log to many places – flat file, flat file on another disk, local syslog, remote syslog. And – perhaps, instead of log_min_duration_statement – just use pg_stat_statements?

Well, I wondered about it, and decided to test.

Continue reading What logging has least overhead?