Thanks to the company I work for, OmniTI, I have been working on a pretty cool project. The project is called OmniPITR, and here is what it is, why it exists, how it works, and where to get it.
Tag: postgresql
What mistakes you can avoid when looking for help on IRC?
Today there was a person on IRC who asked a question and provided some data. While helping him (her?), I noticed some things that had bugged me before in other cases, but this time I decided to write about them – it's kind of a rant, and if you (the reader) are the person I'm basing my example on – please do not feel “punished” – it just so happens that you exhibited some things that make helping others more difficult than it could be – so: you're not special, although I would really prefer if you were 🙂
Waiting for 9.0 – pg_upgrade
On May 12th, Bruce Momjian committed a new contrib module for 9.0 – pg_upgrade.
As I understand it, this is what was previously available as pg_migrator.
If you're not familiar with it – it's a tool that upgrades $PGDATA in place from one major version to a newer one. What's the use case? Let's assume you have a 200GB database running on 8.3, and you'd like to go to 8.4 (or 9.0). The normal way is pg_dump + pg_restore – which will take some time. With pg_migrator/pg_upgrade it should be faster and easier. So, let's play with it.
Tips n’ tricks – rank on changes
I was asked this: given this table:
# select * from a order by d;
 t | d
---+----
 O |  1
 O |  2
 O |  3
 M |  4
 M |  5
 M |  6
 M |  7
 O |  8
 O |  9
 O | 10
 I | 11
 I | 12
 I | 13
(13 rows)
Is it possible to add a “rank” column that will increment whenever t changes?
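One possible way to do this, sketched here with window functions, is to mark every row where t differs from the previous row (ordered by d) and then take a running sum of those marks. This is not necessarily the approach the full post takes. The example runs the SQL through Python's sqlite3 module only so that it is self-contained (it assumes the bundled SQLite is 3.25 or newer); the same query should also work on PostgreSQL 8.4 and later. The column names changed and change_rank are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (t TEXT, d INTEGER)")
conn.executemany(
    "INSERT INTO a (t, d) VALUES (?, ?)",
    [("O", 1), ("O", 2), ("O", 3),
     ("M", 4), ("M", 5), ("M", 6), ("M", 7),
     ("O", 8), ("O", 9), ("O", 10),
     ("I", 11), ("I", 12), ("I", 13)],
)

query = """
SELECT t, d,
       -- running total of change markers = rank that grows on every change
       SUM(changed) OVER (ORDER BY d) AS change_rank
FROM (
    SELECT t, d,
           -- 1 when t differs from the previous row (or there is no previous row)
           CASE WHEN t = LAG(t) OVER (ORDER BY d) THEN 0 ELSE 1 END AS changed
    FROM a
) AS marked
ORDER BY d
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With the sample data above, change_rank comes out as 1 for the first three rows, 2 for the four M rows, 3 for the second group of O rows, and 4 for the I rows.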
Stupid tricks – hiding value of column in select *
One of the most common questions is “how do I get select * from table, but without one of the columns”.
Short answer is of course – name your columns, instead of using *. Or use a view.
But I decided to take a look at the problem.
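As a minimal sketch of the short answer (the view variant), the example below creates a view that names only the columns that should be visible, so that select * from the view never returns the hidden one. The table users, its columns, and the view users_public are hypothetical names chosen for illustration. SQLite via Python's sqlite3 module is used only to keep the example self-contained; CREATE VIEW works the same way in PostgreSQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)"
)
conn.execute("INSERT INTO users (username, password) VALUES ('alice', 'secret')")

# The view lists every column except the one we want to hide.
conn.execute("CREATE VIEW users_public AS SELECT id, username FROM users")

cursor = conn.execute("SELECT * FROM users_public")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

The obvious cost is that the view has to be updated by hand whenever a column is added to the underlying table.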
Getting unique elements
Let's assume you have a simple database with “articles” – each article can be in many “categories”. And now you want to get a list of all articles in a given set of categories.
Standard approach:
select a.*
from articles as a
    join articles_in_categories as aic on a.id = aic.article_id
where aic.category_id in (14,62,70,53,138)
This will return duplicated article data if a given article is in more than one of the listed categories. How do you remove the redundant rows?
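One common way to avoid the duplicates, shown here as a sketch and not necessarily the approach the full post ends up recommending, is to replace the join with an EXISTS subquery so that each article is tested once. Adding DISTINCT to the original join is another option. The sample rows and the title column are invented for illustration, and SQLite via Python's sqlite3 module is used only so the example is self-contained; the SQL itself is plain enough to run on PostgreSQL too.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE articles_in_categories (article_id INTEGER, category_id INTEGER)"
)
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(1, "first"), (2, "second"), (3, "third")],
)
# Article 1 is in two of the listed categories, article 3 in none of them.
conn.executemany(
    "INSERT INTO articles_in_categories (article_id, category_id) VALUES (?, ?)",
    [(1, 14), (1, 62), (2, 70), (3, 999)],
)

# The standard join: article 1 shows up once per matching category.
joined = conn.execute("""
    SELECT a.*
    FROM articles AS a
        JOIN articles_in_categories AS aic ON a.id = aic.article_id
    WHERE aic.category_id IN (14, 62, 70, 53, 138)
    ORDER BY a.id
""").fetchall()

# EXISTS variant: each article is returned at most once.
unique = conn.execute("""
    SELECT a.*
    FROM articles AS a
    WHERE EXISTS (
        SELECT 1
        FROM articles_in_categories AS aic
        WHERE aic.article_id = a.id
          AND aic.category_id IN (14, 62, 70, 53, 138)
    )
    ORDER BY a.id
""").fetchall()

print(len(joined), unique)
```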
Tips n’ Tricks – using “wrong” index
More than once I've seen a situation where there is a table with a serial primary key, and the rows also contain some kind of creation timestamp, which is usually monotonic, or close to monotonic.
Examples of such a case are comments or posts in forums – each gets its ID, but they also have a creation timestamp. And usually higher ids were added later than lower ids.
So, let's assume you have such a table, and somebody asks you to make a report on data from the last month. How?
How to remove backups?
Does the question in the title sound weird to you? Isn't it just ‘rm backup_filename’? Well, I really wish it was that simple in some cases.
On one of the servers I'm looking into, there is an interesting situation:
- quite busy database server (2k tps is the low point of the day)
- very beefy hardware
- daily backups, each sized at about 100GB
- backups stored on ext3 filesystem with default options
- before launching daily backup, script removes oldest backup (we keep 3 days of backups on this machine)
Profiling stored procedures/functions
One database that I am monitoring uses a lot of stored procedures. Some of them are fast, some of them are not so fast. I thought – is there a sensible way to diagnose which part of a stored procedure takes the most time?
I mean – I could just put the logic into the application, and then every query would have its own timing in Pg logs, but this is not practical. And I also believe that using stored procedures/functions is way better than using plain SQL, for a number of reasons.
So, I'm back to the question – how to check which part of a function takes most of the time?
Setting WAL Replication
There are several approaches to replication/failover – you might have heard of Slony, Londiste, pgPool and some other tools.
WAL Replication is different from all of them in one aspect – it doesn't let you query the slave database (until 9.0, in which you actually can run read-only queries on the slave).
Since you can't run queries on the slave, what is it good for? Well, it's good, and great, in one very important aspect – all things that happen in the database are replicated. Schema changes. Sequence modifications. Everything.
There is also a drawback – you can't (as of now) replicate just one database. You replicate the whole cluster (I don't like this word in this context – let's say: whole installation) of PostgreSQL. All databases that reside in a given DATA directory.
So, the question is – how to set it up?