everybody knows explain analyze.
it's well known, well understood tool.
the problem i have with it is that i don't see (at a glance) what's really eating the time in the query.
Continue reading better explain analyze
everybody knows explain analyze.
it's well known, well understood tool.
the problem i have with it is that i don't see (at a glance) what's really eating the time in the query.
Continue reading better explain analyze
just couple of days ago i read about a new, great addition to postgresql 8.3 – “return query" in pl/pgsql.
what does it do?
in set returning functions, when you wanted to return multiple rows from a given query you had to:
FOR record IN SELECT ..... LOOP RETURN NEXT record; END LOOP;
now, you can simply:
RETURN QUERY SELECT ...;
what's more – since RETURN QUERY doesn't terminate function (just like return next) you can:
RETURN QUERY SELECT something;
RETURN QUERY SELECT something else;
and then you'll get (more or less) “union all" of the queries.
additionally – return query is supposed to be faster then return next/loop.
so, let's test it.
Continue reading set returning functions in 8.3
for the long time everybody knew that you can't use index on “LIKE" operations.
then came text_pattern_ops, so we could use indexes for prefix searches:
# \d depesz_test Table "public.depesz_test" Column | Type | Modifiers --------+---------+---------------------------------------------------------- id | integer | not null default nextval('depesz_test_id_seq'::regclass) email | text | not null Indexes: "depesz_test_pkey" PRIMARY KEY, btree (id) "x" UNIQUE, btree (email text_pattern_ops)
# explain analyze select count(*) from depesz_test where email like 'dep%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Aggregate (cost=96.71..96.72 rows=1 width=0) (actual time=0.983..0.985 rows=1 loops=1) -> Bitmap Heap Scan on depesz_test (cost=4.68..96.65 rows=24 width=0) (actual time=0.184..0.641 rows=155 loops=1) Filter: (email ~~ 'dep%'::text) -> Bitmap Index Scan on x (cost=0.00..4.67 rows=24 width=0) (actual time=0.158..0.158 rows=155 loops=1) Index Cond: ((email ~>=~ 'dep'::text) AND (email ~<~ 'deq'::text)) Total runtime: 1.067 ms (6 rows)
but what if i'd like to search for ‘%something'? not prefix, but suffix. in my example – what can i do to use indexes when searching for people from given domain?
how many times did you (or somebody in your environment) did something like this? update (or delete) without proper “where"?
it happened at least couple of times to me.
of course – using transactions solved the problem. but – on the other hand – if i can't trust myself to add proper where, how can i trust myself to add “begin"?
so, is there no hope?
are we doomed to always repeat the same mistake, and spend uncountable hours on recovering from damaged tables?
fortunatelly there is a help.
Continue reading update account set password = ‘new_password'; oops?!
mage_ from #postgresql had interesting problem today.
he has a table with 2 date fields, and he wants to have list of all years from both fields. together. as one list.
his approach:
select date_part('year', date1) from test union select date_part('year', date2) from test;
is hardly satisfactory – it takes too long.
any way to speed it up?
Continue reading postgresql tips & tricks
some time ago skype released some tools for postgresql that they developed.
one of the released tools is londiste – simple replication system. it is master to multiple slaves, without cascading.
the problem with skype projects is that documentation is not as full as we expect from pg-related project. luckily dim used the londiste, and was kind enough to write some howto/manual for this project.
all i can say is: take a look at it. it looks as a very simple to use, without all features of slony, but definitely worth to take a look at.
i was informed that jan wieck wrote another function which does cumulative sum – using cursors and srf's to get the same result.
i checked it, and here are the results.
Continue reading cumulative sum in sql – howto part2
i found lately quite interesting issue. how to calculate cumulative_sum across some dataset.
if you will search for cumulative sum, you will find some answers. most of them revolve around using subselect or join.
so, how to calculate it?
Continue reading cumulative sum in sql – howto
some time ago i wrote a piece on values(),() in postgresql 8.2, saying that multi-row inserts are very fast.
some people said that my benchmark is pretty useless as i didn't take into consideration transactions.
others asked me to translate the text to english.
so i decided to redo the test, with more test scenarios, and write it up in english. this is the summary.
at first what i used, what i tested and how.
i used a linux machine, with these things inside:
i tried to make the machine as predictable as possible, thus i stopped all daemons which were not neccessary. full ps auxwwf output is provided in results tar file. basically – there is postgresql, sshd, openvpn, dhclient and some gettys. no cron, atd, smtpd, httpd or anything like this.
then i wrote a small program which generated test files. i do not distribute test files themselves, as in total they use nearly 70gb!
then i wrote another small program – which basically ran all of the tests (3 times to get an average).
full set of results is downloadable as tar file, which contains 10598 files (tar file is 350k, unpacked directory takes 42megs).
one very important notice. all tests that i have performed inserted random data to table of this structure:
so results (especially “break-points" where there is no further gain) will be different when inserting to another tables. the only point of this benchmark is to show which approach can give which results. and what's really worth the trouble 🙂
Continue reading how to insert data to database – as fast as possible
uwaga – ta metoda jest tylko i wyłącznie dla postgresql'a, gdyż wykorzystuje niestandarodwy typ danych obecny (jako moduł w contribie) jedynie w postgresie.
jak ltree działa nie będę opisywał bo od tego jest manual do ltree.
baza do ltree jest trywialna, przykładowo, oryginalne, testowe drzewo:
zapisujemy tak:
# create table tree_ltree (
id int4 primary key,
path ltree
);
po wstawieniu naszego testowego drzewa uzyskujemy taką zawartość tabelki:
id | path |
---|---|
1 | sql |
2 | sql.postgresql |
3 | sql.oracle |
4 | sql.postgresql.linux |
5 | sql.oracle.solaris |
6 | sql.oracle.linux |
7 | sql.oracle.windows |
8 | sql.oracle.linux.glibc1 |
9 | sql.oracle.linux.glibc2 |
ok. jak się pyta taką bazę?
1. pobranie listy elementów głównych (top-levelowych)
select * from tree_ltree where path ~ '*{1}'
2. pobranie elementu bezpośrednio “nad" podanym elementem:
dane wejściowe:
select p.* from tree_ltree c join tree_ltree p on c.path <@ p.path where c.id = [ID] and c.path ~ cast(p.path::text || '.*{1}' as lquery)
zwracam uwagę, na to iż mając daną ścieżkę do elementu można mu po prostu wyciąć ostatni element (od kropki do końca) i w ten sposób uzyskać od razu ścieżkę do elementu nadrzędnego.
3. pobranie listy elementów bezpośrednio “pod" podanym elementem
dane wejściowe:
select c.* from tree_ltree c join tree_ltree p on c.path <@ p.path where p.id = [ID] and c.path ~ cast(p.path::text || '.*{1}' as lquery);
zwracam uwagę, na to iż mając daną ścieżkę do elementu można mu po prostu dokleić do niej .*{1} i wykonać zapytanie:
select * from tree_ltree where path ~ [ZMODYFIKOWANA_SCIEZKA_PARENTA]
4. pobranie listy wszystkich elementów “nad" danym elementem (wylosowanym)
dane wejściowe:
select p.* from tree_ltree c join tree_ltree p on c.path <@ p.path where c.id = [ID] AND p.id <> [ID]
5. pobranie listy wszystkich elementów “pod" danym elementem (wylosowanym)
dane wejściowe:
select c.* from tree_ltree c join tree_ltree p on c.path <@ p.path where p.id = [ID] AND c.id <> [ID]
6. sprawdzenie czy dany element jest “liściem" (czy ma pod-elementy)
dane wejściowe:
select count(*) from tree_ltree c join tree_ltree p on c.path <@ p.path where p.id = [ID] AND c.id <> [ID]
jeśli zwróci 0 – to jest to liść. w innym przypadku zwróci ilość bezpośrednich “dzieci".
7. pobranie głównego elementu w tej gałęzi drzewa w której znajduje się dany (wylosowany) element
select p.* from tree_ltree c join tree_ltree p on c.path <@ p.path where c.id = [ID] and p.path ~ '*{1}'
jeśli chodzi o zalety – najważniejszą jest szybkość pisania, intuicyjność zapytań, możliwości (indeksowane wyszukiwanie np. elementów 2 poziomy poniżej dowolnego elementu którego nazwa zaczyna się od “dep") i czytelność danych.
wada jest zasadniczo tylko jedna – przenośność. jeśli kiedykolwiek w przyszłości będziecie przenosić bazę na coś innego niż postgres, to macie problem. no tak. tylko po co przenosić bazę na coś innego niż postgres?