One database that I am monitoring uses a lot of stored procedures. Some of them are fast, some of them are not so fast. I thought – is there a sensible way to diagnose which part of a stored procedure takes the most time?
I mean – I could just put the logic into the application, and then every query would have its own timing in Pg logs, but this is not practical. And I also believe that using stored procedures/functions is way better than using plain SQL, for a number of reasons.
So, I'm back to the question – how to check which part of a function takes most of the time?
Let's assume we have a very simplistic function:
CREATE OR REPLACE FUNCTION testit() RETURNS void AS $$
DECLARE
    x INT4;
    y INT4;
BEGIN
    x := 3;
    y := x * 2;
    RETURN;
END;
$$ LANGUAGE plpgsql;
This function doesn't do anything, but that's the point – I want to measure how long it takes to add profiling, and how much overhead the profiling itself adds.
Generally, the idea is to add RAISE "something" statements that will write information to the logs, without polluting the client application with useless output.
Before I get to this, let's establish a baseline – let's run this function, say, 10000 times, and check the time:
# SELECT COUNT(*) FROM ( SELECT testit() FROM generate_series(1,10000) ) AS x;
 count
-------
 10000
(1 row)

Time: 24.486 ms
Oops. A bit too fast to be sensibly comparable. Let's add some more iterations:
# SELECT COUNT(*) FROM ( SELECT testit() FROM generate_series(1,1000000) ) AS x;
  count
---------
 1000000
(1 row)

Time: 2801.626 ms
Better.
I also ran it two more times, and got these results:
- 2766.251 ms
- 3153.510 ms
OK. So now, let's add profiling. First I need to find a log level that goes to the logs, and not to the screen:
# show client_min_messages ;
 client_min_messages
---------------------
 notice
(1 row)

Time: 0.093 ms

# show log_min_messages ;
 log_min_messages
------------------
 warning
(1 row)
Now, thanks to the fine documentation, we can see that there is a 'LOG' level that gets logged but is not shown on screen.
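To spell out why that works – LOG sorts differently in the two severity orderings. A minimal sketch (the values shown are the stock defaults, nothing exotic):

-- LOG ranks *below* NOTICE in the client_min_messages ordering, but
-- *above* WARNING in the log_min_messages ordering – so with these
-- values, RAISE LOG reaches the log file while psql stays quiet.
SET client_min_messages = notice;   -- per session; hides LOG from the client
-- log_min_messages can only be changed by a superuser or in postgresql.conf:
--   log_min_messages = warning     # LOG still gets written to the server log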
So, knowing this, let's add some profiling. It should be as simple as possible, yet have all necessary information:
CREATE OR REPLACE FUNCTION testit() RETURNS void AS $$
DECLARE
    x INT4;
    y INT4;
BEGIN
    RAISE LOG 'PROFILING: testit() Step 1 : %', clock_timestamp();
    x := 3;
    RAISE LOG 'PROFILING: testit() Step 2 : %', clock_timestamp();
    y := x * 2;
    RAISE LOG 'PROFILING: testit() Step 3 : %', clock_timestamp();
    RETURN;
END;
$$ LANGUAGE plpgsql;
As you can see, I log 'PROFILING:' (to be able to quickly grep the logs), the name of the function (because I might be profiling more than one function at a time), and which step it was at. Plus a very precise timestamp.
After a single run, I have this in the logs:
2010-03-18 11:37:00.003 CET,"depesz","depesz",7949,"[local]",4ba1fdd7.1f0d,7721795,"SELECT",2010-03-18 11:17:59 CET,2/38,0,LOG,00000,"PROFILING: testit() Step 1 : 2010-03-18 11:37:00.003352+01",,,,,,"SELECT count(*) FROM (SELECT testit() from generate_series(1,1000000)) as x;",,,"psql"
2010-03-18 11:37:00.003 CET,"depesz","depesz",7949,"[local]",4ba1fdd7.1f0d,7721796,"SELECT",2010-03-18 11:17:59 CET,2/38,0,LOG,00000,"PROFILING: testit() Step 2 : 2010-03-18 11:37:00.003367+01",,,,,,"SELECT count(*) FROM (SELECT testit() from generate_series(1,1000000)) as x;",,,"psql"
2010-03-18 11:37:00.003 CET,"depesz","depesz",7949,"[local]",4ba1fdd7.1f0d,7721797,"SELECT",2010-03-18 11:17:59 CET,2/38,0,LOG,00000,"PROFILING: testit() Step 3 : 2010-03-18 11:37:00.003383+01",,,,,,"SELECT count(*) FROM (SELECT testit() from generate_series(1,1000000)) as x;",,,"psql"
Which can be simplified down to just the relevant information:
7949,"PROFILING: testit() Step 1 : 2010-03-18 11:37:00.003352+01" 7949,"PROFILING: testit() Step 2 : 2010-03-18 11:37:00.003367+01" 7949,"PROFILING: testit() Step 3 : 2010-03-18 11:37:00.003383+01"
The first column is important, as it is the PID of the backend process – we need it so that we don't mix profiling messages from various backends.
So. How much slower is it?
SELECT COUNT(*) FROM ( SELECT testit() FROM generate_series(1,1000000) ) AS x;
  count
---------
 1000000
(1 row)

Time: 50819.932 ms
That seems like a lot compared to the 3 seconds we had previously. But is it? Let's think: within those 51 seconds, 3 seconds were spent doing "normal stuff, not profiling related". That leaves 48 seconds of overhead. 48 seconds for 1 million calls, each running 3 RAISE LOG commands – this means that a single RAISE LOG took 0.016 ms!
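Just to double-check the arithmetic (a throwaway calculation, nothing more):

-- (profiled time - baseline) / total number of RAISE LOG calls
SELECT (50819.932 - 2801.626) / (1000000 * 3) AS ms_per_raise;
-- ms_per_raise ≈ 0.016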
Now. In our degenerate example it's a lot. But our function doesn't do anything. Let's try it on something *a bit* more realistic. We'll test 2 functions:
CREATE OR REPLACE FUNCTION insert_word( in_word TEXT ) RETURNS void AS $$
DECLARE
    use_word_id INT4;
BEGIN
    use_word_id := NEXTVAL('words_id_seq');
    INSERT INTO words (id, word) VALUES (use_word_id, in_word);
    LOOP
        UPDATE word_counts SET count = count + 1 WHERE word = in_word;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO word_counts (word, count) VALUES (in_word, 1);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing; loop and retry the UPDATE
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_word2( in_word TEXT ) RETURNS void AS $$
DECLARE
    use_word_id INT4;
BEGIN
    INSERT INTO words (word) VALUES (in_word) RETURNING id INTO use_word_id;
    LOOP
        UPDATE word_counts SET count = count + 1 WHERE word = in_word;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO word_counts (word, count) VALUES (in_word, 1);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing; loop and retry the UPDATE
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
These functions relate to 2 tables that will need to be created as well:
CREATE TABLE words (
    id   serial PRIMARY KEY,
    word TEXT NOT NULL
);

CREATE TABLE word_counts (
    word  TEXT PRIMARY KEY,
    count INT4 NOT NULL DEFAULT 0
);
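A quick sanity check of what a single call does (starting from empty tables):

SELECT insert_word('hello');
SELECT insert_word('hello');

SELECT * FROM words;
--  id | word
-- ----+-------
--   1 | hello
--   2 | hello

SELECT * FROM word_counts;
--  word  | count
-- -------+-------
--  hello |     2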
Now. The functionality of these functions is not really relevant. Let's just see how well they behave without profiling, and then with profiling.
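The test data lives in a table x with a single text column y – you'll see it in the queries below. The original word list isn't shown anywhere, so this is a purely hypothetical reconstruction (the real distribution of duplicates obviously affects how often the UPDATE vs. INSERT branches fire):

-- hypothetical test data: 72900 rows with plenty of duplicate words
CREATE TABLE x (y text);
INSERT INTO x (y)
    SELECT 'word_' || (random() * 5000)::int4
    FROM generate_series(1, 72900);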
So, I created a test table with 72900 words along those lines, and ran this:
# SELECT COUNT(*) FROM ( SELECT insert_word( x.y) FROM x ) AS q;
Results?
- Time: 4638.691 ms
- Time: 4665.952 ms
- Time: 4667.546 ms
- Time: 4662.410 ms
- Time: 4660.251 ms
OK. Now the second function:
# SELECT COUNT(*) FROM ( SELECT insert_word2( x.y) FROM x ) AS q;
And its results:
- Time: 5023.776 ms
- Time: 4819.380 ms
- Time: 4656.849 ms
- Time: 4631.946 ms
- Time: 4613.677 ms
OK. Now, let's change the functions:
CREATE OR REPLACE FUNCTION insert_word( in_word TEXT ) RETURNS void AS $$
DECLARE
    use_word_id INT4;
BEGIN
    RAISE LOG 'PROFILING: insert_word() Step 1 : %', clock_timestamp();
    use_word_id := NEXTVAL('words_id_seq');
    RAISE LOG 'PROFILING: insert_word() Step 2 : %', clock_timestamp();
    INSERT INTO words (id, word) VALUES (use_word_id, in_word);
    RAISE LOG 'PROFILING: insert_word() Step 3 : %', clock_timestamp();
    LOOP
        UPDATE word_counts SET count = count + 1 WHERE word = in_word;
        IF FOUND THEN
            RAISE LOG 'PROFILING: insert_word() Step 4 : %', clock_timestamp();
            RETURN;
        END IF;
        RAISE LOG 'PROFILING: insert_word() Step 5 : %', clock_timestamp();
        BEGIN
            INSERT INTO word_counts (word, count) VALUES (in_word, 1);
            RAISE LOG 'PROFILING: insert_word() Step 6 : %', clock_timestamp();
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            RAISE LOG 'PROFILING: insert_word() Step 7 : %', clock_timestamp();
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_word2( in_word TEXT ) RETURNS void AS $$
DECLARE
    use_word_id INT4;
BEGIN
    RAISE LOG 'PROFILING: insert_word2() Step 1 : %', clock_timestamp();
    INSERT INTO words (word) VALUES (in_word) RETURNING id INTO use_word_id;
    RAISE LOG 'PROFILING: insert_word2() Step 2 : %', clock_timestamp();
    LOOP
        UPDATE word_counts SET count = count + 1 WHERE word = in_word;
        IF FOUND THEN
            RAISE LOG 'PROFILING: insert_word2() Step 3 : %', clock_timestamp();
            RETURN;
        END IF;
        RAISE LOG 'PROFILING: insert_word2() Step 4 : %', clock_timestamp();
        BEGIN
            INSERT INTO word_counts (word, count) VALUES (in_word, 1);
            RAISE LOG 'PROFILING: insert_word2() Step 5 : %', clock_timestamp();
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            RAISE LOG 'PROFILING: insert_word2() Step 6 : %', clock_timestamp();
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
Now. I added 7 (insert_word) and 6 (insert_word2) profiling calls per function. Since each function is called 72900 times, it should make the runs roughly 8 (insert_word) or 7 (insert_word2) seconds slower.
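Back-of-the-envelope, generously assuming every profiling line fires on every call (in reality only 4-5 of them do, depending on the code path, so this is an upper bound):

SELECT 72900 * 7 * 0.016 AS insert_word_overhead_ms,   -- ~8165 ms
       72900 * 6 * 0.016 AS insert_word2_overhead_ms;  -- ~6998 ms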
How does it look in reality?
insert_word() * 72900:
- Time: 14767.158 ms
- Time: 12476.824 ms
- Time: 12517.576 ms
- Time: 12237.033 ms
- Time: 12181.778 ms
and insert_word2() * 72900:
- Time: 11001.922 ms
- Time: 11689.947 ms
- Time: 11148.601 ms
- Time: 15607.282 ms
- Time: 17824.019 ms
Quite good – practically spot on with the previous calculation. There were 2 problematic runs that took much longer – 15 and 17 seconds instead of 11 – but I think that was caused by some unrelated load on my machine.
If this seems like too much to you, think about it: let's assume you have a 150 ms function, and you add 20 profiling lines to it. The total cost of profiling is 0.3 ms! So of course – it is not usable for fast functions that run in less than 0.1 ms, but these do not need profiling anyway.
Generally, I think that 10 profiling steps should be enough for every case. And let's assume that profiling is allowed to use up to 3% of function time. This means that we can sensibly profile functions as fast as 5.3 ms per call! Which means that for the vast majority of uses, this kind of profiling does not put any measurable load on the server.
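The arithmetic behind that 5.3 ms, for the record:

-- 10 steps × 0.016 ms each = 0.16 ms of overhead per call;
-- if that may be at most 3% of the runtime, the function must take at least:
SELECT (10 * 0.016) / 0.03 AS min_sensible_function_ms;  -- ≈ 5.33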
Now. We did some profiled calls to function. What can we do with it?
First of all – assuming you're logging to csvlog (which I believe is the best option) – we'll just load it into PostgreSQL.
I created a simple program that connected to the database and ran all of these 72900 inserts in parallel (10 workers), in random order, choosing randomly which function to use.
This program generated ~750 MB of logs. But hey – it ran over 700,000 queries, each logging at the very least 3 messages (the exact number of messages depends on which code path was chosen).
So. Let's load the logs into Pg. According to the docs we just need a simple table, and then we can "COPY" the data into it. Let's try.
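For completeness, a sketch of that step. The column list is lifted from the csvlog section of the documentation for the version I'm running (it changes between versions, so check the docs for yours), and the file path is obviously made up:

CREATE TABLE postgres_log (
    log_time timestamptz(3),
    user_name text,
    database_name text,
    process_id int4,
    connection_from text,
    session_id text,
    session_line_num int8,
    command_tag text,
    session_start_time timestamptz,
    virtual_transaction_id text,
    transaction_id int8,
    error_severity text,
    sql_state_code text,
    message text,
    detail text,
    hint text,
    internal_query text,
    internal_query_pos int4,
    context text,
    query text,
    query_pos int4,
    location text,
    application_name text,
    PRIMARY KEY (session_id, session_line_num)
);

COPY postgres_log FROM '/path/to/pg.csv' WITH csv;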
one CREATE TABLE, one COPY, and 10 minutes later …
# SELECT COUNT(*) FROM postgres_log ;
  count
---------
 3354488
(1 row)
Nice.
Now. Let's get some sensible data out of it.
The final query will be a bit complicated, so let's build it step by step. Also – since I have 3.3 million lines in the log, when writing the intermediate queries I usually add a "limit 100" to get the results faster, but I'm showing the queries without the limit so they are easier to follow.
First we need to get only the lines that relate to our profiling:
SELECT l.*
FROM postgres_log l
WHERE l.error_severity = 'LOG'
    AND l.message LIKE 'PROFILING: %'
Great. Now let's extract just the relevant information from it, and split it into multiple columns so that it will be easier to group later:
WITH simplified_splitted AS (
    SELECT
        l.process_id,
        regexp_split_to_array( l.message, E'\\s+' ) AS message
    FROM postgres_log l
    WHERE l.error_severity = 'LOG'
        AND l.message LIKE 'PROFILING: %'
)
SELECT
    process_id,
    message[2] AS function_name,
    message[4] AS step,
    CAST( message[6] || ' ' || message[7] AS timestamptz ) AS logged_when
FROM simplified_splitted
A word of explanation – I used WITH to do the regexp_split_to_array only once, and then, in the main query, work on the already-split version of the message.
Results of this query look like this:
 process_id | function_name  | step |          logged_when
------------+----------------+------+-------------------------------
      13219 | insert_word()  | 1    | 2010-03-18 12:50:20.985081+01
      13219 | insert_word()  | 2    | 2010-03-18 12:50:20.985369+01
      13219 | insert_word()  | 3    | 2010-03-18 12:50:20.985939+01
      13219 | insert_word()  | 5    | 2010-03-18 12:50:20.986539+01
      13219 | insert_word()  | 6    | 2010-03-18 12:50:20.98677+01
      13219 | insert_word2() | 1    | 2010-03-18 12:50:20.987871+01
      13219 | insert_word2() | 2    | 2010-03-18 12:50:20.988092+01
      13226 | insert_word()  | 1    | 2010-03-18 12:50:20.988143+01
      13219 | insert_word2() | 4    | 2010-03-18 12:50:20.988335+01
      13226 | insert_word()  | 2    | 2010-03-18 12:50:20.988383+01
Brilliant. Now, we can add some window functions to calculate times.
Generally, we need to group the records (well, not grouping in the SQL "GROUP BY" sense – just "treat them as related"), order each group by logged_when, and then calculate the difference between the previous row and the current one. Seems simple?
Let's do it 🙂 First, try to run it on first 20 rows:
WITH simplified_splitted AS (
    SELECT
        l.process_id,
        regexp_split_to_array( l.message, E'\\s+' ) AS message
    FROM postgres_log l
    WHERE l.error_severity = 'LOG'
        AND l.message LIKE 'PROFILING: %'
),
full_splitted AS (
    SELECT
        process_id,
        message[2] AS function_name,
        message[4] AS step,
        CAST( message[6] || ' ' || message[7] AS timestamptz ) AS logged_when
    FROM simplified_splitted
)
SELECT
    f.*,
    lag( step ) OVER ( profiling ) AS previous_step,
    lag( logged_when ) OVER ( profiling ) AS previous_logged_when
FROM full_splitted f
WINDOW profiling AS (
    PARTITION BY f.process_id, f.function_name
    ORDER BY logged_when ASC
)
ORDER BY f.process_id, f.function_name, f.logged_when
Results look like this:
 process_id | function_name  | step |          logged_when          | previous_step |     previous_logged_when
------------+----------------+------+-------------------------------+---------------+-------------------------------
      13219 | insert_word()  | 1    | 2010-03-18 12:50:20.985081+01 | [null]        | [null]
      13219 | insert_word()  | 2    | 2010-03-18 12:50:20.985369+01 | 1             | 2010-03-18 12:50:20.985081+01
      13219 | insert_word()  | 3    | 2010-03-18 12:50:20.985939+01 | 2             | 2010-03-18 12:50:20.985369+01
      13219 | insert_word()  | 5    | 2010-03-18 12:50:20.986539+01 | 3             | 2010-03-18 12:50:20.985939+01
      13219 | insert_word()  | 6    | 2010-03-18 12:50:20.98677+01  | 5             | 2010-03-18 12:50:20.986539+01
      13219 | insert_word2() | 1    | 2010-03-18 12:50:20.987871+01 | [null]        | [null]
      13219 | insert_word2() | 2    | 2010-03-18 12:50:20.988092+01 | 1             | 2010-03-18 12:50:20.987871+01
      13219 | insert_word2() | 4    | 2010-03-18 12:50:20.988335+01 | 2             | 2010-03-18 12:50:20.988092+01
      13219 | insert_word2() | 5    | 2010-03-18 12:50:20.988464+01 | 4             | 2010-03-18 12:50:20.988335+01
      13219 | insert_word2() | 1    | 2010-03-18 12:50:20.988905+01 | 5             | 2010-03-18 12:50:20.988464+01
      13219 | insert_word2() | 2    | 2010-03-18 12:50:20.988981+01 | 1             | 2010-03-18 12:50:20.988905+01
      13219 | insert_word2() | 4    | 2010-03-18 12:50:20.989045+01 | 2             | 2010-03-18 12:50:20.988981+01
      13219 | insert_word2() | 5    | 2010-03-18 12:50:20.989108+01 | 4             | 2010-03-18 12:50:20.989045+01
      13219 | insert_word2() | 1    | 2010-03-18 12:50:20.989436+01 | 5             | 2010-03-18 12:50:20.989108+01
      13219 | insert_word2() | 2    | 2010-03-18 12:50:20.98951+01  | 1             | 2010-03-18 12:50:20.989436+01
      13226 | insert_word()  | 1    | 2010-03-18 12:50:20.988143+01 | [null]        | [null]
      13226 | insert_word()  | 2    | 2010-03-18 12:50:20.988383+01 | 1             | 2010-03-18 12:50:20.988143+01
      13226 | insert_word()  | 3    | 2010-03-18 12:50:20.988747+01 | 2             | 2010-03-18 12:50:20.988383+01
      13226 | insert_word()  | 5    | 2010-03-18 12:50:20.989298+01 | 3             | 2010-03-18 12:50:20.988747+01
      13226 | insert_word()  | 6    | 2010-03-18 12:50:20.989448+01 | 5             | 2010-03-18 12:50:20.989298+01
Nice. A couple of notes – if the current "step" is 1, there is no point in calculating the time difference, because the previous step in this backend, from the same function, comes from a different call to that function.
So, we will need to filter out rows with "step = 1" (but this has to be done in the final query, not in the ones we now have in "WITH").
And besides that – let's also calculate the time differences, and which step transition each row represents:
WITH simplified_splitted AS (
    SELECT
        l.process_id,
        regexp_split_to_array( l.message, E'\\s+' ) AS message
    FROM postgres_log l
    WHERE l.error_severity = 'LOG'
        AND l.message LIKE 'PROFILING: %'
),
full_splitted AS (
    SELECT
        process_id,
        message[2] AS function_name,
        message[4] AS step,
        CAST( message[6] || ' ' || message[7] AS timestamptz ) AS logged_when
    FROM simplified_splitted
    LIMIT 20
),
lagged_info AS (
    SELECT
        f.*,
        lag( step ) OVER ( profiling ) AS previous_step,
        lag( logged_when ) OVER ( profiling ) AS previous_logged_when
    FROM full_splitted f
    WINDOW profiling AS (
        PARTITION BY f.process_id, f.function_name
        ORDER BY logged_when ASC
    )
)
SELECT
    i.function_name,
    i.previous_step || ' -> ' || i.step,
    i.logged_when - i.previous_logged_when
FROM lagged_info i
WHERE i.step <> '1'
Results from some random 20 profiling rows look like this:
 function_name  | ?column? |    ?column?
----------------+----------+-----------------
 insert_word()  | [null]   | [null]
 insert_word()  | 2 -> 3   | 00:00:00.000071
 insert_word()  | 3 -> 4   | 00:00:00.000087
 insert_word()  | 1 -> 2   | 00:00:00.000042
 insert_word()  | 2 -> 3   | 00:00:00.000061
 insert_word()  | 3 -> 4   | 00:00:00.000067
 insert_word2() | 1 -> 2   | 00:00:00.000073
 insert_word2() | 2 -> 3   | 00:00:00.000108
 insert_word2() | 1 -> 2   | 00:00:00.000075
 insert_word2() | 2 -> 3   | 00:00:00.000204
 insert_word()  | 1 -> 2   | 00:00:00.000045
 insert_word()  | 2 -> 3   | 00:00:00.000056
 insert_word()  | 3 -> 4   | 00:00:00.000086
 insert_word2() | 1 -> 2   | 00:00:00.000079
 insert_word2() | 2 -> 3   | 00:00:00.000104
(15 rows)
So now, we can add full grouping, and present full statistics:
WITH simplified_splitted AS (
    SELECT
        l.process_id,
        regexp_split_to_array( l.message, E'\\s+' ) AS message
    FROM postgres_log l
    WHERE l.error_severity = 'LOG'
        AND l.message LIKE 'PROFILING: %'
),
full_splitted AS (
    SELECT
        process_id,
        message[2] AS function_name,
        message[4] AS step,
        CAST( message[6] || ' ' || message[7] AS timestamptz ) AS logged_when
    FROM simplified_splitted
),
lagged_info AS (
    SELECT
        f.*,
        lag( step ) OVER ( profiling ) AS previous_step,
        lag( logged_when ) OVER ( profiling ) AS previous_logged_when
    FROM full_splitted f
    WINDOW profiling AS (
        PARTITION BY f.process_id, f.function_name
        ORDER BY logged_when ASC
    )
)
SELECT
    i.function_name,
    i.previous_step || ' -> ' || i.step AS which_step,
    SUM( i.logged_when - i.previous_logged_when ) AS total_time,
    COUNT(*) AS total_count
FROM lagged_info i
WHERE i.step <> '1'
GROUP BY function_name, which_step
ORDER BY total_time DESC
Results (this time from all 3.3 million rows) look like this:
 function_name  | which_step |   total_time    | total_count
----------------+------------+-----------------+-------------
 insert_word2() | 2 -> 3     | 00:00:34.921153 |      327246
 insert_word()  | 3 -> 4     | 00:00:34.18836  |      328851
 insert_word2() | 1 -> 2     | 00:00:30.443333 |      363517
 insert_word()  | 2 -> 3     | 00:00:24.551921 |      365483
 insert_word()  | 1 -> 2     | 00:00:18.136481 |      365483
 insert_word2() | 4 -> 5     | 00:00:02.722998 |       36270
 insert_word2() | 2 -> 4     | 00:00:02.699597 |       36271
 insert_word()  | 5 -> 6     | 00:00:02.697111 |       36630
 insert_word()  | 3 -> 5     | 00:00:02.601562 |       36632
 insert_word()  | 5 -> 7     | 00:00:00.00527  |           2
 insert_word2() | 4 -> 6     | 00:00:00.000399 |           1
 insert_word()  | 7 -> 4     | 00:00:00.000154 |           2
 insert_word2() | 6 -> 3     | 00:00:00.000103 |           1
(13 rows)
Looks pretty sweet to me. Of course the results here are obvious, and there is not really anything I can do to make these particular functions faster, but it shows that the method really works.
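A natural extension – divide total_time by total_count to get a per-step average (PostgreSQL happily divides an interval by an integer). Just swap the final SELECT of the query above for something like:

SELECT
    i.function_name,
    i.previous_step || ' -> ' || i.step AS which_step,
    SUM( i.logged_when - i.previous_logged_when ) AS total_time,
    COUNT(*) AS total_count,
    SUM( i.logged_when - i.previous_logged_when ) / COUNT(*) AS avg_time
FROM lagged_info i
WHERE i.step <> '1'
GROUP BY function_name, which_step
ORDER BY avg_time DESC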
Thank you for this (and all your articles)! It allowed me to check and optimize a slow function (and revealed some really stupid thinking on my part) from 12 seconds to less than one!
The edb-debugger module on pgfoundry also contains an excellent plpgsql profiling tool, which allows you to gather similar data (plus some). It’s less invasive (no code changes required) but you need to explicitly turn it on/off in each session to be traced.
I’m using it a bit to discover potential improvements in my plpgsql. By day I’m an Oracle developer, but I’ve found the postgres offering to be easier to use and more useful than Oracle’s dbms_profiler.
ever tried track_functions in postgresql 8.4?
@ASH:
sure. not sure though how that's relevant. track_functions tracks how many times a function was called. This blog post is about checking how much time parts of a function take.
Depesz WTF, this blog layout is awful. Bring back the old one.
@TOM:
no way. 2 people complained about new layout (you and metys), and all other opinions i got were: “YAY, it’s so much better than previous”.
@depesz make that 3. At least people could choose the layout before.
@Matthew:
heh, perhaps I should add a poll – “Do you like new layout” 🙂
But since you made it to 3 – the theme chooser is back 🙂
Yay! “AS 400” selected here.
That windowing stuff is a mind bender. How about this instead?
mark := clock_timestamp();
-- stuff
RAISE LOG 'Step 1 : %', clock_timestamp() - mark;
@Brian:
Sure, it's possible, but then you'd have to add another mark := clock_timestamp(); *after* each RAISE LOG.
Plus – you have to do more work at runtime (more calls to clock_timestamp(), plus the subtraction), which I wanted to avoid to keep the overhead as small as possible.
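To make the trade-off concrete, a hypothetical sketch of Brian's variant applied to the testit() example – note the re-marking after each RAISE LOG:

CREATE OR REPLACE FUNCTION testit() RETURNS void AS $$
DECLARE
    x    INT4;
    y    INT4;
    mark timestamptz := clock_timestamp();
BEGIN
    x := 3;
    RAISE LOG 'PROFILING: testit() Step 1 : %', clock_timestamp() - mark;
    mark := clock_timestamp();  -- the extra re-marking mentioned above
    y := x * 2;
    RAISE LOG 'PROFILING: testit() Step 2 : %', clock_timestamp() - mark;
    RETURN;
END;
$$ LANGUAGE plpgsql;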