If you have ever encountered "idle in transaction" connections, you most likely hate them. I know I personally do. They interfere with most of the "cool toys": replication, vacuum, DDL queries.
So, when I saw them on a database I was looking at, I decided to act.
Easier said than done. How do you fix the problem?
Of course, I could kill all Postgres processes that are in the "idle in transaction" state, but this is not safe, and it's definitely not an optimal solution.
The best option would be to fix the parts of the code that generate idle-in-transaction connections. But how?
First, let's ask a simple question: what exactly is an "idle in transaction" connection?
It's pretty simple: it is a Postgres backend process, handling a connection, which started a transaction, perhaps did something within it, but is now not doing anything, with the transaction still open, waiting for a COMMIT or ROLLBACK.
Checking pg_stat_activity will not tell us much – the connection is idling. Of course, we will know which host/port the connection came from, which database it uses, and which user is logged in. But in most cases that's not enough. Idle-in-transaction connections come from application servers, and while I may know which application caused them, I don't know which part of the application did it.
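(For reference, on a modern PostgreSQL – 9.2 or later, where pg_stat_activity has a state column – something like the query below lists such connections; on older versions, including the 8.2 this post deals with, you would look for current_query = '<IDLE> in transaction' instead.)

-- sketch for PostgreSQL 9.2+; older versions used current_query/procpid
SELECT pid, usename, datname, client_addr, client_port,
       now() - xact_start AS in_transaction_for
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
 ORDER BY xact_start;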
Luckily, with a bit of Perl hacking, we can get some information.
First, let's set up PostgreSQL so that it logs all queries with exact timestamps and query durations, for example using these settings:
            name             |    setting
-----------------------------+-----------------
 log_line_prefix             | %m %u@%d %p %r
 log_min_duration_statement  | 0
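(In postgresql.conf these would look something like the snippet below; judging from the sample log line that follows, the real prefix likely ended with a trailing space.)

log_line_prefix = '%m %u@%d %p %r '
log_min_duration_statement = 0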
Logs from this look like this:
2008-08-28 16:58:31.978 CEST some_user@some_database 26001 10.3.5.1(45094) LOG: duration: 1.387 ms statement: SELECT ...
The interesting parts are the timestamp at the beginning of the line and the query duration.
First thing – the timestamp is from the end of query execution. Since it is '2008-08-28 16:58:31.978', and the query took a bit over 1 ms, we can assume the query started at '2008-08-28 16:58:31.977'.
This gives us both the start and the end of each query.
By analyzing the whole log, we can find the time differences between the end of one query and the start of the next one in the same backend.
So, I wrote a program that analyzes logs, and prints interesting information.
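The core of the timing logic can be sketched in Perl like this (a simplified illustration, not the actual script – the real one also tracks whether the backend is inside a transaction and remembers the queries themselves):

#!/usr/bin/perl
# A simplified sketch of the timing logic: for every "duration: ... ms"
# line, compute when the query started (end timestamp minus duration),
# and compare it with when the previous query of the same backend ended.
use strict;
use warnings;
use Time::Local qw( timelocal );

my $MINIMAL_IDLE_TO_REPORT = 100;    # milliseconds
my %last_end;    # pid => epoch seconds when its previous query ended

# convert "YYYY-MM-DD HH:MM:SS.mmm" (local time) to fractional epoch seconds
sub to_epoch {
    my ($ts) = @_;
    my ( $y, $mo, $d, $h, $mi, $s ) =
        $ts =~ m{\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):([\d.]+)\z};
    return timelocal( int $s, $mi, $h, $d, $mo - 1, $y ) + ( $s - int $s );
}

while ( my $line = <> ) {
    # example of a matching line:
    # 2008-08-28 16:58:31.978 CEST some_user@some_database 26001 10.3.5.1(45094) LOG:  duration: 1.387 ms  statement: SELECT ...
    next unless my ( $ts, $pid, $dur_ms ) = $line =~ m{
        \A (\d{4}-\d\d-\d\d [ ] \d\d:\d\d:[\d.]+) [ ] \S+   # timestamp, then timezone
           [ ] \S+ [ ] (\d+) [ ] \S+                        # user/db, pid, remote host
           [ ] LOG: \s+ duration: [ ] ([\d.]+) [ ] ms
    }x;
    my $end   = to_epoch($ts);
    my $start = $end - $dur_ms / 1000;    # the timestamp is logged at query *end*
    if ( exists $last_end{$pid} ) {
        my $idle_ms = ( $start - $last_end{$pid} ) * 1000;
        printf "%.0fms idle in pid %d before query ending at %s\n",
            $idle_ms, $pid, $ts
            if $idle_ms >= $MINIMAL_IDLE_TO_REPORT;
    }
    $last_end{$pid} = $end;
}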
This is how it works:
=> zcat postgresql-2008-08-28_165831.log.gz | ./find-iit-connections.pl 2>/dev/null
162ms @ 2008-08-28 16:58:55.795 CEST (pid: 8570), queries:
  - begin
  - some_query
  - other_query
  - another
  - the query *after* long idle in transaction
It reports all idle-in-transaction periods that are longer than 100 ms – but this threshold can easily be changed in this line:
my $MINIMAL_IDLE_TO_REPORT = 100;
The output shows how long the idle-in-transaction period lasted (162 ms), at what time it ended, the pid of the Postgres backend that was in this state, and the list of all queries (from the beginning of the transaction) that led up to it.
With this information, finding the responsible part of the code becomes much easier.
Would it be easier to load CSV logs (bottom of this page: http://www.postgresql.org/docs/current/static/runtime-config-logging.html) into an analysis database and then use SQL operations on them?
Cheers,
David.
@David Fetter:
definitely. The problem, though, is that this particular database I had in mind while writing this runs on 8.2, and CSV logs were added in 8.3 (as far as I recall).
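(For those on 8.3 or newer: once the CSV logs are loaded into the postgres_log table described on the linked docs page, the analysis becomes plain SQL. For example, a quick look at which backends produced the most log lines:)

-- assumes postgres_log was populated with COPY ... WITH csv,
-- using the table definition from the linked documentation page
SELECT process_id, count(*) AS log_lines,
       min(log_time) AS first_seen, max(log_time) AS last_seen
  FROM postgres_log
 GROUP BY process_id
 ORDER BY log_lines DESC;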
On some systems it can be quite scary to turn on statement logging for your applications. I sometimes use dtrace to get around this, but when hunting IIT issues, you have to be pretty lucky to catch all the right information. One thing you can do with your original query, however, is to also look in pg_locks, to see what relations the IIT has locks against. Along with user/host/app information, this extra piece of info is often enough to narrow down which part of the application it is coming from, without having to log all statements.
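(Something along these lines, for example – a sketch for PostgreSQL 9.2+, where pg_stat_activity has state and pid columns; older versions used current_query and procpid:)

-- which relations do idle-in-transaction backends hold locks on?
SELECT a.pid, a.usename, a.client_addr,
       l.relation::regclass AS locked_relation,
       l.mode
  FROM pg_stat_activity a
  JOIN pg_locks l ON l.pid = a.pid
 WHERE a.state = 'idle in transaction'
   AND l.relation IS NOT NULL;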
Hey Depesz,
another excellent blogpost. Thank you!
For those who run Debian: you need to install the libdatetime-format-http-perl package for depesz's script to work.
Thomas
Hi Depesz,
I refer to most of your articles and they help me learn fast. Thanks for that.
Coming to 'IDLE in transaction', I have tried log_line_prefix='%m-%p-%u-%d-%i', which lets me pull the complete query by grepping for 'IDLE in transaction'.
My log looks like this:
20078-postgres-postgres-BEGIN LOG: duration: 0.140 ms
20078-postgres-postgres-idle in transaction LOG: statement: insert into abc VALUES (3);
20078-postgres-postgres-INSERT LOG: duration: 0.364 ms
Correct me if I am missing anything here.
–Raghav
@Raghav:
your log_line_prefix does *not* seem to match the log you showed – there is no timestamp information, for example.
but please note that “IDLE in transaction” will *never* be logged in query logs, because it is not a query.
Sorry for the late response. I was under the impression that the reply would come to my Gmail account, but it didn't show up in my inbox.
postgres=# show log_line_prefix;
log_line_prefix
-----------------
%m-%p-%u-%d-%i
(1 row)
Here are the logs from one of my transaction blocks.
2012-06-02 19:19:42.004 PDT-894-postgres-postgres-idleLOG: statement: begin;
2012-06-02 19:20:07.156 PDT-894-postgres-postgres-idle in transactionLOG: statement: insert into test VALUES (10,'Am going IDLE');
Also, one question: is there any way to log all DML statements before execution? All the log entries seem to be written after execution, not before.
–Raghav
@Raghav:
logging enabled by “log_statement” is done before execution. You can verify it by doing “select now(), pg_sleep(60)” and checking logs.
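(For example – with log_statement = 'all', the "statement:" log line shows up immediately, while the "duration:" line appears only after the sleep finishes:)

SELECT now(), pg_sleep(60);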
Got it. Thank you very much Depesz..!!
How can I use this script on Windows? I have installed Strawberry Perl and tried:
perl find-iit-connections.pl
and then input the PostgreSQL log there.
How can I close stdin in the Perl console?
@Alex:
sorry, no idea about windows – don’t use them at all.
But generally – pipes should work just as they do on Unix, so piping the log file into the script should work.
@depesz
I have tried that, but all the output I get is:
######################################################################
Could you post an example log which works with your script?
David Fetter wrote that it would be easier to load CSV logs.
What advantages would I get from loading CSV logs into a database, compared with doing something like this:
Thanks for doing this.