Over the last few months I wrote some tools to help me with my daily duties.
I'd like to tell you about them, as you might find them useful.
So, here we go:
pg.logs.tail
Available from OmniTI SVN.
It's a smarter "tail -f" for PostgreSQL logs. Smarter in the sense that it knows PostgreSQL rotates its log file, so when it sees that a rotation happened, it switches to the next file.
To run the program you just do:
pg.logs.tail -d /var/log/postgresql
Or wherever you keep the logs.
If you are in the logs directory, just run pg.logs.tail without any options, and it should work.
Aside from the -d option, there are only 2 more:
- -v – prints (to STDERR, so as not to mess with data when grepping) a notice whenever it switches to a new file
- -f – regexp that is used to find log files – it defaults to the sane postgresql-.*\.log, so generally you don't need to change it
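For example (directory path hypothetical, flags as described above), you can watch a log directory, get a note on every file switch, and grep at the same time:
=$ pg.logs.tail -d /var/log/postgresql -v -f 'postgresql-.*\.log' | grep ERROR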
pg.grep
Available from OmniTI SVN.
This is grep. But (also) smarter.
For starters – it understands that a given "message" in the log can span multiple lines.
Also – it understands that some log messages cause other messages to be shown – for example, when you have an exception/error, there is usually a "STATEMENT:" log line, which is connected to the error before it.
To do its "parsing", pg.grep needs to know the log_line_prefix that you're using in Pg – otherwise it wouldn't be able to tell whether a given line in the log is a continuation of the message from the previous line, or a new message altogether.
So, let's see how it works. My log_line_prefix is "%m %u@%d %p %r ", and I just ran this query:
$ SELECT 1
>> +
>> 2
>> -
>> 123;
 ?column?
──────────
     -120
(1 row)
In log, I have:
2012-01-22 17:27:54.929 CET [unknown]@[unknown] 17743  LOG:  connection received: host=[local]
2012-01-22 17:27:54.930 CET depesz@depesz 17743 [local] LOG:  connection authorized: user=depesz database=depesz
2012-01-22 17:27:57.447 CET depesz@depesz 17743 [local] LOG:  duration: 0.603 ms  statement: SELECT 1
	+
	2
	-
	123;
2012-01-22 17:28:04.924 CET depesz@depesz 17743 [local] LOG:  disconnection: session time: 0:00:09.995 user=depesz database=depesz host=[local]
2012-01-22 17:28:05.225 CET [unknown]@[unknown] 17778  LOG:  connection received: host=[local]
2012-01-22 17:28:05.226 CET depesz@depesz 17778 [local] LOG:  connection authorized: user=depesz database=depesz
If I just grepped it for "123", I would get:
=$ tail postgresql-2012-01-22_164637.log | grep 123
	123;
not so helpful. But with pg.grep:
=$ tail postgresql-2012-01-22_164637.log | ~/pg.grep -p "%m %u@%d %p %r " -m 123
2012-01-22 17:27:57.447 CET depesz@depesz 17743 [local] LOG:  duration: 0.603 ms  statement: SELECT 1
	+
	2
	-
	123;
Now, let's consider another example:
$ SELECT 1 +
>> version();
ERROR:  operator does not exist: integer + text
LINE 1: SELECT 1 +
                 ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
A normal grep for ERROR would show:
=$ cat postgresql-2012-01-22_173247.log | grep ERROR
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] ERROR:  operator does not exist: integer + text at character 10
but pg.grep for errors shows:
=$ cat postgresql-2012-01-22_173247.log | ~/pg.grep -p "%m %u@%d %p %r " -m ERROR
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] ERROR:  operator does not exist: integer + text at character 10
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] STATEMENT:  SELECT 1 +
	version();
In case you really need just the ERROR line, you can get it by using the -l (--line) option:
=$ cat postgresql-2012-01-22_173247.log | ~/pg.grep -p "%m %u@%d %p %r " -m ERROR -l
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] ERROR:  operator does not exist: integer + text at character 10
Aside from -p (prefix), -l (line) and -m (match), pg.grep also supports:
- -s – skip – reverse match. -s "operator does not exist" would skip messages containing this error
- -M – case sensitive -m
- -S – case sensitive -s
Of course you can use multiple -m/-s/-M/-S options, for example like this:
=$ cat postgresql-2012-01-22_1* | ~/pg.grep -p "%m %u@%d %p %r " -m ERROR:.*syntax -l -s integer
2012-01-22 17:32:37.787 CET depesz@depesz 17778 [local] ERROR:  syntax error at or near ";" at character 11
This shows all ERRORs related to syntax, except the ones that are related to integers (like: invalid input syntax for integer: "asbc" at character 12).
fast dump and restore
Available from OmniTI SVN.
These are tools that can be used to make a fast, but somewhat unsafe, dump of a database, and to restore data from such a dump.
I presented the general idea some time ago, and used it, with success, for one of our clients.
The scripts that were written back then got generalized, rewritten, and are now available for you.
The "unsafe" part is very simple – to make the dump fast, the script does the work in parallel. By working in parallel it has multiple transactions open, and as such, it cannot get any guarantees (from PostgreSQL) about consistency of the data between the transactions.
So – you can safely use it as long as there are no connections to the database other than fast.dump's – that is, no writing connections. As long as other connections are only reading, you're fine. But any write to the database can, and most likely will, bite you when you work with the restore.
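To make the risk concrete, here is a minimal sketch of the general idea – this is not fast.dump itself, and the schema name, parallelism level, and file naming are just assumptions. Each psql call below runs its COPY in a separate transaction, so rows committed between two of them can show up in one table's dump but not in a related table's:
=$ psql -qAtc "SELECT tablename FROM pg_tables WHERE schemaname = 'public'" \
    | xargs -P 4 -I {} sh -c 'psql -c "COPY public.{} TO STDOUT" > {}.dump'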
To use the tools, just run them with --help. And always try them first on some test/devel machine.
system_monitoring
Available from OmniTI SVN.
This tool is not strictly for PostgreSQL, but I use it on virtually every Pg server I have to look after, and I find it very useful.
You most likely all know tools like Cacti, Ganglia, Munin and so on.
system_monitoring is similar in that it gets data from the system. But it's totally different in that it doesn't make any graphs. It just stores the data.
But adding new data to be stored is trivial.
Most generally, system_monitoring.pl is simply a job scheduler – like cron. But, unlike cron, it is not limited to 1 minute as the shortest interval. It can easily run jobs every second. Or run jobs that never end.
It will run every command you give it, and log the output. That's all. But since it's small, and the logged format is plain text, it has become extremely useful for me.
Let's take a look at the sample config.
At the beginning I set some variables – most importantly, where the logs will go.
Then I set environment variables that will be used for all "checks" (they can be overridden per check). And then I set some additional variables which make running the checks a bit nicer.
Then, I have the actual checks.
There are three kinds of them: persistent command, periodic command, and periodic file.
A persistent check example looks like this:
check.iostat.type=persistent
check.iostat.exec=iostat -kx 5
On start, system_monitoring.pl will run iostat -kx 5, and will then log everything it outputs, as it outputs it, to the iostat logfile.
A periodic check can look like this:
check.ps.type=periodic
check.ps.exec=ps axo user,pid,ppid,pgrp,%cpu,%mem,rss,lstart,nice,nlwp,sgi_p,cputime,tty,wchan:25,args
check.ps.interval=30
This will run the ps command every 30 seconds, and log its output.
If the first character of a check's exec is "<", the rest of the exec line is assumed to be a file that should be opened (for reading), and its content copied to the log for the given check. Hence:
check.mem.type=periodic
check.mem.exec=< /proc/meminfo
check.mem.interval=10
This will log meminfo to the logs every 10 seconds.
Logs are stored in a tree structure which is based on when the event happened. The exact filename for a check is:
LOGDIR/YYYY/MM/DD/CHECK_NAME-HOUR.log
So, with the default logdir, the current meminfo log is:
/var/log/system_monitoring/2012/01/22/mem-2012-01-22-16.log
It looks like this:
2012-01-22 16:00:02 UTC BSGMA MemTotal: 528655576 kB
2012-01-22 16:00:02 UTC BSGMA MemFree: 14354712 kB
2012-01-22 16:00:02 UTC BSGMA Buffers: 1099748 kB
2012-01-22 16:00:02 UTC BSGMA Cached: 465002160 kB
2012-01-22 16:00:02 UTC BSGMA SwapCached: 5304 kB
2012-01-22 16:00:02 UTC BSGMA Active: 426886732 kB
2012-01-22 16:00:02 UTC BSGMA Inactive: 55076184 kB
2012-01-22 16:00:02 UTC BSGMA HighTotal: 0 kB
2012-01-22 16:00:02 UTC BSGMA HighFree: 0 kB
2012-01-22 16:00:02 UTC BSGMA LowTotal: 528655576 kB
2012-01-22 16:00:02 UTC BSGMA LowFree: 14354712 kB
2012-01-22 16:00:02 UTC BSGMA SwapTotal: 2048276 kB
2012-01-22 16:00:02 UTC BSGMA SwapFree: 1902552 kB
2012-01-22 16:00:02 UTC BSGMA Dirty: 5776 kB
2012-01-22 16:00:02 UTC BSGMA Writeback: 236 kB
2012-01-22 16:00:02 UTC BSGMA AnonPages: 15869780 kB
2012-01-22 16:00:02 UTC BSGMA Mapped: 107046492 kB
2012-01-22 16:00:02 UTC BSGMA Slab: 10872504 kB
2012-01-22 16:00:02 UTC BSGMA PageTables: 20694796 kB
2012-01-22 16:00:02 UTC BSGMA NFS_Unstable: 0 kB
2012-01-22 16:00:02 UTC BSGMA Bounce: 0 kB
2012-01-22 16:00:02 UTC BSGMA CommitLimit: 266376064 kB
2012-01-22 16:00:02 UTC BSGMA Committed_AS: 158592164 kB
2012-01-22 16:00:02 UTC BSGMA VmallocTotal: 34359738367 kB
2012-01-22 16:00:02 UTC BSGMA VmallocUsed: 271152 kB
2012-01-22 16:00:02 UTC BSGMA VmallocChunk: 34359466935 kB
2012-01-22 16:00:02 UTC BSGMA HugePages_Total: 0
2012-01-22 16:00:02 UTC BSGMA HugePages_Free: 0
2012-01-22 16:00:02 UTC BSGMA HugePages_Rsvd: 0
2012-01-22 16:00:02 UTC BSGMA Hugepagesize: 2048 kB
2012-01-22 16:00:12 UTC BSGPA MemTotal: 528655576 kB
And so on.
As you can see, nothing really complicated.
There are 2 more features. One of them is checks that are configured so that their output is ignored and not logged – this is for cleanup jobs – check the last three in the example config.
The other feature is headers.
For example, let's look at the pg_stat_database check. The way it's written, it logs data like:
$ tail pg_stat_database-2012-01-22-15.log
2012-01-22 15:59:06 UTC BSF9w template1 10 6 t t -1 11510 15545653 1663 \N {=c/postgres,postgres=CTc/postgres} 4689516
2012-01-22 15:59:06 UTC BSF9w postgres 10 6 f t -1 11510 15253018 1663 \N \N 42800983660
2012-01-22 15:59:06 UTC BSF9w pg_audit 10 6 f t -1 11510 15252690 1663 \N \N 4779628
2012-01-22 15:59:06 UTC BSF9w template0 10 6 t f -1 11510 3886088170 1663 \N {=c/postgres,postgres=CTc/postgres} 4583020
2012-01-22 15:59:06 UTC BSF9w magicdb 10 6 f t -1 11510 3922373574 1663 {"search_path=public, check_postgres, ltree, pgcrypto"} \N 716819345004
2012-01-22 15:59:37 UTC BSGFA template1 10 6 t t -1 11510 15545653 1663 \N {=c/postgres,postgres=CTc/postgres} 4689516
2012-01-22 15:59:37 UTC BSGFA postgres 10 6 f t -1 11510 15253018 1663 \N \N 42800983660
2012-01-22 15:59:37 UTC BSGFA pg_audit 10 6 f t -1 11510 15252690 1663 \N \N 4779628
2012-01-22 15:59:37 UTC BSGFA template0 10 6 t f -1 11510 3886088170 1663 \N {=c/postgres,postgres=CTc/postgres} 4583020
2012-01-22 15:59:37 UTC BSGFA magicdb 10 6 f t -1 11510 3922373574 1663 {"search_path=public, check_postgres, ltree, pgcrypto"} \N 716820033132
which is nice, but it would be cool to know what the columns mean.
So we have the "check.pg_stat_database.header" line, which makes system_monitoring log a header first, whenever it rotates the logfile (i.e. every hour):
$ head -n 1 pg_stat_database-2012-01-22-15.log
2012-01-22 15:00:01 UTC :h datname datdba encoding datistemplate datallowconn datconnlimit datlastsysoid datfrozenxid dattablespace datconfig datacl size
And that's about it. The one additional feature is that you can run system_monitoring and ask it to show you data for a given check for a given time (or the closest possible time after):
=$ ./system_monitoring.pl -s loadavg -t '2012-01-22 18:04:00' config-sample.cfg
2012-01-22 18:04:00 CET Kg 0.01 0.10 0.12 1/575 20320
And that's all. But thanks to its simplicity, it's trivial to use the data it logs with some graphing tool, or to just read it.
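For example, here is a hypothetical one-liner (the column positions match the meminfo log shown above) that turns the mem check's log into a MemFree time series, ready for gnuplot or a spreadsheet:
=$ grep MemFree: /var/log/system_monitoring/2012/01/22/mem-2012-01-22-16.log | awk '{print $1, $2, $(NF-1)}'
2012-01-22 16:00:02 14354712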
And that would be all for now. If you use any of these, have fun – they are free to use, free to modify. Sending me back patches is a nice touch, but I don't require it.
Thanks for sharing your tools. I'm looking forward to using them. Cheers!
If you’re on a system with the GNU version of tail, try:
tail -F /var/log/whatever
No need to write a new version of tail.
@yDNA:
Sure, I know about tail -F. But that only helps when the file gets truncated and/or renamed.
Pg, however, changes the name of the logfile. I.e. it can switch from postgresql-2012-01-20.log to postgresql-2012-01-21.log. Or anything else – based on the "log_filename" GUC.
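For example, PostgreSQL's stock default log_filename (assuming an unmodified postgresql.conf) embeds a timestamp, so every rotation produces a brand-new file name that tail -F cannot follow:
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'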
I actually use runit to start PostgreSQL and have everything log to stdout. This eliminates the need for PostgreSQL log rotation. This is another way to do it and keeps PostgreSQL's environment clean.
#postgresql/run
#!/bin/sh --
exec 2>&1
exec chpst -u postgres /usr/local/bin/postgres -D /pgdata/data
#postgresql/log/run
#!/bin/sh --
exec 2>&1
exec chpst -ulog svlogd ./main
tail -F postgresql/log/main/current
@Sean:
I wouldn't really use unrotated logs – having hundreds of gigabytes in a single file gets tedious pretty fast.
@depesz:
svlogd continuously reads log data from its standard input, optionally filters log messages, and writes the data to one or more automatically rotated logs.
http://smarden.org/runit/svlogd.8.html
The current log file is always named current, so tail -F works perfectly as @Sean described.
Have the tools been moved or is there a problem with the web server?
@Matthew:
temporary server problems. Should be solved soon(ish).
I found a minor problem with pg.grep. It was showing me everything in my log files. After further investigation, I realized it was due to pg.grep not knowing what to do with %a in our log_line_prefix.
I added the following to the end of the %re hash in the prefix_to_re subroutine (YMMV):
'a' => '(?:[a-z0-9_]*|\[unknown\])',
Don't try to copy and paste… The comment form converted all of my single ticks into MS Word-style single quotes.
@Matthew:
Committed new version, which has support for (almost) all % sequences that might be in log_line_prefix. Made the %a differently, but it should work fine, please test.
Very nice. Thank you very much!
Does pg.grep work on CSV-format database logs? I tried, but it seems it cannot:
cat postgresql.log | pg.grep -p "%m,%u,%d,%p,%r,%c,%l,%i,%s,%v,%x"
@Pius:
no, it doesn't. You can grep these easily with normal grep. pg.grep's "cool" thing is that it allows you to grep log lines even if they contain multiple "real" lines. CSV logs don't have that problem, afair.
Hi Depesz,
Too bad :( – multi-line entries still exist in CSV logs:
2014-06-09 16:24:24.878 PDT,"postgres","postgres",9119,"[local]",53964228.239f,2,"authentication",2014-06-09 16:24:24 PDT,2/71947,0,LOG,00000,"connection authorized: user=postgres database=postgres",,,,,,,,,""
2014-06-09 16:24:25.004 PDT,"postgres","postgres",9119,"[local]",53964228.239f,3,"idle",2014-06-09 16:24:24 PDT,2/71948,0,LOG,00000,"statement: SELECT d.datname as ""Name"",
       pg_catalog.pg_get_userbyid(d.datdba) as ""Owner"",
       pg_catalog.pg_encoding_to_char(d.encoding) as ""Encoding"",
       d.datcollate as ""Collate"",
       d.datctype as ""Ctype"",
       pg_catalog.array_to_string(d.datacl, E'\n') AS ""Access privileges""
FROM pg_catalog.pg_database d
ORDER BY 1;",,,,,,,,,"psql.bin"
2014-06-09 16:24:25.284 PDT,"postgres","postgres",9119,"[local]",53964228.239f,4,"SELECT",2014-06-09 16:24:24 PDT,2/0,0,LOG,00000,"duration: 314.140 ms",,,,,,,,,"psql.bin"
2014-06-09 16:24:25.285 PDT,"postgres","postgres",9119,"[local]",53964228.239f,5,"idle",2014-06-09 16:24:24 PDT,,0,LOG,00000,"disconnection: session time: 0:00:00.462 user=postgres database=postgres host=[local]",,,,,,,,,"psql.bin"