On 17th of January 2022, Michael Paquier committed patch:
Introduce log_destination=jsonlog

"jsonlog" is a new value that can be added to log_destination to provide logs in the JSON format, with its output written to a file, making it the third type of destination of this kind, after "stderr" and "csvlog". The format is convenient to feed logs to other applications. There is also a plugin external to core that provided this feature using the hook in elog.c, but this had to overwrite the output of "stderr" to work, so being able to do both at the same time was not possible.

The files generated by this log format are suffixed with ".json", and use the same rotation policies as the other two formats depending on the backend configuration.

This takes advantage of the refactoring work done previously in ac7c807, bed6ed3, 8b76f89 and 2d77d83 for the backend parts, and 72b76f7 for the TAP tests, making the addition of any new file-based format rather straight-forward.

The documentation is updated to list all the keys and the values that can exist in this new format. pg_current_logfile() also required a refresh for the new option.

Author: Sehrope Sarkuni, Michael Paquier
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
This is VERY, VERY cool.
I don't like the use of JSON inside the database. In my experience from IRC and Slack, ~95% of cases are abuses of JSON for things that should be rewritten.
BUT – outside of the DB, for things like integration data, it is actually pretty good.
We already have csvlog, but apparently not all CSV parsing libraries understand the same dialect of CSV that PostgreSQL writes (multi-line values!).
So, I'm very enthusiastic about *NOT* having to find a specialized tool that handles CSVs the same way PostgreSQL produces them (which is, obviously, the correct way :).
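To see why this matters, here is a schematic, made-up csvlog-style record (abbreviated, NOT the real csvlog column layout): the statement is a quoted CSV value, so its embedded newlines keep one logical record spread over several physical lines, and line-oriented tools like grep or awk see only fragments of it:

=$ cat example.csv    # made-up, abbreviated csvlog-style record, not the real column set
"2022-01-17 12:00:00.000 CET","depesz","depesz","LOG","duration: 1.234 ms  statement: select
    relname
from
    pg_class
limit 1;","psql"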
Let's see how that works.
I changed just two parameters in the config file:
- logging_collector needs to be set to on
- log_destination needs to be set to jsonlog
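For reference, the relevant part of postgresql.conf would look like this (a minimal sketch; note that changing logging_collector requires a server restart, and log_destination accepts a comma-separated list, so 'stderr,jsonlog' would keep both outputs):

# postgresql.conf – minimal changes for JSON logging
logging_collector = on            # start the log collector process (needs restart)
log_destination = 'jsonlog'       # could also be e.g. 'stderr,jsonlog' to keep both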
Other log-related settings are:
$ SELECT name, setting FROM pg_settings WHERE name ~ '^log' ORDER BY 1;
               name                │            setting
───────────────────────────────────┼────────────────────────────────
 log_autovacuum_min_duration       │ 600000
 log_checkpoints                   │ on
 log_connections                   │ on
 log_destination                   │ jsonlog
 log_directory                     │ log
 log_disconnections                │ on
 log_duration                      │ off
 log_error_verbosity               │ default
 log_executor_stats                │ off
 log_file_mode                     │ 0600
 log_filename                      │ postgresql-%Y-%m-%d_%H%M%S.log
 logging_collector                 │ on
 log_hostname                      │ off
 logical_decoding_work_mem         │ 65536
 log_line_prefix                   │ %m %u@%d %p %r
 log_lock_waits                    │ on
 log_min_duration_sample           │ -1
 log_min_duration_statement        │ 0
 log_min_error_statement           │ error
 log_min_messages                  │ warning
 log_parameter_max_length          │ -1
 log_parameter_max_length_on_error │ 0
 log_parser_stats                  │ off
 log_planner_stats                 │ off
 log_recovery_conflict_waits       │ off
 log_replication_commands          │ off
 log_rotation_age                  │ 1440
 log_rotation_size                 │ 102400
 log_startup_progress_interval     │ 10000
 log_statement                     │ none
 log_statement_sample_rate         │ 1
 log_statement_stats               │ off
 log_temp_files                    │ 0
 log_timezone                      │ Europe/Warsaw
 log_transaction_sample_rate       │ 0
 log_truncate_on_rotation          │ off
(36 rows)
After the restart I noticed that the log/ directory contains a new file named *.json:
=$ ls -l log/
total 192
-rw------- 1 pgdba pgdba 178194 Jan 17 12:23 postgresql-2022-01-17_121022.log
-rw------- 1 pgdba pgdba  11240 Jan 17 12:26 postgresql-2022-01-17_122308.json
-rw------- 1 pgdba pgdba    175 Jan 17 12:23 postgresql-2022-01-17_122308.log
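Side note: since pg_current_logfile() was also taught about the new format in this commit, the server can be asked directly which JSON file it is currently writing to. Given the listing above, I'd expect something like:

$ SELECT pg_current_logfile('jsonlog');
          pg_current_logfile
───────────────────────────────────────
 log/postgresql-2022-01-17_122308.json
(1 row)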
Its content is not really easy to read directly:
=$ head -n 5 log/postgresql-2022-01-17_122308.json
{"timestamp":"2022-01-17 12:23:08.437 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":1,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"ending log output to stderr","hint":"Future log output will go to log destination \"jsonlog\".","backend_type":"postmaster","query_id":0}
{"timestamp":"2022-01-17 12:23:08.437 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":2,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"starting PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, 64-bit","backend_type":"postmaster","query_id":0}
{"timestamp":"2022-01-17 12:23:08.437 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":3,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"listening on IPv4 address \"0.0.0.0\", port 5430","backend_type":"postmaster","query_id":0}
{"timestamp":"2022-01-17 12:23:08.437 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":4,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"listening on IPv6 address \"::\", port 5430","backend_type":"postmaster","query_id":0}
{"timestamp":"2022-01-17 12:23:08.444 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":5,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"listening on Unix socket \"/tmp/.s.PGSQL.5430\"","backend_type":"postmaster","query_id":0}
But luckily we have tools like jq:
=$ tail -n2 log/postgresql-2022-01-17_122308.json | jq .
{
  "timestamp": "2022-01-17 12:28:08.549 CET",
  "pid": 1560408,
  "session_id": "61e5519c.17cf58",
  "line_num": 1,
  "session_start": "2022-01-17 12:23:08 CET",
  "txid": 0,
  "error_severity": "LOG",
  "message": "checkpoint starting: time",
  "backend_type": "checkpointer",
  "query_id": 0
}
{
  "timestamp": "2022-01-17 12:28:08.574 CET",
  "pid": 1560408,
  "session_id": "61e5519c.17cf58",
  "line_num": 2,
  "session_start": "2022-01-17 12:23:08 CET",
  "txid": 0,
  "error_severity": "LOG",
  "message": "checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.007 s, sync=0.002 s, total=0.025 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB",
  "backend_type": "checkpointer",
  "query_id": 0
}
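And since every entry is a single flat JSON object, pulling out just the interesting fields is a one-liner as well; for example, timestamp, severity and message as tab-separated values:

=$ jq -r '[ .timestamp, .error_severity, .message ] | @tsv' log/postgresql-2022-01-17_122308.json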
My immediate question is: how does it look in the case of multi-line queries?
=$ printf 'select\n relname,\n relkind\nfrom\n pg_class\nlimit 1;\n' | psql -aX
select
 relname,
 relkind
from
 pg_class
limit 1;
   relname    | relkind
--------------+---------
 pg_statistic | r
(1 row)
And its entry in the log looks like this:
=$ grep relkind log/postgresql-2022-01-17_122308.json | tail -n1
{"timestamp":"2022-01-17 12:30:27.324 CET","user":"pgdba","dbname":"pgdba","pid":1563548,"remote_host":"[local]","session_id":"61e55353.17db9c","line_num":3,"ps":"SELECT","session_start":"2022-01-17 12:30:27 CET","vxid":"4/0","txid":0,"error_severity":"LOG","message":"duration: 0.693 ms statement: select\n relname,\n relkind\nfrom\n pg_class\nlimit 1;","application_name":"psql","backend_type":"client backend","query_id":6991982717157019333}

=$ grep relkind log/postgresql-2022-01-17_122308.json | tail -n1 | jq .
{
  "timestamp": "2022-01-17 12:30:27.324 CET",
  "user": "pgdba",
  "dbname": "pgdba",
  "pid": 1563548,
  "remote_host": "[local]",
  "session_id": "61e55353.17db9c",
  "line_num": 3,
  "ps": "SELECT",
  "session_start": "2022-01-17 12:30:27 CET",
  "vxid": "4/0",
  "txid": 0,
  "error_severity": "LOG",
  "message": "duration: 0.693 ms statement: select\n relname,\n relkind\nfrom\n pg_class\nlimit 1;",
  "application_name": "psql",
  "backend_type": "client backend",
  "query_id": 6991982717157020000
}

=$ grep relkind log/postgresql-2022-01-17_122308.json | tail -n1 | jq -r .message
duration: 0.693 ms statement: select
 relname,
 relkind
from
 pg_class
limit 1;
Sweet. Looks like each log entry stays on a single line, no matter how many lines the query had, and the standard tool (jq) handles it perfectly.
We can also use jq to find stuff in the logs:
=$ jq 'select(.error_severity != "LOG" )' log/postgresql-2022-01-17_122308.json
{
  "timestamp": "2022-01-17 12:41:48.112 CET",
  "user": "depesz",
  "dbname": "depesz",
  "pid": 1561647,
  "remote_host": "[local]",
  "session_id": "61e55238.17d42f",
  "line_num": 10,
  "ps": "SELECT",
  "session_start": "2022-01-17 12:25:44 CET",
  "vxid": "3/29",
  "txid": 0,
  "error_severity": "ERROR",
  "state_code": "22012",
  "message": "division by zero",
  "statement": "select 1/0;",
  "application_name": "psql",
  "backend_type": "client backend",
  "query_id": -6975083880902025000
}
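The same approach works for quick ad-hoc statistics, for example counting entries per severity:

=$ jq -r '.error_severity' log/postgresql-2022-01-17_122308.json | sort | uniq -c | sort -rn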
This is awesome. Huge thanks to all involved.
As for integration with other tools, it looks great. But when it comes to troubleshooting, it seems to make reading the logs directly harder.
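For direct reading it doesn't have to be painful, though: piping the file (or a live tail) through jq gives a reasonably human-friendly view. A minimal sketch, assuming the filename from above:

=$ tail -f log/postgresql-2022-01-17_122308.json | jq -r '"\(.timestamp) \(.error_severity): \(.message)"'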
The CSV format is defined in an RFC (RFC 4180), so CSV parsing libraries without multi-line value support are simply broken by design.