UPDATE
Apparently it's not going to happen.
I'm leaving the post here anyway, as I hope the code will resurface as extension, on PGXN maybe?
On 14th of May, Stephen Frost committed patch:
Add pg_audit, an auditing extension This extension provides detailed logging classes, ability to control logging at a per-object level, and includes fully-qualified object names for logged statements (DML and DDL) in independent fields of the log output. Authors: Ian Barwick, Abhijit Menon-Sen, David Steele Reviews by: Robert Haas, Tatsuo Ishii, Sawada Masahiko, Fujii Masao, Simon Riggs Discussion with: Josh Berkus, Jaime Casanova, Peter Eisentraut, David Fetter, Yeb Havinga, Alvaro Herrera, Petr Jelinek, Tom Lane, MauMau, Bruce Momjian, Jim Nasby, Michael Paquier, Fabrízio de Royes Mello, Neil Tiffin
Now, this is something that I've been dreaming about for ages.
The usecase in docs mentions: “The goal is to provide the tools needed to produce audit logs required to pass any government, financial, or ISO certification audit.".
I don't care about government, financial or ISO certification audit. What I care is finding queries that touch particular place in system (table). Currently there was no such feature that would let me log queries that touch only specific tables. Of course I could have enabled logging of all queries, but it's not always practical or even possible on production systems.
So, let's see how it works.
First, I need to load the extension itself, so:
$ psql -c "alter system set shared_preload_libraries = 'pg_audit';" $ pg_ctl restart
Now, the extension is loaded.
Normally one wouldn't be able to set params in normal sql queries, as you need superuser to change pg_audit settings, but since my account is superuser in this test database, I can play with:
$ SET pg_audit.log = 'all'; SET $ SET pg_audit.log_relation = ON; SET $ SELECT * FROM pg_class LIMIT 2; relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | reltoastrelid | relhasindex | relisshared | relpersistence | relkind | relnatts | relchecks | relhasoids | relhaspkey | relhasrules ----------------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+------------ testh | 2200 | 16788 | 0 | 16384 | 0 | 16786 | 0 | 37 | 2000 | 0 | 16789 | f | f | p | r | 3 | 0 | f | f | f pg_toast_16786 | 99 | 16790 | 0 | 16384 | 0 | 16789 | 0 | 0 | 0 | 0 | 0 | t | f | p | t | 3 | 0 | f | t | f (2 ROWS) $ \l List OF DATABASES Name | Owner | Encoding | COLLATE | Ctype | Access privileges --------------------+--------------------+----------+-------------+-------------+------------------- depesz | depesz | UTF8 | en_US.UTF-8 | en_US.UTF-8 | pgdba | pgdba | UTF8 | en_US.UTF-8 | en_US.UTF-8 | postgres | pgdba | UTF8 | en_US.UTF-8 | en_US.UTF-8 | template0 | pgdba | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/pgdba + | | | | | pgdba=CTc/pgdba template1 | pgdba | UTF8 | en_US.UTF-8 | en_US.UTF-8 | pgdba=CTc/pgdba + | | | | | =c/pgdba (5 ROWS) $ vacuum pg_language ; VACUUM
After all of this, I can search for AUDIT logs:
=$ grep AUDIT: postgresql-2015-05-20_212346.log 2015-05-20 21:26:05.058 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,1,1,MISC,SET,,,set pg_audit.log = 'all';,<not logged> 2015-05-20 21:26:06.710 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,2,1,MISC,SET,,,set pg_audit.log_relation = on;,<not logged> 2015-05-20 21:26:11.872 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,3,1,READ,SELECT,TABLE,pg_catalog.pg_class,select * from pg_class limit 2;,<not logged> 2015-05-20 21:26:14.596 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,4,1,READ,SELECT,TABLE,pg_catalog.pg_database,"SELECT d.datname as ""Name"", 2015-05-20 21:26:20.790 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,5,1,READ,SELECT,TABLE,pg_catalog.pg_class,"SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c WHERE c.relkind IN ('r', 'm') AND substring(pg_catalog.quote_ident(c.relname),1,7)='pg_lang' AND pg_catalog.pg_table_is_visible(c.oid) 2015-05-20 21:26:20.790 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,5,1,READ,SELECT,TABLE,pg_catalog.pg_namespace,"SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c WHERE c.relkind IN ('r', 'm') AND substring(pg_catalog.quote_ident(c.relname),1,7)='pg_lang' AND pg_catalog.pg_table_is_visible(c.oid) 2015-05-20 21:26:20.790 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,5,1,READ,SELECT,TABLE,pg_catalog.pg_class,"SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c WHERE c.relkind IN ('r', 'm') AND substring(pg_catalog.quote_ident(c.relname),1,7)='pg_lang' AND pg_catalog.pg_table_is_visible(c.oid) 2015-05-20 21:26:20.790 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,5,1,READ,SELECT,TABLE,pg_catalog.pg_namespace,"SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c WHERE c.relkind IN ('r', 'm') AND substring(pg_catalog.quote_ident(c.relname),1,7)='pg_lang' AND pg_catalog.pg_table_is_visible(c.oid) 2015-05-20 21:26:20.790 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,5,1,READ,SELECT,TABLE,pg_catalog.pg_namespace,"SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c WHERE c.relkind IN ('r', 'm') AND substring(pg_catalog.quote_ident(c.relname),1,7)='pg_lang' AND pg_catalog.pg_table_is_visible(c.oid) 2015-05-20 21:26:20.790 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,5,1,READ,SELECT,TABLE,pg_catalog.pg_namespace,"SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c WHERE c.relkind IN ('r', 'm') AND substring(pg_catalog.quote_ident(c.relname),1,7)='pg_lang' AND pg_catalog.pg_table_is_visible(c.oid) 2015-05-20 21:26:21.298 CEST depesz@depesz 2913 [local] LOG: AUDIT: SESSION,6,1,MISC,VACUUM,,,vacuum pg_language ;,<not logged>
The values look complicated, but that's because it's all CSV (after AUDIT: prefix).
I thought I can load it to database, but the list of columns in docs doesn't seem to match the columns in the log. But that's not really important.
For simple line:
AUDIT: SESSION,3,1,READ,SELECT,TABLE,pg_catalog.pg_class,SELECT * FROM pg_class LIMIT 2;,<NOT logged>
We know that it was SESSION auditing (more on it later), it was 3rd logged command, and 1st “subcommand" for this command. The level of access was READ, and it was issued by SELECT query. Touched object was a table, and it was pg_catalog.pg_class table. Finally I got information about the query itself.
What about these subcommands?
Let's see:
CREATE FUNCTION test() RETURNS void AS $$ DECLARE r record; BEGIN SELECT COUNT(*) INTO r FROM pg_class WHERE relkind = 'r'; raise notice 'tables = %', r.count; SELECT COUNT(*) INTO r FROM pg_class WHERE relkind = 'i'; raise notice 'indexes = %', r.count; RETURN; END; $$ LANGUAGE plpgsql;
And then I'll run:
select test() in the audited session, I got in logs:
SESSION,9,1,READ,SELECT,,,SELECT test();,<NOT logged> SESSION,9,2,FUNCTION,EXECUTE,FUNCTION,public.test,SELECT test();,<NOT logged> SESSION,9,3,READ,SELECT,TABLE,pg_catalog.pg_class,SELECT COUNT(*) FROM pg_class WHERE relkind = 'r',<NOT logged> SESSION,9,4,READ,SELECT,TABLE,pg_catalog.pg_class,SELECT COUNT(*) FROM pg_class WHERE relkind = 'i',<NOT logged>
As you can see we now have subcommands.
What about the “SESSION" field?
This is where it really gets to me. Apparently you can make auditing enabled for specific objects, and not only whole sessions.
How to do it?
First we need to create a role:
$ CREATE ROLE auditor; CREATE ROLE $ GRANT SELECT, INSERT, UPDATE, DELETE ON test TO auditor; GRANT $ ALTER SYSTEM SET pg_audit.role = 'auditor'; ALTER SYSTEM $ SELECT pg_reload_conf(); pg_reload_conf ---------------- t (1 ROW)
Now, I have two tables, test and test2. Both have the same data, and the same structure.
If I'd do:
SELECT * FROM test; SELECT * FROM test2;
as “depesz" account, nothing will be logged.
But, when I'll do:
$ GRANT auditor TO depesz; GRANT ROLE
Now, the same two queries will log:
LOG: AUDIT: OBJECT,1,1,READ,SELECT,TABLE,public.test,SELECT * FROM test;,<NOT logged>
Query to test2 was not logged because auditor role has access granted only to test. It doesn't mean that I don't have access to test2 table – I have, after all, depesz account is superuser, but logged will be only queries to objects that auditors role has access to.
Or however I will name the role, given that it will be set in pg_audit.role setting.
What's cool, I can set this per column in a table:
$ GRANT SELECT (some_text) ON test2 TO auditor; GRANT
With this in place, I can run
SELECT id FROM test2;
and it will not get logged, but if I'll do:
SELECT some_text FROM test2;
or
SELECT * FROM test2;
it will immediately be logged:
AUDIT: OBJECT,2,1,READ,SELECT,TABLE,public.test2,SELECT some_text FROM test2;,<NOT logged> AUDIT: OBJECT,3,1,READ,SELECT,TABLE,public.test2,SELECT * FROM test2;,<NOT logged>
This is great. I sense a lot of really cool uses for this. There are still rough edges, mostly in docs, like – what is the last column? Should I use pg_audit.log_level, or pg_audit.log_notice?, but in any way – it's a great addition, and I look forward to having it on production DBs. Thanks guys.