If you have production DB servers, chances are you're running variant of these queries:
SELECT SUM(pg_relation_size(oid)) FROM pg_class WHERE relkind = 'i' SELECT SUM(pg_relation_size(oid)) FROM pg_class WHERE relkind = 'r'
To get summarized size of tables and/or indexes in your database (for example for graphing purposes).
This (getting pg_relation_size for rows in pg_class) has one problem – it can lock, or it can fail.
Let's see those problems. First – locking.
We will need two psql sessions, and in them:
- Session #1: create table z (i int4);
- Session #1: begin;
- Session #1: drop table z;
- Session #2: SELECT sum(pg_relation_size(oid)) from pg_class where relkind = ‘r';
and we can see that session #2 hangs until Session #1 will commit or rollback.
If it will rollback – we'll get the size. But if Session #1 will commit, in Session #2 we'll get:
ERROR: could NOT OPEN relation WITH OID 18695
Of course the OID can be different.
That far from good.
So, since I encountered this problem, I thought that there has to be better solution – perhaps not as elegant, but at the very least not breaking every now and then.
Today, I finally got fed up with alert mails about broken queries, and wrote this:
SELECT c.oid::regclass AS relation_name, c.relkind AS relation_kind, x.file_size AS relation_size FROM pg_class c JOIN ( SELECT regexp_replace( file_name, '[^0-9].*', '') AS filenode, SUM( ( pg_stat_file( path || '/' || file_name ) ).size ) AS file_size FROM ( SELECT path || '/' || file_name AS path, pg_ls_dir( path || '/' || file_name) AS file_name FROM ( SELECT location AS path, pg_ls_dir( location ) AS file_name FROM ( SELECT 'pg_tblspc/' || t || CASE WHEN current_setting( 'server_version_num' )::INT4 >= 90000 THEN '/' || ( SELECT q FROM pg_ls_dir('pg_tblspc/' || t) AS q WHERE (pg_stat_file( 'pg_tblspc/' || t||'/'||q)).isdir='true' AND q LIKE 'PG_' || regexp_replace( current_setting('server_version'), E'^([0-9]+\\.[0-9]+).*$', E'\\1') || '%' ) ELSE '' END AS location FROM pg_ls_dir('pg_tblspc') AS t WHERE (pg_stat_file('pg_tblspc/' || t)).isdir = 'true' UNION SELECT 'base' ) AS dirs ) AS db_dirs WHERE file_name = ( SELECT oid::TEXT FROM pg_database WHERE datname = current_database() ) ) AS file_list GROUP BY filenode ) AS x ON c.relfilenode::TEXT = x.filenode ;
It's not nice. It doesn't have the charm of simple: select pg_relation_size(). But it works.
Result of this query is list of all relations, their types ( the most interesting are “r" – relation, “i" – index, “t" – toast table ), and size.
It works on 8.3+ (possibly on 8.2 too, but I didn't check).
In normal circumstances it's most likely slower than normal pg_relation_size() calls, but this has the benefit of not locking/failing. Which is a huge deal for me.
The major drawback of this solution is that it needs to be ran using superuser account (due to usage of pg_ls_dir() and pg_stat_file() functions). But that shouldn't be a big problem – after all it's just for monitoring.
Put this query into a function and gave it SECURITY DEFINER and that allows non-superusers to get this data.
Thanks Depesz!
Hi Depesz,
This is quite clever, thanks for sharing!
I think the problem with the SELECT sum(…) query bailing out with “ERROR: could not open …” could be worked around by making a PL/pgSQL wrapper around the call to pg_total_relation_size(). Something like this:
CREATE OR REPLACE FUNCTION
no_err_pg_total_relation_size(v_relation regclass)
RETURNS bigint AS
$$
DECLARE
v_size bigint;
BEGIN
BEGIN
SELECT pg_total_relation_size(v_relation)
INTO v_size;
EXCEPTION
WHEN internal_error THEN
RAISE WARNING ‘Caught error: %’, SQLERRM;
return 0::bigint;
END;
RETURN v_size;
END;
$$ LANGUAGE plpgsql;
Of course, this wouldn’t help with the blocking against a concurrent DROP TABLE or similar – your solution is necessary for that.
Another option is not to use pg_relation_size at all, and use (relpages * 8192) or whatever block size you use. No, it’s not updated immediately (only when the table is analyzed), but in most cases this should not be an issue.