I was faced with an interesting problem: which schema, in my DB, uses the most disk space? Theoretically it's trivial – we have a set of helpful functions (quick usage example after the list):
- pg_column_size
- pg_database_size
- pg_indexes_size
- pg_relation_size
- pg_table_size
- pg_tablespace_size
- pg_total_relation_size
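For a single relation these are trivial to use. A quick illustration (the table name here is just a placeholder, not from any real schema):

$ SELECT pg_table_size('some_table'),          -- heap + toast + fsm/vm, without indexes
         pg_indexes_size('some_table'),        -- all indexes on the table
         pg_total_relation_size('some_table'); -- everything combined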
But in some cases it becomes more of a problem. For example – when you have thousands of tables …
For my sample DB I picked a database with over a million objects in it:
$ SELECT COUNT(*) FROM pg_class;
  COUNT
---------
 1087322
(1 ROW)
There are over 700 schemas, each of which contains tables.
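If you want to check a similar count on your own database, something along these lines (a minimal sketch, skipping system schemas) will do:

$ SELECT COUNT(*)
    FROM pg_namespace
    WHERE nspname !~ '^pg_' AND nspname <> 'information_schema';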
A naive query would look like this:
$ SELECT n.nspname, SUM(pg_total_relation_size(c.oid)) AS total_size
    FROM pg_class c
    JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE c.relkind = 'r'
    GROUP BY n.nspname
    ORDER BY total_size DESC
    LIMIT 10;
The problem? Well, on my test DB, I let it run for 3 minutes, and then gave up.
It takes so long because I have so many objects. But is it possible to get this information faster? Yes. It's possible. It's not pretty, but it works.
To do it, we need to dig a bit deeper, and use file access functions.
The first function I will need to use is “pg_ls_dir”. It works like this:
$ SELECT * FROM pg_ls_dir('.') LIMIT 3;
  pg_ls_dir
--------------
 pg_xlog
 global
 pg_commit_ts
(3 ROWS)
Now, which dir to ls? The initial idea would be “base”, but if you have many tablespaces, then you might miss some files.
So, we need to read two potential places: “./base” and “./pg_tblspc”.
We can start with this query:
$ WITH all_files AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
)
SELECT * FROM all_files;
This shows the first-level elements in the base and pg_tblspc directories. Now, we just need to do a recursive descent into all the directories there are …
$ WITH RECURSIVE all_files AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
    UNION ALL
    SELECT u.path || '/' || l.filename, x.*
    FROM all_files u,
        LATERAL pg_ls_dir(u.path) AS l (filename),
        LATERAL pg_stat_file(u.path || '/' || l.filename) AS x
    WHERE u.isdir
)
SELECT * FROM all_files;
This query, on the same server, returns ~1.1 million rows in ~11 seconds. Not bad. And what do the rows look like?
      path       |  SIZE   |         access         |      modification      |         CHANGE         | creation | isdir
-----------------+---------+------------------------+------------------------+------------------------+----------+-------
 base/1          |    8192 | 2017-02-09 09:48:56+00 | 2017-02-09 09:50:03+00 | 2017-02-09 09:50:03+00 | [NULL]   | t
 base/12374      |    8192 | 2017-02-09 09:48:56+00 | 2017-02-09 09:48:56+00 | 2017-02-09 09:49:28+00 | [NULL]   | t
 base/12379      |    8192 | 2017-02-09 09:48:56+00 | 2018-02-15 18:16:44+00 | 2018-02-15 18:16:44+00 | [NULL]   | t
 base/16401      |    8192 | 2017-02-09 09:48:57+00 | 2017-02-09 09:50:03+00 | 2017-02-09 09:50:03+00 | [NULL]   | t
 base/16402      | 4485120 | 2017-02-09 09:48:59+00 | 2018-02-17 11:01:27+00 | 2018-02-17 11:01:27+00 | [NULL]   | t
 base/pgsql_tmp  |       6 | 2017-02-09 10:48:09+00 | 2018-02-17 12:29:24+00 | 2018-02-17 12:29:24+00 | [NULL]   | t
 pg_tblspc/16400 |      29 | 2015-09-14 14:52:59+00 | 2017-02-09 09:35:45+00 | 2018-02-16 15:07:52+00 | [NULL]   | t
 base/1/1255     |  581632 | 2017-02-09 09:48:56+00 | 2017-02-09 09:48:56+00 | 2017-02-09 09:49:28+00 | [NULL]   | f
 base/1/1255_fsm |   24576 | 2017-02-09 09:48:56+00 | 2017-02-09 09:48:56+00 | 2017-02-09 09:49:28+00 | [NULL]   | f
 base/1/1247     |   65536 | 2017-02-09 09:48:56+00 | 2017-02-09 09:48:56+00 | 2017-02-09 09:49:28+00 | [NULL]   | f
(10 ROWS)
This is not all that interesting yet, so let's filter it, and extract what we really need.
First things first – we can only (sensibly) check files that belong to the current database – otherwise we would not be able to map the file number (for example 1255) to a table name. This is unfortunate, but (in my case) not a problem.
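The mapping from files to databases goes through pg_database.oid – the current database's files live in base/<oid>/ (or under a tablespace directory in pg_tblspc). To see which oid that is:

$ SELECT oid, datname FROM pg_database WHERE datname = current_database();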
Second – we only need to care about data files – that is, files which are named like “12314” or “1214.12”. We don't care about _fsm or _vm files, because these are, generally speaking, small, and they are internal Pg things.
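To show what this naming pattern accepts and rejects, here is a quick test on a few made-up file names:

$ SELECT f, f ~ '^[0-9]+(\.[0-9]+)?$' AS is_data_file
    FROM unnest(ARRAY['16402', '16402.1', '16402_fsm', '16402_vm']) AS f;

Only the first two match – the main data file and one extra segment; the fsm/vm forks are rejected.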
So, let's limit what we have, and also extract only the file name from the path:
$ WITH RECURSIVE all_elements AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
    UNION ALL
    SELECT u.path || '/' || l.filename, x.*
    FROM all_elements u,
        LATERAL pg_ls_dir(u.path) AS l (filename),
        LATERAL pg_stat_file(u.path || '/' || l.filename) AS x
    WHERE u.isdir
), all_files AS (
    SELECT path, SIZE
    FROM all_elements
    WHERE NOT isdir
)
SELECT
    regexp_replace(regexp_replace(f.path, '.*/', ''), '\.[0-9]*$', '') AS filename,
    SUM(f.size)
FROM pg_database d, all_files f
WHERE d.datname = current_database()
    AND f.path ~ ('/' || d.oid || E'/[0-9]+(\\.[0-9]+)?$')
GROUP BY filename;
This returns data in a bit nicer format:
 filename  |    SUM
-----------+------------
 897150761 |       8192
 893855744 |          0
 830027226 |       8192
 846295375 |          0
 875288146 |      16384
 880671539 |       8192
 890834780 |       8192
 873076686 |       8192
 896836699 |      49152
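Side note: a single one of these numbers can also be resolved with pg_filenode_relation(), available in newer PostgreSQL versions (9.4+, if I recall correctly); the first argument is the tablespace oid, with 0 meaning the default tablespace:

$ SELECT pg_filenode_relation(0, 897150761);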
More generally, these numbers refer to the relfilenode column in pg_class. So I can join pg_class and pg_namespace, and see what it looks like:
$ WITH RECURSIVE all_elements AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
    UNION ALL
    SELECT u.path || '/' || l.filename, x.*
    FROM all_elements u,
        LATERAL pg_ls_dir(u.path) AS l (filename),
        LATERAL pg_stat_file(u.path || '/' || l.filename) AS x
    WHERE u.isdir
), all_files AS (
    SELECT path, SIZE
    FROM all_elements
    WHERE NOT isdir
), interesting_files AS (
    SELECT
        regexp_replace(regexp_replace(f.path, '.*/', ''), '\.[0-9]*$', '') AS filename,
        SUM(f.size)
    FROM pg_database d, all_files f
    WHERE d.datname = current_database()
        AND f.path ~ ('/' || d.oid || E'/[0-9]+(\\.[0-9]+)?$')
    GROUP BY filename
)
SELECT n.nspname, c.relname, c.relkind, f.sum AS SIZE
FROM interesting_files f
JOIN pg_class c ON f.filename::oid = c.relfilenode
JOIN pg_namespace n ON c.relnamespace = n.oid
ORDER BY SIZE DESC;

        nspname        |                       relname                        | relkind |    SIZE
-----------------------+------------------------------------------------------+---------+------------
 pg_toast              | pg_toast_805314153                                   | t       | 3984195584
 xxxxxxxxxxxxxxx_9053  | xxxxxxxx                                             | r       | 3538305024
 xxxxxxxxx             | xxxxxxxxxxxxxxxxxxx                                  | r       | 3062521856
 xxxxxxxxxxxxxxx_11400 | xxxxxxxxxx                                           | r       | 2555461632
 xxxxxxxxxxxxxxx_7860  | xxxxxxxxxxxxxxxxxxxx                                 | r       | 2443206656
 xxxxxxxxx             | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx            | i       | 2237513728
 xxxxxxxxx             | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | i       | 1667743744
 xxxxxxxxxxxxxxx_8371  | xxxxxxxxxxx                                          | r       | 1553113088
 pg_toast              | pg_toast_806460704                                   | t       | 1454399488
 xxxxxxxxxxxxxxx_8371  | xxxxxxxxxxxxxxxxxxxx                                 | r       | 1329913856
Sorry for censoring, but the table names might suggest things that are not relevant to this blogpost.
The thing is that while I got sizes of all tables (relkind = ‘r’) and indexes (relkind = ‘i’), I also got, separately, sizes of toast tables (relkind = ‘t’), which are basically secondary storage for table data. And they are all in the pg_toast schema, which doesn't suit me. I'd like to know the original schema for each toast table, so I can sum it all appropriately.
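Luckily, pg_class stores the link: the reltoastrelid column of the owning table points at its toast table, so a simple join recovers the owner. A minimal example, mapping one of the toast tables from the output above back to its owner's schema:

$ -- c is the owning table, t is its toast table
  SELECT n.nspname AS owner_schema, c.relname AS owner_table
    FROM pg_class c
    JOIN pg_class t ON c.reltoastrelid = t.oid
    JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE t.relname = 'pg_toast_805314153';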
With that join added, I finally get to this query:
$ WITH RECURSIVE all_elements AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
    UNION ALL
    SELECT u.path || '/' || l.filename, x.*
    FROM all_elements u,
        LATERAL pg_ls_dir(u.path) AS l (filename),
        LATERAL pg_stat_file(u.path || '/' || l.filename) AS x
    WHERE u.isdir
), all_files AS (
    SELECT path, SIZE
    FROM all_elements
    WHERE NOT isdir
), interesting_files AS (
    SELECT
        regexp_replace(regexp_replace(f.path, '.*/', ''), '\.[0-9]*$', '') AS filename,
        SUM(f.size)
    FROM pg_database d, all_files f
    WHERE d.datname = current_database()
        AND f.path ~ ('/' || d.oid || E'/[0-9]+(\\.[0-9]+)?$')
    GROUP BY filename
)
SELECT n.nspname AS schema_name, SUM(f.sum) AS total_schema_size
FROM interesting_files f
JOIN pg_class c ON f.filename::oid = c.relfilenode
LEFT OUTER JOIN pg_class dtc ON dtc.reltoastrelid = c.oid AND c.relkind = 't'
JOIN pg_namespace n ON COALESCE(dtc.relnamespace, c.relnamespace) = n.oid
GROUP BY n.nspname
ORDER BY total_schema_size DESC
LIMIT 10;
This returned the 10 schemas using the most disk space in less than 26 seconds.
Complicated. Not nice. Possibly still optimizable. Dependent on some knowledge of the filesystem layout. But it works. And all done from plain SQL. I do love my PostgreSQL 🙂
I wonder why your naive-approach query runs for so long. pg_total_relation_size internally does essentially the same thing – it stat()’s table files. Did you examine the plan for the query? Maybe it’s the pg_class to pg_namespace join that is so slow?
Also, I think your second approach is not very careful, as it doesn’t seem to take into account indices, table segments (large tables are stored on disk in separate files: 12345, 12345.1, 12345.2 and so on), fsm and vm.
@Galaxy:
1. Sure, my solution takes into account indexes and table parts.
2. fsm and vm forks are very small, so I purposely ignored them – and I even wrote about it in the paragraph starting with “Second – we only need to care about data files”, the same one where I mentioned that I am handling datafiles with “.x” suffixes – segments of tables.
As for the “join being a suspect” – since my final query does the same join, I fail to see how it would be relevant in the first query but not in the last.
Sorry, I might not have read carefully. You are right about the join and the indices, and I agree that vm/fsm is negligible.
Still, I cannot understand how your second query, doing the same thing but in an obviously more complex way, could outperform the first, simple query.
Was the plan more efficient in the latter case? Could it be due to the stat() call cache (that is, your naive query warmed the cache for the subsequent queries)?
Can you maybe try running the first and second queries on a freshly started db with the file I/O cache purged?
@Galaxy:
I don’t have a test db that I could use for this. I was running this query on basically a prod server (a slave, but still prod) for a system we have 1 million tables for.
Still – you can easily repeat the test on your own if you don’t believe me.
Great Article…really helpful … 🙂
@Galaxy: I was surprised as well, especially because the second query uses a WITH RECURSIVE clause and, from my experience, that’s always a disaster (performance-wise). So I made some tests. First, I generated a lot of tables:
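A minimal sketch of what that can look like (the table names are arbitrary, and the exact definitions don't matter much for this test):

$ DO $$
BEGIN
    -- create 30k empty one-column tables
    FOR i IN 1..30000 LOOP
        EXECUTE format('CREATE TABLE test_table_%s (id int4)', i);
    END LOOP;
END;
$$;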
Then I created 2 views:
simple_one is the first query and recursive_one is the second.
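The simple one is basically the first query from the post wrapped in a view, along these lines (recursive_one wraps the final WITH RECURSIVE query the same way):

$ CREATE VIEW simple_one AS
    SELECT n.nspname, SUM(pg_total_relation_size(c.oid)) AS total_size
    FROM pg_class c
    JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE c.relkind = 'r'
    GROUP BY n.nspname
    ORDER BY total_size DESC
    LIMIT 10;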
Then I ran pgbench tests. The first one with 30k empty tables:
and then added 40k more tables, so the test was run on 70k altogether:
As one can see, the second, recursive, more complicated query is much faster on my computer as well. It’s indeed surprising, but true 🙂