I was faced with an interesting problem: which schema, in my DB, uses the most disk space? Theoretically it's trivial – we have a set of helpful functions (quick usage example after the list):
- pg_column_size
- pg_database_size
- pg_indexes_size
- pg_relation_size
- pg_table_size
- pg_tablespace_size
- pg_total_relation_size
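For a single relation these are trivial to use. A quick illustration (the table name here is just a placeholder, not from any real schema):

$ SELECT pg_table_size('some_table'),          -- heap + toast + fsm/vm, without indexes
         pg_indexes_size('some_table'),        -- all indexes on the table
         pg_total_relation_size('some_table'); -- everything combined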
But in some cases it becomes more of a problem. For example – when you have thousands of tables …
For my sample DB I picked a database with over a million objects in it:
$ SELECT COUNT(*) FROM pg_class;
  COUNT
---------
 1087322
(1 ROW)
There are over 700 schemas, each of which contains tables.
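If you want to check a similar count on your own database, something along these lines (a minimal sketch, skipping system schemas) will do:

$ SELECT COUNT(*)
    FROM pg_namespace
    WHERE nspname !~ '^pg_' AND nspname <> 'information_schema';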
A naive query would look like this:
$ SELECT n.nspname, SUM(pg_total_relation_size(c.oid)) AS total_size
    FROM pg_class c
    JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE c.relkind = 'r'
    GROUP BY n.nspname
    ORDER BY total_size DESC
    LIMIT 10;
The problem? Well, on my test DB, I let it run for 3 minutes, and then gave up.
It takes so long because I have so many objects. But is it possible to get this information faster? Yes. It's possible. It's not pretty, but it works.
To do it, we need to dig a bit deeper, and use file access functions.
The first function I will need to use is “pg_ls_dir”. It works like this:
$ SELECT * FROM pg_ls_dir('.') LIMIT 3;
  pg_ls_dir
--------------
 pg_xlog
 global
 pg_commit_ts
(3 ROWS)
Now, which dir to ls? The initial idea would be “base”, but if you have many tablespaces, then you might miss some files.
So, we need to read two potential places: “./base” and “./pg_tblspc”.
We can start with this query:
$ WITH all_files AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
)
SELECT * FROM all_files;
This shows the first-level elements in the base and pg_tblspc directories. Now, we just need to do a recursive descent into all the directories there are …
$ WITH RECURSIVE all_files AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
    UNION ALL
    SELECT u.path || '/' || l.filename, x.*
    FROM all_files u,
        LATERAL pg_ls_dir(u.path) AS l (filename),
        LATERAL pg_stat_file(u.path || '/' || l.filename) AS x
    WHERE u.isdir
)
SELECT * FROM all_files;
This query, on the same server, returns ~1.1 million rows in ~11 seconds. Not bad. And what do the rows look like?
      path       |  SIZE   |         access         |      modification      |         CHANGE         | creation | isdir
-----------------+---------+------------------------+------------------------+------------------------+----------+-------
 base/1          |    8192 | 2017-02-09 09:48:56+00 | 2017-02-09 09:50:03+00 | 2017-02-09 09:50:03+00 | [NULL]   | t
 base/12374      |    8192 | 2017-02-09 09:48:56+00 | 2017-02-09 09:48:56+00 | 2017-02-09 09:49:28+00 | [NULL]   | t
 base/12379      |    8192 | 2017-02-09 09:48:56+00 | 2018-02-15 18:16:44+00 | 2018-02-15 18:16:44+00 | [NULL]   | t
 base/16401      |    8192 | 2017-02-09 09:48:57+00 | 2017-02-09 09:50:03+00 | 2017-02-09 09:50:03+00 | [NULL]   | t
 base/16402      | 4485120 | 2017-02-09 09:48:59+00 | 2018-02-17 11:01:27+00 | 2018-02-17 11:01:27+00 | [NULL]   | t
 base/pgsql_tmp  |       6 | 2017-02-09 10:48:09+00 | 2018-02-17 12:29:24+00 | 2018-02-17 12:29:24+00 | [NULL]   | t
 pg_tblspc/16400 |      29 | 2015-09-14 14:52:59+00 | 2017-02-09 09:35:45+00 | 2018-02-16 15:07:52+00 | [NULL]   | t
 base/1/1255     |  581632 | 2017-02-09 09:48:56+00 | 2017-02-09 09:48:56+00 | 2017-02-09 09:49:28+00 | [NULL]   | f
 base/1/1255_fsm |   24576 | 2017-02-09 09:48:56+00 | 2017-02-09 09:48:56+00 | 2017-02-09 09:49:28+00 | [NULL]   | f
 base/1/1247     |   65536 | 2017-02-09 09:48:56+00 | 2017-02-09 09:48:56+00 | 2017-02-09 09:49:28+00 | [NULL]   | f
(10 ROWS)
This is not all that interesting yet, so let's filter it, and extract what we really need.
First things first – we can only (sensibly) check files that belong to the current database – otherwise we would not be able to map the file number (for example 1255) to a table name. This is unfortunate, but (in my case) not a problem.
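The mapping from files to databases goes through pg_database.oid – the current database's files live in base/<oid>/ (or under a tablespace directory in pg_tblspc). To see which oid that is:

$ SELECT oid, datname FROM pg_database WHERE datname = current_database();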
Second – we only need to care about data files – that is, files which are named like “12314” or “1214.12”. We don't care about _fsm or _vm files, because these are, generally speaking, small, and they are internal Pg things.
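To show what this naming pattern accepts and rejects, here is a quick test on a few made-up file names:

$ SELECT f, f ~ '^[0-9]+(\.[0-9]+)?$' AS is_data_file
    FROM unnest(ARRAY['16402', '16402.1', '16402_fsm', '16402_vm']) AS f;

Only the first two match – the main data file and one extra segment; the fsm/vm forks are rejected.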
So, let's limit what we have, and also extract only the file name from the path:
$ WITH RECURSIVE all_elements AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
    UNION ALL
    SELECT u.path || '/' || l.filename, x.*
    FROM all_elements u,
        LATERAL pg_ls_dir(u.path) AS l (filename),
        LATERAL pg_stat_file(u.path || '/' || l.filename) AS x
    WHERE u.isdir
), all_files AS (
    SELECT path, SIZE
    FROM all_elements
    WHERE NOT isdir
)
SELECT
    regexp_replace(regexp_replace(f.path, '.*/', ''), '\.[0-9]*$', '') AS filename,
    SUM(f.size)
FROM pg_database d, all_files f
WHERE d.datname = current_database()
    AND f.path ~ ('/' || d.oid || E'/[0-9]+(\\.[0-9]+)?$')
GROUP BY filename;
This returns data in a bit nicer format:
 filename  |    SUM
-----------+------------
 897150761 |       8192
 893855744 |          0
 830027226 |       8192
 846295375 |          0
 875288146 |      16384
 880671539 |       8192
 890834780 |       8192
 873076686 |       8192
 896836699 |      49152
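Side note: a single one of these numbers can also be resolved with pg_filenode_relation(), available in newer PostgreSQL versions (9.4+, if I recall correctly); the first argument is the tablespace oid, with 0 meaning the default tablespace:

$ SELECT pg_filenode_relation(0, 897150761);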
More generally, these numbers refer to the relfilenode column in pg_class. So I can join pg_class and pg_namespace, and see what it looks like:
$ WITH RECURSIVE all_elements AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
    UNION ALL
    SELECT u.path || '/' || l.filename, x.*
    FROM all_elements u,
        LATERAL pg_ls_dir(u.path) AS l (filename),
        LATERAL pg_stat_file(u.path || '/' || l.filename) AS x
    WHERE u.isdir
), all_files AS (
    SELECT path, SIZE
    FROM all_elements
    WHERE NOT isdir
), interesting_files AS (
    SELECT
        regexp_replace(regexp_replace(f.path, '.*/', ''), '\.[0-9]*$', '') AS filename,
        SUM(f.size)
    FROM pg_database d, all_files f
    WHERE d.datname = current_database()
        AND f.path ~ ('/' || d.oid || E'/[0-9]+(\\.[0-9]+)?$')
    GROUP BY filename
)
SELECT n.nspname, c.relname, c.relkind, f.sum AS SIZE
FROM interesting_files f
JOIN pg_class c ON f.filename::oid = c.relfilenode
JOIN pg_namespace n ON c.relnamespace = n.oid
ORDER BY SIZE DESC;

        nspname        |                       relname                        | relkind |    SIZE
-----------------------+------------------------------------------------------+---------+------------
 pg_toast              | pg_toast_805314153                                   | t       | 3984195584
 xxxxxxxxxxxxxxx_9053  | xxxxxxxx                                             | r       | 3538305024
 xxxxxxxxx             | xxxxxxxxxxxxxxxxxxx                                  | r       | 3062521856
 xxxxxxxxxxxxxxx_11400 | xxxxxxxxxx                                           | r       | 2555461632
 xxxxxxxxxxxxxxx_7860  | xxxxxxxxxxxxxxxxxxxx                                 | r       | 2443206656
 xxxxxxxxx             | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx            | i       | 2237513728
 xxxxxxxxx             | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | i       | 1667743744
 xxxxxxxxxxxxxxx_8371  | xxxxxxxxxxx                                          | r       | 1553113088
 pg_toast              | pg_toast_806460704                                   | t       | 1454399488
 xxxxxxxxxxxxxxx_8371  | xxxxxxxxxxxxxxxxxxxx                                 | r       | 1329913856
Sorry for censoring, but the table names might suggest things that are not relevant to this blogpost.
The thing is that while I got sizes of all tables (relkind = ‘r’) and indexes (relkind = ‘i’), I also got, separately, sizes of toast tables (relkind = ‘t’), which are basically secondary storage for table data. And they are all in the pg_toast schema, which doesn't suit me. I'd like to know the original schema for each toast table, so I can sum it all appropriately.
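Luckily, pg_class stores the link: the reltoastrelid column of the owning table points at its toast table, so a simple join recovers the owner. A minimal example, mapping one of the toast tables from the output above back to its owner's schema:

$ -- c is the owning table, t is its toast table
  SELECT n.nspname AS owner_schema, c.relname AS owner_table
    FROM pg_class c
    JOIN pg_class t ON c.reltoastrelid = t.oid
    JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE t.relname = 'pg_toast_805314153';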
With that join added, I finally get to this query:
$ WITH RECURSIVE all_elements AS (
    SELECT 'base/' || l.filename AS path, x.*
    FROM pg_ls_dir('base/') AS l (filename),
        LATERAL pg_stat_file('base/' || l.filename) AS x
    UNION ALL
    SELECT 'pg_tblspc/' || l.filename AS path, x.*
    FROM pg_ls_dir('pg_tblspc/') AS l (filename),
        LATERAL pg_stat_file('pg_tblspc/' || l.filename) AS x
    UNION ALL
    SELECT u.path || '/' || l.filename, x.*
    FROM all_elements u,
        LATERAL pg_ls_dir(u.path) AS l (filename),
        LATERAL pg_stat_file(u.path || '/' || l.filename) AS x
    WHERE u.isdir
), all_files AS (
    SELECT path, SIZE
    FROM all_elements
    WHERE NOT isdir
), interesting_files AS (
    SELECT
        regexp_replace(regexp_replace(f.path, '.*/', ''), '\.[0-9]*$', '') AS filename,
        SUM(f.size)
    FROM pg_database d, all_files f
    WHERE d.datname = current_database()
        AND f.path ~ ('/' || d.oid || E'/[0-9]+(\\.[0-9]+)?$')
    GROUP BY filename
)
SELECT n.nspname AS schema_name, SUM(f.sum) AS total_schema_size
FROM interesting_files f
JOIN pg_class c ON f.filename::oid = c.relfilenode
LEFT OUTER JOIN pg_class dtc ON dtc.reltoastrelid = c.oid AND c.relkind = 't'
JOIN pg_namespace n ON COALESCE(dtc.relnamespace, c.relnamespace) = n.oid
GROUP BY n.nspname
ORDER BY total_schema_size DESC
LIMIT 10;
This returned the 10 schemas using the most disk space in less than 26 seconds.
Complicated. Not nice. Possibly still optimizable. Dependent on some knowledge of the filesystem layout. But it works. And all done from plain SQL. I do love my PostgreSQL 🙂
I wonder why your naive-approach query runs for so long. pg_total_relation_size internally does essentially the same thing – it stat()’s table files. Did you examine the plan for the query? Maybe it’s the pg_class to pg_namespace join that is so slow?
Also, I think your second approach is not very careful, as it doesn’t seem to take into account indices, table segments (large tables are stored on disk in separate files: 12345, 12345.1, 12345.2 and so on), fsm and vm.
@Galaxy:
1. Sure, my solution takes into account indexes and table parts.
2. fsm and vm forks are very small, so I purposely ignored them – and I even wrote about it in the paragraph starting with “Second – we only need to care about data files”, the same one where I mentioned that I am handling datafiles with “.x” suffixes – segments of tables.
As for the “join being a suspect” – since my final query does the same join, I fail to see how it would be relevant in the first query but not in the last.
Sorry, I might not have read carefully. You are right about the join and the indices, and I agree that vm/fsm is negligible.
Still, I cannot understand how your second query, doing the same thing but in an obviously more complex way, could outperform the first, simple query.
Was the plan more efficient in the latter case? Could it be due to the stat() call cache (that is, your naive query warmed the cache for the subsequent queries)?
Can you maybe try running the first and second queries on a freshly started db with the file I/O cache purged?
@Galaxy:
I don’t have a test db that I could use for this. I was running this query on basically a prod server (a slave, but still prod) for a system we have 1 million tables for.
Still – you can easily repeat the test on your own if you don’t believe me.
Great Article…really helpful … 🙂
@Galaxy: I was surprised as well, especially because the second query uses a WITH RECURSIVE clause and, from my experience, that’s always a disaster (performance-wise). So I made some tests. First, I generated a lot of tables:
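A minimal sketch of what that can look like (the table names are arbitrary, and the exact definitions don't matter much for this test):

$ DO $$
BEGIN
    -- create 30k empty one-column tables
    FOR i IN 1..30000 LOOP
        EXECUTE format('CREATE TABLE test_table_%s (id int4)', i);
    END LOOP;
END;
$$;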
Then I created 2 views:
simple_one is the first query and recursive_one is the second.
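The simple one is basically the first query from the post wrapped in a view, along these lines (recursive_one wraps the final WITH RECURSIVE query the same way):

$ CREATE VIEW simple_one AS
    SELECT n.nspname, SUM(pg_total_relation_size(c.oid)) AS total_size
    FROM pg_class c
    JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE c.relkind = 'r'
    GROUP BY n.nspname
    ORDER BY total_size DESC
    LIMIT 10;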
Then I ran pgbench tests. The first one with 30k empty tables:
and then added 40k more tables, so the test was run on 70k altogether:
As one can see, the second, recursive, more complicated query is much faster on my computer as well. It’s indeed surprising, but true 🙂