On 23rd of January, Magnus Hagander committed a patch which adds:
Add pg_basebackup tool for streaming base backups

This tool makes it possible to do the pg_start_backup/copy files/pg_stop_backup step in a single command. There are still some steps to be done before this is a complete backup solution, such as the ability to stream the required WAL logs, but it's still usable, and could do with some buildfarm coverage.

In passing, make the checkpoint request optionally fast instead of hardcoding it.

Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine
It looks like it will provide a way to set up a WAL slave without using any protocol other than PostgreSQL connections themselves.
In this regard it's a bit like Dimitri's earlier project – pg_basebackup.
But the previous project used PL/PythonU to read the files. This time, the code uses additional logic in PostgreSQL itself (in the WAL sender, as far as I recall). So let's see how it works.
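Since it all runs over a regular libpq connection to the WAL sender, you can even poke at that connection by hand. A quick sketch only – it assumes the replica user, pg_hba.conf entry and port from the setup described below, and IDENTIFY_SYSTEM is simply the most basic command a walsender connection understands:

=$ psql "host=127.0.0.1 port=5910 user=replica replication=1" -c "IDENTIFY_SYSTEM"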
I set up a normal 9.1 instance, with default parameters, and later changed the following GUCs, to make hot standby work:
wal_level = hot_standby
max_wal_senders = 5
archive_mode = on
archive_command = '/bin/true'
wal_keep_segments = 300
Note – I technically don't need archive_mode, as I'm not going to use WAL archiving, but without it pg_basebackup prints a warning. In case you wonder why it warns – well, you should have either archive_mode on or some non-zero wal_keep_segments, but using archiving is safer, and it makes it possible to set up a compressed archive. On the other hand – this is just a test setup, so I don't care about the archive itself, I'm just trying to suppress the warning.
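For a real setup, instead of the '/bin/true' dummy, archive_command would actually copy each finished segment somewhere safe. A minimal sketch along the lines of the example in the Pg docs (the destination directory is made up, and in practice you'd want compression and monitoring on top of it):

archive_mode = on
archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'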
Then, I created a new role in the master Postgres, so that I can run base backups and replication as that user, and not as a superuser:
# CREATE USER replica WITH replication password 'secret';
(I also added "host replication replica 127.0.0.1/32 trust" to pg_hba.conf).
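Spelled out, that entry and the reload that makes it active look like this (a sketch – "data-dir" here stands for the master's data directory, wherever it lives on your system):

# in data-dir/pg_hba.conf:
host    replication    replica    127.0.0.1/32    trust

=$ pg_ctl -D data-dir reload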
With this setup, I did:
=$ rm -rf new-data-dir/; pg_basebackup -D new-data-dir -Fp -c fast -l basebackuptest -v -P -U replica -h 127.0.0.1 -p 5910
47008/47008 kB (100%) 1/1 tablespaces
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
pg_basebackup: base backup completed.
Which looks pretty cool.
Afterwards new-data-dir contained a full copy of my normal data dir, including (unfortunately) pg_log/ content, but excluding (which is good) pg_xlog content.
As for the second part – pg_xlog content – it was also missing the pg_xlog/archive_status directory, which can be problematic in the future if we fail over to the new server.
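The easy way around it – if you think you might promote this slave one day – is to simply recreate the missing directory before starting the standby (an empty directory is all that's needed):

=$ mkdir -p new-data-dir/pg_xlog/archive_status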
But anyway – let's change the port in the slave system, make it a hot standby, and start it.
First, in new-data-dir/postgresql.conf:
port = 5911
hot_standby = on
and in new-data-dir/recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5910 user=replica password=secret'
restore_command = '/bin/false'
trigger_file = '/tmp/finish.recovery'
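As a side note – the trigger_file line means that later on, promoting this standby to a standalone server is just a matter of creating that file (not something I'm doing in this test):

=$ touch /tmp/finish.recovery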
and now, I can start it:
=$ pg_ctl -D new-data-dir/ start
server starting
And with this I can check if the replication is working on the base backup:
=$ psql -U pgdba -p 5910 -c 'select * from test order by id desc limit 2';
 id  |         inserted_when         | inserted_by
-----+-------------------------------+-------------
 170 | 2011-01-24 14:51:42.313384+01 |        5674
 169 | 2011-01-24 14:51:12.29726+01  |        5654
(2 rows)

=$ psql -U pgdba -p 5911 -c 'select * from test order by id desc limit 2';
 id  |         inserted_when         | inserted_by
-----+-------------------------------+-------------
 170 | 2011-01-24 14:51:42.313384+01 |        5674
 169 | 2011-01-24 14:51:12.29726+01  |        5654
(2 rows)
OK. All works, seems to be good.
And I did it without any kind of ssh/rsync stuff.
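If you want one more check that the slave is really streaming and keeping up, comparing WAL positions works too (a sketch – these functions exist in 9.0/9.1, and the ports and user match my setup above):

=$ psql -U pgdba -p 5910 -c 'SELECT pg_current_xlog_location()'
=$ psql -U pgdba -p 5911 -c 'SELECT pg_last_xlog_replay_location()'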
The obvious question now is – what is the overhead of sending data over the PostgreSQL communication channel/protocol?
So, I closed all applications that could connect to this database, ran vacuum and analyze on all databases to avoid any additional writes, and moved logs outside of $PGDATA.
This should make the datadir as static as possible without actually turning off PostgreSQL.
Afterwards, I made the first pg_basebackup, to get a baseline:
pg_basebackup -D baseline-copy -Fp -c fast -l basebackuptest -v -P -U replica -h 127.0.0.1 -p 5910
With this in place, I checked the size of "baseline-copy", and of some tar/gz versions of it:
=$ GZIP="-9" tar czf baseline-copy-9.tar.gz baseline-copy/
=$ GZIP="-1" tar czf baseline-copy-1.tar.gz baseline-copy/
=$ tar czf baseline-copy.tar.gz baseline-copy/
=$ tar cf baseline-copy.tar baseline-copy/
=$ du -sk baseline-copy*
47396   baseline-copy
6392    baseline-copy-1.tar.gz
5488    baseline-copy-9.tar.gz
48092   baseline-copy.tar
5580    baseline-copy.tar.gz
OK. Now we need to measure how much data is actually being sent through the Pg connection when doing the backup.
Well, for me the easiest way is to use strace, and then just parse its output.
So, I used this one-liner:
rm -rf test-copy{,.log}; strace -e connect,recvfrom,sendto,close -f -o test-copy.log pg_basebackup -D test-copy -Fp -l basebackuptest -U replica -h 127.0.0.1 -p 5910 &> /dev/null
and then this one to get summaries:
cat test-copy.log | perl -ne 'next unless /^\d+\s+connect\(3, .*5910/../^\d+\s+close\(3\)/; print if s/^\d+\s+(recvfrom|sendto)\(3,.* = (\d+)$/$1 $2/' | awk '{S[$1]+=$2} END{for (i in S) printf "%-15s : %d\n", i, S[i]/1024}'
(If you want to say that it's ugly – don't waste your breath/keyboard – I know. But it's just a test situation. And I checked that it's file descriptor 3.)
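For the record, here is the same pipeline spread across lines and commented, in case the regexes are not obvious (it's meant to be functionally equivalent to the one-liner above):

cat test-copy.log | \
    perl -ne '
        # only keep lines between connect() on fd 3 to port 5910
        # and the matching close(3)
        next unless /^\d+\s+connect\(3, .*5910/../^\d+\s+close\(3\)/;
        # reduce "recvfrom(3, ...) = BYTES" / "sendto(3, ...) = BYTES"
        # to just "recvfrom BYTES" / "sendto BYTES"
        print if s/^\d+\s+(recvfrom|sendto)\(3,.* = (\d+)$/$1 $2/
    ' | \
    awk '
        # sum the bytes per syscall name, print totals in kB
        { S[$1] += $2 }
        END { for (i in S) printf "%-15s : %d\n", i, S[i]/1024 }
    '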
So. Results for just sending the backup without any compression, in plain format:
sendto          : 0
recvfrom        : 48201
Which means that virtually no data was transferred back to the master (which is just a sanity check), and the total size of data transferred from master to slave is slightly more than the size of the datadir. But not much more – just 805 kB, which is 1.7% of the PGDATA size.
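Just to spell out where the 805 kB and 1.7% come from (plain shell arithmetic on the du and strace numbers above):

=$ echo $(( 48201 - 47396 ))
805
=$ echo "scale=2; 100 * 805 / 47396" | bc
1.69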
Let's see the same thing with compression:
First try, and I got a failure:
=$ pg_basebackup -D test-copy -Fp -l basebackuptest -U replica -h 127.0.0.1 -Z 6 -p 5910
pg_basebackup: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.
Hmm .. OK. Let's play with tar backups then:
=$ pg_basebackup -D test-copy -Ft -l basebackuptest -U replica -h 127.0.0.1 -Z 6 -p 5910
Afterwards, in test-copy, I found a base.tar.gz file, which was 5560 kB in size. Strace showed that this time the size of the transmission over the network was … 48201 kB.
Which makes some sense. It's sensible in that compression is clearly done on the receiving end, so the master is not being overloaded with this task. On the other hand – if your $PGDATA is large, you might still prefer to make your base backups using normal means (tar/omnipitr-backup-master/rsync), to preserve network bandwidth.
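For comparison, one of those "normal means" that keeps bandwidth down by compressing on the master side could look roughly like this – a sketch only, with a made-up hostname and path, and it skips the pg_start_backup()/pg_stop_backup() bookkeeping that any real base backup procedure has to do around the copy:

=$ ssh master "cd /path/to/pgdata && tar --exclude=pg_xlog -czf - ." > base-backup.tar.gz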
All in all – it looks like a definitely cool tool, and I thank the authors for it.
Nice writeup!
Just an FYI – compression is something we discussed, and something we eventually want to add (as an option, of course), but given how little time there is left before 9.1 freezes, it was pushed back. But it's definitely on the radar for the future!
@Magnus:
That's great. One question though – when done, will it be done by calls to the zlib library, or by an external program? The reason for the question – I usually have lots of CPU power left, while IO and network are much more problematic. With the ability to define my own compressor program, I would be able (for example) to use pigz (http://www.zlib.net/pigz/).
Or pbzip2, which I use quite often.