Today, I released a new version of OmniPITR – 0.5.0.
This new version has one important new feature – the so-called “direct destination” for backups.
What does it mean? What does it do? How does it help? Let's see…
Let's assume you have a remote destination for backups, something like:
$ omnipitr-backup-master ... -dr gzip=storage.host:/path/to/store/backups ...
Up to 0.4.0, OmniPITR did something like this (it's a simplification, but good enough for this example):
- $ tar czf /tmp/data.tar.gz $PGDATA
- $ scp /tmp/data.tar.gz storage.host:/path/to/store/backups/
- $ rm /tmp/data.tar.gz
- $ tar czf /tmp/xlog.tar.gz $XLOG_DIR
- $ scp /tmp/xlog.tar.gz storage.host:/path/to/store/backups/
- $ rm /tmp/xlog.tar.gz
This is all fine, and it's pretty standard, but it's not optimal. Why? For starters – it causes extra disk I/O: we read the data and store it locally as a tarball, and then we re-read the tarball to send it to the remote machine.
It also causes a spike in network usage – scp (or whatever is actually used) will use all the available bandwidth.
And, at the end, we have to remove two files – which can be pretty big (think hundreds of gigabytes), and removing them, on ext3, can be pretty painful.
So, what is the solution?
It's simple: instead of running it the way I showed, why not pipe the output from tar directly to ssh, which connects to storage.host and stores the file in its final place? Something like:
$ tar czf - $PGDATA | ssh storage.host 'cat - > /path/to/store/backups/data.tar.gz'
This way we avoid writing and re-reading the tarball on the database server. Also, since the transfer of the tarball happens in parallel with its creation, it is (usually) limited by tar/gzip speed, which makes the transfer itself slower. But despite the slower transfer, the backup arrives on the backup server earlier, because the transfer started earlier.
So – it's all win.
Of course writing it like I showed above seems trivial, until you consider that omnipitr is very configurable. You can have multiple destinations. Multiple compression schemes. And multiple checksums generated.
Long story short – omnipitr-backup-master (and -slave too, of course) can do direct destinations regardless of complexity, but to do so it needs bash (a new requirement). And the actual command that gets executed can sometimes be very unreadable. But that's OK – it's generated programmatically, so it doesn't have to be read 🙂
As a side effect of the changes, all local destinations (as well as all direct destinations) are now processed in parallel, while the tarball is being created.
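To give a rough idea of the kind of pipeline this involves – this is a hand-written sketch, not the command omnipitr actually generates, and the host and paths (storage.host, /backups) are made up – here is how a single tar stream can feed a local gzipped copy, a remote gzipped copy, and an md5 checksum in one pass:

# Hand-written sketch, NOT what omnipitr generates; host and paths are examples.
# One read of $PGDATA, one gzip; tee fans the compressed stream out to:
#   - an md5 checksum file (local),
#   - ssh writing the tarball on the remote host,
#   - and a local copy of the tarball (via the final stdout redirect).
$ tar cf - "$PGDATA" \
    | gzip -c \
    | tee >(md5sum > /backups/data.tar.gz.md5) \
          >(ssh storage.host 'cat - > /path/to/store/backups/data.tar.gz') \
    > /backups/data.tar.gz

The process substitutions (>(...)) are what make bash a requirement; a real tool also has to wait for them to finish and check their exit statuses, which the sketch above conveniently ignores.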
Finally – thanks for the idea, and the prodding, to Gary of justin.tv.
wow, advanced technologies.
For one, when my engineers do such things as you guys had in 0.4.0, I always ask them: is it really simpler that way? Because it doesn't seem to be!
@gj:
Sorry, I don't understand what you're saying. Doing remote tarballs with “tar + scp + rm” is definitely simpler than doing it via “tar + ssh + cat”. The reason is simple – it's more common. And just try to imagine doing the “tar + ssh + cat” with checksumming, 3 remote destinations, 2 local destinations, and different compressions. (omnipitr 0.5.0 can do it).