some time ago i wrote a piece on values(),() in postgresql 8.2, saying that multi-row inserts are very fast.
some people said that my benchmark is pretty useless as i didn't take into consideration transactions.
others asked me to translate the text to english.
so i decided to redo the test, with more test scenarios, and write it up in english. this is the summary.
at first what i used, what i tested and how.
i used a linux machine, with these things inside:
- cpu: AMD Sempron(tm) Processor 2600+ (1.6ghz)
- memory: 3gb
- discs: 4 250gb hitachi sata discs (only one was used)
i tried to make the machine as predictable as possible, thus i stopped all daemons which were not neccessary. full ps auxwwf output is provided in results tar file. basically – there is postgresql, sshd, openvpn, dhclient and some gettys. no cron, atd, smtpd, httpd or anything like this.
then i wrote a small program which generated test files. i do not distribute test files themselves, as in total they use nearly 70gb!
then i wrote another small program – which basically ran all of the tests (3 times to get an average).
full set of results is downloadable as tar file, which contains 10598 files (tar file is 350k, unpacked directory takes 42megs).
one very important notice. all tests that i have performed inserted random data to table of this structure:
- id int4 primary key,
- some_text text,
- bool_1 bool,
- bool_2 bool,
- some_date date,
- some_timestamp timestamp
so results (especially “break-points" where there is no further gain) will be different when inserting to another tables. the only point of this benchmark is to show which approach can give which results. and what's really worth the trouble 🙂
Continue reading how to insert data to database – as fast as possible