On the 28th of June, Robert Haas committed a patch:
Dramatically reduce System V shared memory consumption. Except when compiling with EXEC_BACKEND, we'll now allocate only a tiny amount of System V shared memory (as an interlock to protect the data directory) and allocate the rest as anonymous shared memory via mmap. This will hopefully spare most users the hassle of adjusting operating system parameters before being able to start PostgreSQL with a reasonable value for shared_buffers. There are a bunch of documentation updates needed here, and we might need to adjust some of the HINT messages related to shared memory as well. But it's not 100% clear how portable this is, so before we write the documentation, let's give it a spin on the buildfarm and see what turns red.
This patch doesn't add any new functionality, but removes one thing that had caused some issues.
As you perhaps know, PostgreSQL has so-called “shared_buffers”. This is where it stores various data, most importantly copies of data pages (8kB blocks).
The problem with shared_buffers is that you usually start by setting it to something like 20%-25% of available RAM, which on current multi-gigabyte servers is a non-trivial amount.
And most of the systems I've seen have very conservative limits on how much System V shared memory can be allocated. For example – my desktop Ubuntu 12.04 has the limit set to:
=$ cat /proc/sys/kernel/shmmax
33554432
A “whopping” 32MB.
This means that when you configure your PostgreSQL to actually put the memory it has to good use – i.e. for shared_buffers – you have to configure your kernel too.
And if you forget, or something fails to re-configure it on reboot – PostgreSQL will not start, showing errors like:
2012-07-12 12:25:13 CEST [] [7510]: [1-1] user=,db=,e=XX000: FATAL: could not create shared memory segment: Invalid argument
2012-07-12 12:25:13 CEST [] [7510]: [2-1] user=,db=,e=XX000: DETAIL: Failed system call was shmget(key=5910001, size=3318874112, 03600).
2012-07-12 12:25:13 CEST [] [7510]: [3-1] user=,db=,e=XX000: HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 3318874112 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections. If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for. The PostgreSQL documentation contains more information about shared memory configuration.
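To make the numbers in the log concrete, here is a minimal sketch that compares the request size from the DETAIL line with the SHMMAX value shown earlier. The helper names `exceeds_shmmax` and `read_shmmax` are made up for this illustration, and the check is only an approximation of what the kernel does when shmget() is called.

```python
def exceeds_shmmax(request_bytes: int, shmmax_bytes: int) -> bool:
    """Return True if a System V segment of this size is larger than SHMMAX."""
    return request_bytes > shmmax_bytes


def read_shmmax(path: str = "/proc/sys/kernel/shmmax") -> int:
    """Read the current SHMMAX limit in bytes (Linux-specific path)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    request = 3318874112  # size from the shmget() DETAIL line
    limit = 33554432      # value printed by `cat /proc/sys/kernel/shmmax`
    print(f"request: {request / 2**20:.0f} MiB, limit: {limit / 2**20:.0f} MiB")
    print("too large" if exceeds_shmmax(request, limit) else "fits")
```

On a live Linux box you could pass `read_shmmax()` as the second argument instead of the hard-coded value.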
The error message is pretty helpful, so fixing it is usually not a problem. But why have the error in the first place, when you can skip it altogether?
Robert's commit does exactly that. Instead of using so-called “System V shared memory” (which is subject to the SHMMAX limit), it switches to shared memory obtained via mmap.
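For readers who have not used anonymous shared mappings, here is a minimal sketch of the kernel facility involved. It is not PostgreSQL's code (that is written in C); it only shows, under the assumption of a Unix system with fork(), that a region created with mmap using MAP_SHARED and MAP_ANONYMOUS is visible to child processes without touching shmget() or SHMMAX. The function name `share_across_fork` is made up for this illustration.

```python
import mmap
import os


def share_across_fork(size: int = 4096) -> bytes:
    """Create an anonymous shared mapping, let a child write to it, read it back."""
    # fileno=-1 means no backing file; MAP_SHARED makes the pages visible
    # to processes forked after the mapping was created.
    buf = mmap.mmap(
        -1,
        size,
        flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    pid = os.fork()
    if pid == 0:
        # Child: write into the shared region and exit without cleanup handlers.
        buf[:5] = b"hello"
        os._exit(0)
    # Parent: wait for the child, then read what it wrote.
    os.waitpid(pid, 0)
    data = bytes(buf[:5])
    buf.close()
    return data


if __name__ == "__main__":
    print(share_across_fork())
```

Because the mapping is inherited through fork() rather than attached by key, it only works for processes descended from the one that created it, which fits with the commit message excluding EXEC_BACKEND builds.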
Thanks to this, on the same machine, with the same 32MB limit for SHMMAX, I can start PostgreSQL 9.3, with shared_buffers = 3GB, and it works:
=$ ps -u pgdba f u
USER       PID %CPU %MEM     VSZ   RSS TTY      STAT START   TIME COMMAND
pgdba     7993  0.0  0.7 3268388 86136 ?        S    12:28   0:00 /home/pgdba/work/bin/postgres
pgdba     7997  0.0  0.0   24792   560 ?        Ss   12:28   0:00  \_ postgres: logger process
pgdba     7999  0.0  0.0 3269928   984 ?        Ss   12:28   0:00  \_ postgres: checkpointer process
pgdba     8000  0.0  0.1 3269928 15772 ?        Ss   12:28   0:00  \_ postgres: writer process
pgdba     8001  0.0  0.0 3269928   984 ?        Ss   12:28   0:00  \_ postgres: wal writer process
pgdba     8002  0.0  0.0 3270932  2456 ?        Ss   12:28   0:00  \_ postgres: autovacuum launcher process
pgdba     8003  0.0  0.0   26888   632 ?        Ss   12:28   0:00  \_ postgres: archiver process
pgdba     8004  0.0  0.0   27184  1240 ?        Ss   12:28   0:00  \_ postgres: stats collector process
One less thing to worry about, and one less reason why starting Pg might fail. Of course it can still fail if you configure shared_buffers larger than your actual memory size, but that's much less likely.
Has there been any fallout in the buildfarm as a result of this change?
@Colin: as far as I know – no.
When this feature was committed just after the 9.2 branch was cut, a part of me cried. 9.3 can’t get here any faster. This change makes it significantly easier to administrate hosts that run many clusters on the same server.
@Sean: The patch is quite small so you can backport the relevant commits quite easily.
Does this make it impossible to use huge pages in the future?
For Oracle, huge pages have a big performance impact once its SGA goes past ~ 8GB. I would imagine that for Postgres shared_buffers the same would apply. I noticed that Tom Lane tried to use huge pages a while back with 1GB and saw no benefit, but if Oracle is any guide, that is not large enough to see benefit from huge pages.
@Scott:
Such a question should be asked on pgsql-hackers – I am, by far, not a specialist when it comes to internals, and how inner elements of Pg work.
Since most platforms also ulimit() the amount of mlock()able memory, Pg presumably isn’t pinning shared_buffers in RAM, which is a really nice effect of this change.
This should make running multiple clusters on a single host a lot nicer than the current pain with clusters fighting over shm and wasting resources with all that pinned shared memory.
Huge pages make a big difference with high connection counts. If you are trying to map a 16GB buffer cache into 2000 processes using 4K pages it eats up to 64GB for page tables.
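The 64GB figure in the comment above can be reproduced with a back-of-the-envelope calculation. This sketch assumes 8 bytes per page table entry (as on x86-64), counts only the lowest-level entries, and assumes page tables are not shared between processes; `page_table_bytes` is a made-up helper name, and the 2MB huge page size is likewise an assumption for comparison.

```python
def page_table_bytes(mapped_bytes: int, page_size: int, processes: int,
                     pte_size: int = 8) -> int:
    """Rough page table cost of mapping a region into many processes."""
    pages = mapped_bytes // page_size  # entries needed per process
    return pages * pte_size * processes


GiB = 2**30
MiB = 2**20

small = page_table_bytes(16 * GiB, 4096, 2000)     # 4K pages
huge = page_table_bytes(16 * GiB, 2 * MiB, 2000)   # 2MB huge pages (assumed size)

print(f"4K pages:  {small / GiB:.1f} GiB of page tables")
print(f"2MB pages: {huge / MiB:.1f} MiB of page tables")
```

With 4K pages each process needs about 32MB of entries, which across 2000 processes lands in the low 60s of gigabytes, in line with the "up to 64GB" estimate.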