(disclaimer: all the data and examples in here are on Linux – the same data can be probably obtained on other systems too, it's just that I work on Linux and don't know other systems well).
This question pops occasionally in various places – PostgreSQL is using too much memory, why is that, and how can it be mitigated?
Before we can go to “optimizing", we should understand the problem. But do we? Both standard tools – ps and top – lie. How/why? Let's see.
I have a very simple PostgreSQL instance on my desktop, configured with 4 GB of shared buffers, and 100MB of work_mem. I did run some work, and now the ps shows:
=$ ps -u pgdba uf USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND pgdba 32324 0.0 0.0 79788 1900 ? S 14:26 0:00 sshd: pgdba@pts/13 pgdba 32325 0.0 0.0 25844 5788 pts/13 Ss+ 14:26 0:00 \_ -bash pgdba 27502 0.0 0.8 4344112 109724 ? S 14:18 0:00 /home/pgdba/work/bin/postgres pgdba 27506 0.0 0.0 24792 620 ? Ss 14:18 0:00 \_ postgres: logger process pgdba 27508 1.5 34.7 4346688 4274752 ? Ss 14:18 0:14 \_ postgres: checkpointer process pgdba 27509 0.2 12.1 4346164 1495780 ? Ss 14:18 0:02 \_ postgres: writer process pgdba 27510 0.3 0.1 4346164 17292 ? Ss 14:18 0:03 \_ postgres: wal writer process pgdba 27511 0.0 0.0 4347168 2408 ? Ss 14:18 0:00 \_ postgres: autovacuum launcher process pgdba 27512 0.0 0.0 26888 856 ? Ss 14:18 0:00 \_ postgres: archiver process last was 00000001000000060000004D pgdba 27513 0.0 0.0 27184 1160 ? Ss 14:18 0:00 \_ postgres: stats collector process pgdba 27713 5.6 34.8 4347268 4285716 ? Ss 14:19 0:51 \_ postgres: depesz depesz [local] idle pgdba 27722 2.6 3.1 4347412 392704 ? Ss 14:19 0:23 \_ postgres: depesz depesz [local] idle pgdba 27726 15.8 35.0 4352560 4309776 ? Ss 14:19 2:25 \_ postgres: depesz depesz [local] idle
top shows basically the same numbers.
Which is suspicious, because “free" shows:
=$ free total used free shared buffers cached Mem: 12296140 12144356 151784 0 26828 10644460 -/+ buffers/cache: 1473068 10823072 Swap: 0 0 0
i.e – only 1.5GB of ram used (and ~ 10GB used as disk cache – ready to be freed if any app would need more ram).
So how come I see such huge numbers in PS?
First of all – we need to disregard VSZ column. The one that's important is RSS. But its still not really useful:
=$ ps -u pgdba o pid,rss:8,cmd | awk 'NR>1 {A+=$2} {print} END{print "Total RSS: " A}' PID RSS CMD 27502 109724 /home/pgdba/work/bin/postgres 27506 620 postgres: logger process 27508 4274752 postgres: checkpointer process 27509 1755420 postgres: writer process 27510 17292 postgres: wal writer process 27511 2408 postgres: autovacuum launcher process 27512 856 postgres: archiver process last was 00000001000000060000004D 27513 1160 postgres: stats collector process 27713 4285716 postgres: depesz depesz [local] idle 27722 392700 postgres: depesz depesz [local] idle 27726 4309776 postgres: depesz depesz [local] idle 32324 1900 sshd: pgdba@pts/13 32325 5788 -bash Total RSS: 15158112
So, this clocks at ~ 15GB, which is more than I have in this box. So, what is the real memory usage. As in: how much memory I would gain, if I'd kill Pg?
Luckily, we can view what exactly is the RAM being used for, thanks to wonderful world of /proc directory/filesystem/mountpoint/fairy.
Each process on Linux has a directory in /proc. In this directory there is a number of files and directories. Don't be fooled by reported size of the files – they all have “0" bytes, but they do contain information. It's magic.
The one file we're interested in is “smaps".
Content of it looks like this:
=$ sudo head -n 20 /proc/27713/smaps 00400000-00914000 r-xp 00000000 09:00 3545633 /home/pgdba/work/bin/postgres Size: 5200 kB Rss: 964 kB Pss: 214 kB Shared_Clean: 964 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 964 kB Anonymous: 0 kB AnonHugePages: 0 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Locked: 0 kB 00b13000-00b14000 r--p 00513000 09:00 3545633 /home/pgdba/work/bin/postgres Size: 4 kB Rss: 4 kB Pss: 0 kB Shared_Clean: 0 kB ...
For this particular process smaps has over 2000 lines, so I will not show you all of it.
So, anyway – this process – 27713 – according to ps, uses 4285716 kilobytes of RAM. So, what is it so big? Quick grep, and we see:
=$ sudo grep -B1 -E '^Size: *[0-9]{6}' /proc/27713/smaps 7fde8dacc000-7fdf952d6000 rw-s 00000000 00:04 232882235 /SYSV005a5501 (deleted) Size: 4317224 kB
There is only one “block" that is over 100MB of size, and it's size is very close to total size of process.
Full information about it:
7fde8dacc000-7fdf952d6000 rw-s 00000000 00:04 232882235 /SYSV005a5501 (deleted) Size: 4317224 kB Rss: 4280924 kB Pss: 1245734 kB Shared_Clean: 0 kB Shared_Dirty: 4280924 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 4280924 kB Anonymous: 0 kB AnonHugePages: 0 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Locked: 0 kB
Most of this information is more or less cryptic, but we see couple of things:
- it is shared memory (first line contains rw-s, where “s" is for shared)
- by the look of it (/SYSV… deleted) it looks like the shared memory is done using mmaping deleted file – so the memory will be in “Cached", and not “Used" columns in free output.
- Size of the shared block is 4317224, and 4280924 from it is actually resident in memory
That's ok – that's shared_buffers. But the thing is – shared buffers are used by most of the backends. And, what's worse, not always to the same extent. For example, the same shared buffers data from process 27722:
=$ sudo grep -A14 7fde8dacc000-7fdf952d6000 /proc/27722/smaps 7fde8dacc000-7fdf952d6000 rw-s 00000000 00:04 232882235 /SYSV005a5501 (deleted) Size: 4317224 kB Rss: 388652 kB Pss: 95756 kB Shared_Clean: 0 kB Shared_Dirty: 388652 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 388652 kB Anonymous: 0 kB AnonHugePages: 0 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Locked: 0 kB
In here we see that this process requested/used only 388MB of the memory.
So calculating will be complex. For exaple – we might have two processes, each using 400MB of shared_buffers, but it doesn't tell us how much memory it is actually using – because it could be that they are using 100MB of the same buffers, and 300MB of different – so in total the memory usage would be 700MB.
We do know that total size of this shared_buffers block is 4317224. Which is great. But what about other things? Libraries for example – they can be shared by kernel between multiple processes.
Luckily, in 2007 Fengguang Wu sent (and previously wrote) a very cool patch for kernel – which added “Pss" info to smaps.
Basically, Pss is at most Rss, but gets decreased if the same pages of memory are used by more than one process.
That's why Pss above was so much lower than Rss/Size. For example – in the last example. Rss was given 388652, but Pss just 95756, which means that most of the pages this backend used was also used by 3 other backends.
So, now – knowing about Pss, we can finally get real memory usage of a running pg cluster:
=$ ps -u pgdba o pid= | sed 's#.*#/proc/&/smaps#' | xargs sudo grep ^Pss: | awk '{A+=$2} END{print A}' 4329040
If you just said “WTF, WHAT DID HE RUN?!", let me explain. First command:
=$ ps -u pgdba o pid= 27502 ... 32325
just returns pids of pgdba user (usually you'll want postgres, but I'm different, and run PostgreSQL as pgdba).
Second – sed – changes the pids to paths to smaps files:
=$ ps -u pgdba o pid= | sed 's#.*#/proc/&/smaps#' /proc/27502/smaps ... /proc/32325/smaps
Then I do simple grep for ^Pss lines in the files given from sed. Which returns lots of lines like:
/proc/32325/smaps:Pss: 0 kB /proc/32325/smaps:Pss: 4 kB /proc/32325/smaps:Pss: 4 kB
And then the awk summarizes 2nd column (which is the size). And I get the number 4329040 – in kilobytes.
So, theoretically, if I'd stop Pg, I would reclaim that much of RAM. Let's see if it's true:
=$ free; pg_ctl -m IMMEDIATE stop; free total used free shared buffers cached Mem: 12296140 12145424 150716 0 40708 10640968 -/+ buffers/cache: 1463748 10832392 Swap: 0 0 0 waiting FOR server TO shut down.... done server stopped total used free shared buffers cached Mem: 12296140 7781960 4514180 0 40856 6325092 -/+ buffers/cache: 1416012 10880128 Swap: 0 0 0
Used memory dropped from 12145424 to 7781960 – meaning I got back 4363464 kB of RAM. Which is even a bit higher than expected 4329040, but it's close enough. And most of it came off disk cache – as expected, because it was used for shared_buffers.
This is all good, but can this method be used to estimate how much RAM I would reclaim by killing single backend?
No and yes. Shutting down whole Pg meant that the shared memory that it was using – could have been freed. In normal environments, when you kill backend – you end up freeing only the memory that was private to this backend. And this is usually depressingly low number.
For example, on another machine, with much more impressive hardware:
=> ps uxf | grep USER.db_name | sort -nk6 | tail -n 1 | tee >( cat - >&2) | awk '{system("cat /proc/"$2"/smaps")}' | grep ^Private | awk '{A+=$2} END{print A}' postgres 5278 8.2 0.3 107465132 1727408 ? Ss 13:21 0:03 \_ postgres: USER db_name aa.bbb.cc.dd(eeeee) idle 52580
I.e. – the process has 1.7GB of RSS (as visible in ps output), but only 52MB of it is Private memory that would be freed if it got killed.
So no – you can't use the Pss for this, but you can use Private_* data from smaps to get the number.
To summarize it all – PostgreSQL uses much less memory than it looks like at first sight, and while it is possible to get quite accurate numbers – you need to do some shell scripting to get them.
And now I brace myself for comments from the people that will point all technical mistakes in this post, or (what's worse) typos. 🙂
This is pretty advanced for my understanding, but it has helped me grasp a bit more about Linux and PostgreSQL, and for that, I’m thankful.
While my memory on the subject is a bit fuzzy and me being a lousy reader, I’m afraid I can’t point out any specific mistakes.
I’d strongly suggest you to try out “smem” tool (www.selenic.com/smem/) though, which provides same valuable info in a much more *ahem* user-friendly way, as well as providing great aggregations of it.
My version of ps aligns the output, which means that the command fails if the pids have a different number of digits. (It tries to access filenames such as “/proc/ 1792/smaps”)
I rewrote the sed as ‘s#[0-9]\+#/proc/&/smaps#’ which isn’t perfect but works
It’s worth to mention some tools which show process listings with better memory metrics than `ps`. Including `meminfo.py` http://koltsoff.com/pub/meminfo/ – here private memory size is shown as URES.
Very very good.
Excellent, with the Mike fix, the command functions better. smem works equals than the super command.
Aquí les dejo la línea final que funciona.
ps -u postgres o pid= | sed ‘s#[0-9]\+#/proc/&/smaps#’ | sort | xargs grep ^Pss: | awk ‘{A+=$2} END{print A}’
Saludos
ps -u postgres o pid= | awk {‘print $1’} |sed ‘s#.*#/proc/&/smaps#’ | xargs grep ^Pss: | awk ‘{A+=$2} END{print A}’
Remarkable! Ӏtѕ in fаct awesome paragraph, Ⅰ һave ցot mujch ϲlear ifea about from tҺiѕ piece ⲟf writing.
Ηere іs my web-site … nuskin price list
Very Good.
But… When PID have only 4 decimal or less the command that list Pss not return correct value.
example return:
/proc/ 1057/smaps
/proc/ 1176/smaps
/proc/10351/smaps
/proc/10352/smaps
I only add:
sed ‘s/ //g#/proc/&/smaps#’
Complete command:
ps -u postgres o pid= | sed ‘s#.*#/proc/&/smaps#’ | sed ‘s/ //g#/proc/&/smaps#’ | xargs grep ^Pss: | awk ‘{A+=$2} END{print A}’
Is this still relevant as of today?
This was a great great great post! Finally understand whats going on with my memory!
ps -u postgres o pid= | sed ‘s#.*#/proc/&/smaps#’ | xargs sudo grep ^Pss: | awk ‘{A+=$2} END{print A}’
I am getting below error after executing above unix command. Please help me.
grep: 9873/smaps: No such file or directory
1306039
ps -u postgres o pid= | sed -r ‘s/\s+//g’ | sed ‘s#.*#/proc/&/smaps#’ | xargs sudo grep ^Pss: | awk ‘{A+=$2} END{print A}’
Very very interesting numbers. Initially got kicked but after reading it twice got it’s own meaning. Thank uu soo much