oom-killer or wt… ?!

About

The following describes a specific situation when building Android (Android 10 / Android Q), but it is just an example of what happens when the OOM (Out Of Memory) killer steps in.
I have seen the same behavior on other machines as well, especially on splunk> Heavy Forwarders, so it is a generic issue and, above all, it does not necessarily have anything to do with free RAM at all.

So what happened? Your application (or Android build) just got killed and you have no idea why? No errors in the application/build logs? Then the very first thing to do is check the system log: on systemd systems with journalctl, and on non-systemd systems either in /var/log/(syslog|messages|..) or by executing dmesg.
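
For example, something along these lines should reveal a kill (the commands are the standard ones, the exact log file paths depend on your distribution):

journalctl -k | grep -i -E 'oom|killed process'                           # systemd: kernel messages only
dmesg | grep -i -E 'oom|killed process'                                   # works everywhere
grep -i 'killed process' /var/log/syslog /var/log/messages 2>/dev/null    # classic syslog files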

Here is an example where soong_ui (a process started by the Android build) invoked the oom-killer and killed its child javac, even though there were 15 GB of free RAM at that time:

Sep 26 13:06:55 kernel: Node 0 DMA free:15904kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
....
Sep 26 13:06:55 kernel: Out of memory: Kill process 29494 (javac) score 117 or sacrifice child
Sep 26 13:06:55 kernel: Killed process 29494 (javac) total-vm:10744744kB, anon-rss:4341072kB, file-rss:0kB
Sep 26 13:06:55 kernel: soong_ui invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0

So what? We had enough free RAM, but the oom-killer killed our javac process. Why? And what is that oom-killer at all?

The oom-killer is implemented in the Linux kernel and “has one simple task; check if there is enough available memory to satisfy, verify that the system is truely out of memory and if so, select a process to kill.” [1]

That means the oom-killer found that the above javac process needed to be killed because the system is out of memory .. even though we already know it was not?!
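
As a side note, the “score 117” in the log above is the kernel's per-process badness score. A minimal sketch of how to peek at it for a running process (pgrep and javac here are just an example, adjust to your workload):

pid=$(pgrep -n javac)            # example: newest javac process
cat /proc/$pid/oom_score         # current badness score the oom-killer compares
cat /proc/$pid/oom_score_adj     # user adjustment; -1000 excludes the process from oom-kills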

When does the oom-killer <actually> step in?

Well, it is not that simple: the oom-killer does not just check free memory and kill a process; instead it walks through the following checklist to decide whether it needs to kill [1]:

– Is there enough swap space left (nr_swap_pages > 0) ? If yes, not OOM
– Has it been more than 5 seconds since the last failure? If yes, not OOM
– Have we failed within the last second? If no, not OOM
– If there hasn’t been 10 failures at least in the last 5 seconds, we’re not OOM
– Has a process been killed within the last 5 seconds? If yes, not OOM

So there are multiple things that get checked before a process is actually killed. That also means none of these conditions saved us here, and so the process actually got killed. One thing immediately jumped out at me when reading the above: SWAP. My systems have a lot of RAM, so I generally set the swappiness factor to e.g. 10 or even 0 (sysctl vm.swappiness=10 or =0), as I disable any swap partitions anyway (or keep them very small, like 1 GB). As you can see, your SWAP space is also shown in the same log when this happens:

Sep 26 13:06:55 kernel: 2664701 total pagecache pages
Sep 26 13:06:55 kernel: 0 pages in swap cache
Sep 26 13:06:55 kernel: Swap cache stats: add 945682, delete 945682, find 274265/303575
Sep 26 13:06:55 kernel: Free swap = 0kB
Sep 26 13:06:55 kernel: Total swap = 0kB

That made me curious, and I started to compare my machine with others to find out in which constellations it works and in which it does not. Does SWAP have anything to do with it?
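
If you want to check the swap situation on your own machine while reading along, a quick way (standard util-linux / procps tools, nothing Android specific) is:

swapon --show                    # active swap devices/files; empty output means no swap at all
free -h                          # RAM and swap totals at a glance
cat /proc/sys/vm/swappiness      # current swappiness factor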

Comparison results

During my troubleshooting I tested several constellations and gathered information from other users to identify what happens here in this specific case of Android building. Again, Android building is just an example; the same should apply to any kind of application getting killed by the oom-killer.

The following constellations failed:

PC-1: kernel 4.4.0-178 => no swap, 36 GB RAM (Ubuntu 16.04)
PC-2: kernel 5.8.10 => no swap, 10 GB RAM (Arch + Docker: running an Ubuntu 20.04 Docker image, no swap within Docker)
PC-2: kernel 5.8.10 => no swap, 12 GB RAM (Arch native)
PC-5: kernel 5.8.11 => no swap, 16 GB RAM (Fedora 32)

The following constellations worked:

PC-1: kernel 4.4.0-178 => 5 GB swap, 36 GB RAM (Ubuntu 16.04)
PC-2: kernel 5.8.10 => 6 GB swap, 12 GB RAM (Arch native, no Docker)
PC-3: kernel 5.4.0-48 => 5 GB swap, 32 GB RAM (Ubuntu 18.04)
PC-4: kernel 5.4.0-48 => no swap, 32 GB RAM (Ubuntu 20.04.1)
PC-5: kernel 5.8.11 => 16 GB swap, 16 GB RAM (Fedora 32)

Just as a side note: the above constellations were all tested with and without ccache (USE_CCACHE=1 + CCACHE_EXEC set properly), which (not surprisingly) made no difference.
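
For reference, the ccache toggle on the Android build side looks roughly like this (the ccache path and cache size are examples, adjust them to your system):

export USE_CCACHE=1
export CCACHE_EXEC=/usr/bin/ccache    # example path to the ccache binary
ccache -M 50G                         # optional: cap the cache size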

Reviewing the results / first conclusions

As you can see, one thing is very clear: without SWAP there is a high chance of the oom-killer stepping in. On all systems which had SWAP enabled, the oom-killer did not kill any process. Looking back at the checklist above, you will find the very first task:

– Is there enough swap space left (nr_swap_pages > 0) ? If yes, not OOM

If you do not have any (or much) swap space, the oom-killer will never get a “yes” here, so this first exit never saves you and it moves on towards killing, even when there is plenty of free RAM. OOM = OMG..
But there is more: look at PC-1 and PC-4 above. PC-1 has even more RAM than PC-4, yet when executing the same tasks the one with more RAM failed. Both have no swap, but they have different kernel versions!

So let's take a look at oom-kill in kernel v4.4: commits
and for kernel v5.4: commits
and for a simple comparison of the oom-killer between these 2 versions: gist
or TL;DR: there were a LOT of changes between these kernel versions regarding the oom-killing process.

So we can assume that issues will very likely, or at least more often, occur:

– on kernel versions other than 5.4 (5.5 – 5.7 untested, 5.8 has the same issues as pre-5.4, 5.9 untested)
– when no SWAP is in use on kernels other than 5.4
– just using higher kernel versions is no guarantee against oom-kills, though (e.g. kernel 5.8 has additional oom commits which might be causing what we see in the results above)

Even though it sounds harsh, my personal opinion on that: it is a kernel bug. Or better said: the assumption that everyone has SWAP is simply outdated. The oom-killer implementation was made at a time when the average user did not have plenty of RAM available, so SWAP was absolutely required for heavy tasks. As we all know, that has changed: it is 2020 and RAM is like getting chewing gum at a kiosk – cheap, fast and plenty of it available. That also means that as long as this is not fixed at the kernel level, there is not much we can do other than:

Conclusion / TL;DR

It seems there are 2 conditions which we can assume to be safe:

  1. use kernel 5.4
  2. on any other kernel (or if you still get incorrect oom-kills on 5.4) enable SWAP and keep the swappiness factor set to whatever fits your needs (a minimal example follows right after this list)
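
A minimal sketch of option 2 on a typical ext4 root (the size and the /swapfile path are just examples; some filesystems like btrfs need extra steps for swap files):

sudo fallocate -l 8G /swapfile        # or: dd if=/dev/zero of=/swapfile bs=1M count=8192
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
sudo sysctl vm.swappiness=10          # keep actual swapping low, the file mainly calms the oom-killer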

And for the crazy ones out there ;) I have to enable SWAP as I can't easily upgrade the kernel on my main server. So I use SWAP as a file – which .. gets stored on a RAM disk.. ?! Did you get the clue? ;) (note: of course that requires a LOT of free RAM, so don't do it if you don't have it)
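
For the record, here is a sketch of that RAM-backed swap idea – using the brd ramdisk module and swapping directly on the block device, since swapon tends to refuse swap files that live on tmpfs (the size and priority are just examples):

sudo modprobe brd rd_nr=1 rd_size=8388608    # one ramdisk /dev/ram0, size in KiB (= 8 GiB here)
sudo mkswap /dev/ram0
sudo swapon -p 100 /dev/ram0                 # high priority so it is preferred over any disk swap
swapon --show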

Have fun!

[1][Documentation]: https://www.kernel.org/doc/gorman/html/understand