The OOM killer **never** in the twelve years I've used Linux has triggered befor...

segfaultbuserr · on July 3, 2019

Pro-Tip: You can use Alt + SysRq + F to trigger the OOM-killer action immediately. It has saved me from pulling the plug on my desktops on multiple occasions over the years when I accidentally started RAM-eating programs. Just make sure SysRq is enabled in sysctl.
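A rough sketch of the "make sure SysRq is enabled" check, in case it helps: `kernel.sysrq` is 0 (disabled), 1 (everything enabled), or a bitmask, and per the kernel's SysRq documentation the bit worth 64 covers process signalling, which includes the OOM kill behind F. The function name is hypothetical, and reading `/proc/sys/kernel/sysrq` assumes a Linux box with procfs mounted.

```python
from pathlib import Path

SYSRQ_PATH = Path("/proc/sys/kernel/sysrq")  # assumes Linux with procfs
SIGNAL_BIT = 64  # bitmask value covering term, kill and oom-kill (SysRq + F)


def sysrq_allows_oom_kill(value: int) -> bool:
    """Return True if this kernel.sysrq value permits the keyboard OOM kill."""
    if value == 0:
        return False  # SysRq disabled entirely
    if value == 1:
        return True  # all SysRq functions enabled
    return bool(value & SIGNAL_BIT)  # otherwise treat the value as a bitmask


if __name__ == "__main__":
    if SYSRQ_PATH.exists():
        current = int(SYSRQ_PATH.read_text().strip())
        allowed = sysrq_allows_oom_kill(current)
        print(f"kernel.sysrq={current}, SysRq+F allowed: {allowed}")
    else:
        print("No /proc/sys/kernel/sysrq here; probably not Linux.")
```

A value like 176 (16 + 32 + 128) leaves the signalling bit off, so the keyboard combination would be ignored there. The sysctl only gates the keyboard path; root can still write `f` to `/proc/sysrq-trigger`.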

emmelaich · on July 3, 2019

When the machine is busy (and even when it's not), trying SysRq combinations is a very uncertain thing.

segfaultbuserr · on July 3, 2019

Yes, it's risky. But when the machine is swapping to death and pulling the plug is the only option, SysRq is a better alternative.

colechristensen · on July 3, 2019

In ops, I've seen this happen many times: a Linux server running happy and free because the kernel OOM killer murdered the reason that server existed, leaving alone some side process with a memory leak (usually some external service agent or maintenance service run amok). I learned well how to fix that over the years.

(spoiler alert: it was much more often about becoming ornery when devs insisted that their JVM app could have a 12 GB heap on a machine with 12 GB of memory)

mshook · on July 3, 2019

That's pretty much why we use vm.panic_on_oom = 1 as you have no idea about what's going to be killed...
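For reference, a small sketch of what the `vm.panic_on_oom` values mean, paraphrased from the kernel's sysctl documentation as I understand it. The helper name and the wording of the descriptions are mine, not the kernel's.

```python
from pathlib import Path

PANIC_ON_OOM = Path("/proc/sys/vm/panic_on_oom")  # assumes Linux with procfs

# Paraphrased meanings of the three documented values.
MEANINGS = {
    0: "OOM killer picks a process to kill; no panic",
    1: "panic on system-wide OOM; an OOM confined to a cpuset or "
       "mempolicy may still just kill a process",
    2: "always panic, even when the OOM is confined to a cpuset, "
       "mempolicy or memory cgroup",
}


def describe_panic_on_oom(value: int) -> str:
    """Map a vm.panic_on_oom value to a short description."""
    return MEANINGS.get(value, f"unrecognised value {value}")


if __name__ == "__main__":
    if PANIC_ON_OOM.exists():
        current = int(PANIC_ON_OOM.read_text().strip())
        print(f"vm.panic_on_oom={current}: {describe_panic_on_oom(current)}")
```

A panic by itself just halts the box; setups like this usually also set `kernel.panic` to a number of seconds so the machine reboots afterwards.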

colechristensen · on July 3, 2019

I am in the camp of having systemd or your service manager of choice restart any killed service, or any service failing its health check, and emit an ERROR or higher log message.

I want my systems to heal themselves. More often than not these memory problems end up being slow leaks which can be effectively permanently resolved with periodic restarts, and the engineering time to fix them is appropriately not prioritized.

I want to know that there has been a problem but I would rather not be forced to do anything about it unless absolutely necessary.

eikenberry · on July 3, 2019

That is probably being caused by your use of swap space, not an OOM issue. I've had multiple cases of the OOM killer kicking off on my system, all without it slowing way down.

rcxdude · on July 3, 2019

Lacking swap space causes more severe symptoms in an OOM situation, not less, in my experience. I think this is because everything that can be evicted from RAM is evicted before the OOM killer gets invoked, which means every disk access slows to a crawl.

Avamander · on July 3, 2019

No. This happens even without swap; the OOM killer might as well not exist.

lulouie · on July 3, 2019

Yep, when Chromium ate up all the memory, it just hung the whole OS, waiting 10+ minutes for the OOM killer. Then the cursor could be moved again, then it froze again...

olliej · on July 3, 2019

If you want hilarious fun: make a GL shader that takes ~30 seconds to run. GPUs are only very recently preemptable (if they are at all yet? I lose track of what is “planned” vs released).

Make it run that shader in a loop.

See how well your system appears to respond.

IIRC macOS has a 60s or so watchdog that hard resets the GPU; while the GPU is hung the screen is not updated. Everything is running fine, the CPU isn’t pinned or anything, but the GPU is blocked, so no compositing and no screen updating.

I’m not sure what Linux does in that case, and I think Windows may be able to paint because the DirectX driver interfaces let it do ... something? I’ve always assumed some way to DMA straight to the framebuffer, but no real idea :)

pas · on July 3, 2019

https://lwn.net/Articles/759781/ already merged in 4.20.

https://github.com/facebookincubator/oomd/blob/master/README...

and here's a less complicated, but similar proactive daemon:

https://github.com/rfjakob/earlyoom/blob/master/README.md
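Assuming the LWN link above is the pressure-stall information (PSI) work, here is a minimal sketch of reading it. PSI exposes `/proc/pressure/memory` with a `some` line and a `full` line, each carrying `avg10`, `avg60`, `avg300` and `total` fields. The function names and the threshold are hypothetical and not taken from oomd or earlyoom.

```python
from pathlib import Path
from typing import Dict

PSI_MEMORY = Path("/proc/pressure/memory")  # needs a kernel built with PSI


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {'some': {...}, 'full': {...}}."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def under_pressure(psi: Dict[str, Dict[str, float]], threshold: float = 10.0) -> bool:
    """True if all tasks were stalled on memory for at least `threshold`
    percent of the last 10 seconds. The threshold is an arbitrary example."""
    return psi["full"]["avg10"] >= threshold


if __name__ == "__main__":
    if PSI_MEMORY.exists():
        psi = parse_psi(PSI_MEMORY.read_text())
        print(psi, "under pressure:", under_pressure(psi))
    else:
        print("PSI not available on this system.")
```

The point of a signal like this is that it measures time lost to stalls rather than bytes free, so a daemon can act while the machine is still responsive.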

Avamander · on July 3, 2019

I'm on 5.0.0 and I legit haven't noticed a difference. If I run out of RAM without swap the system freezes and if I have swap then the system freezes when both are full. The only reliable solution is having a RAM+swap usage graph on my screen at all times and then closing stuff manually.
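The "watch RAM+swap usage" approach can be sketched in a few lines by reading `MemAvailable` and `SwapFree` from `/proc/meminfo`. The function names and the 10% thresholds are placeholders, and treating "no swap configured" as 0% swap free is my own choice for the sketch.

```python
from pathlib import Path
from typing import Dict, Tuple

MEMINFO = Path("/proc/meminfo")  # assumes Linux with procfs


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def percent_free(info: Dict[str, int]) -> Tuple[float, float]:
    """Return (available RAM %, free swap %). No swap counts as 0% free."""
    mem = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    swap = 100.0 * info.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem, swap


def should_act(mem_pct: float, swap_pct: float,
               mem_min: float = 10.0, swap_min: float = 10.0) -> bool:
    """True once both RAM and swap headroom are below their thresholds."""
    return mem_pct < mem_min and swap_pct < swap_min


if __name__ == "__main__":
    if MEMINFO.exists():
        mem, swap = percent_free(parse_meminfo(MEMINFO.read_text()))
        print(f"RAM available {mem:.1f}%, swap free {swap:.1f}%, "
              f"act now: {should_act(mem, swap)}")
```

Run in a loop, this is the manual graph-watching turned into a check that fires before the freeze rather than after.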

pas · on July 3, 2019

You probably won't until your distro (or you, manually) sets up something other than the default OOM killer.

You probably have too much swap. About 10 sec * your I/O speed (so let's say 512M-1G) is probably the max, for the reasons you mentioned.
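The rule of thumb above is plain arithmetic: swap size is capped at what the disk can move in about 10 seconds. The 50 and 100 MiB/s figures below are assumed only to land near the 512M-1G range; measure your own device.

```python
MIB = 1024 ** 2


def max_swap_bytes(io_bytes_per_sec: float, seconds: float = 10.0) -> int:
    """Swap ceiling under the '10 seconds of I/O' rule of thumb."""
    return int(io_bytes_per_sec * seconds)


if __name__ == "__main__":
    for speed_mib in (50, 100):  # assumed sustained throughput in MiB/s
        limit = max_swap_bytes(speed_mib * MIB)
        print(f"{speed_mib} MiB/s -> at most {limit // MIB} MiB of swap")
```

For a fast SSD the same rule would allow considerably more, which is why the figure is tied to I/O speed rather than to RAM size.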

Avamander · on July 4, 2019

The system freezes also without any swap enabled but much more suddenly (there's no slowdown before dying). It's really just that the OOM killer triggers way way way too late.

AnIdiotOnTheNet · on July 3, 2019

The OOM killer is a terrible hack that no self-respecting system should ever have employed.

olliej · on July 3, 2019

The OOM killer is a “solution” to a very real, and sensible design choice: not committing physical memory and swap whenever address space is mapped - there are very good (and noticeable) reasons for not eagerly committing, but fundamentally if you have done so you have to decide what to do when you end up needing more physical space than is available.

Linux went down the “if a process is trying to do this, it must be important so I’ll prioritise it and kill something else” route; an alternative is to kill that process when the commit fails.

Either is a valid option. The OOM killer ran against a regular desktop user’s idea of the correct course of action, but for a server it might not have.
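For a sense of the numbers: the closest Linux knob to eager commit accounting is `vm.overcommit_memory = 2`, where allocations beyond `CommitLimit` fail up front instead of succeeding and being paid for later. As I read the kernel's overcommit documentation, the limit is swap plus a percentage (`vm.overcommit_ratio`, default 50) of RAM excluding huge pages. The function name is hypothetical and the formula is that reading, not kernel code.

```python
def commit_limit_kb(mem_total_kb: int, swap_total_kb: int,
                    overcommit_ratio: int = 50, hugetlb_kb: int = 0) -> int:
    """Approximate CommitLimit in kB under vm.overcommit_memory = 2."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


if __name__ == "__main__":
    gib_kb = 1024 * 1024  # one GiB expressed in kB
    limit = commit_limit_kb(12 * gib_kb, 1 * gib_kb)
    print(f"12 GiB RAM + 1 GiB swap, ratio 50 -> {limit / gib_kb:.1f} GiB committable")
```

With the default ratio, a 12 GiB machine with 1 GiB of swap can commit only about 7 GiB in strict mode, which illustrates the cost of not overcommitting: a lot of address space that would never be touched gets refused.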

pjmlp · on July 3, 2019

Problem is that something else might be actually performing a critical task.

olliej · on July 3, 2019

The assumption being made is that the app going mad for memory is the critical one. Something has to die, and deciding which is hard.

Symbiote · on July 3, 2019

It triggered on one of my systems yesterday, and killed the runaway process.

When we were running tests of a new distributed system on our development (slightly underspecced) cluster, it would kill the distributed system processes when they took too much RAM.

As others write, having slow or "too much" swap can keep the OOM killer from running within a reasonable time.

viraptor · on July 3, 2019

Sounds like you're starting to swap heavily. Adjusting swappiness to 0 may help there.

Avamander · on July 3, 2019

This also happens without swap.