
In the twelve years I've used Linux, the OOM killer has never once triggered before my system ground to a halt and never recovered. This problem has not been solved.


Pro-Tip: You can use Alt + SysRq + F to trigger the OOM-killer action immediately. It has saved me from pulling the plug on my desktops on multiple occasions over the years when I accidentally started RAM-eating programs. Just make sure SysRq is enabled in sysctl.
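
For reference, a minimal Python sketch of the "make sure SysRq is enabled" check. It assumes the `kernel.sysrq` bitmask described in the kernel documentation, where 1 enables every function and bit 64 covers signalling of processes (which includes the OOM kill on F). The function name is made up for illustration.

```python
from pathlib import Path

SYSRQ_PATH = Path("/proc/sys/kernel/sysrq")  # Linux-only sysctl file
SIGNAL_BIT = 0x40  # bit 64: signalling of processes (term, kill, oom-kill)


def sysrq_allows_oom_kill(value: int) -> bool:
    """Return True if this kernel.sysrq value permits Alt+SysRq+F."""
    if value == 1:  # 1 means "all SysRq functions enabled"
        return True
    return bool(value & SIGNAL_BIT)  # any other value is treated as a bitmask


if __name__ == "__main__":
    if SYSRQ_PATH.exists():
        current = int(SYSRQ_PATH.read_text().strip())
        print(f"kernel.sysrq = {current}, "
              f"OOM kill allowed: {sysrq_allows_oom_kill(current)}")
    else:
        print("no /proc/sys/kernel/sysrq here (probably not Linux)")
```

Some distros ship a restrictive default that leaves this bit off, so it is worth checking before you actually need it.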


When the machine is busy, and even when it's not, trying SysRq combinations is a very uncertain thing.


Yes, it's risky. But when the machine is swapping to death and pulling the plug is the only option, SysRq is a better alternative.


In ops, I've seen this happen many, many times: a Linux server running happy and free because the kernel OOM killer murdered the reason that server existed, leaving alone some side process with a memory leak (usually some external service agent or maintenance service run amok). I learned well how to fix that over the years.

(spoiler alert: it was much more often about becoming ornery when devs insisted that their JVM app could have a 12 GB heap on a machine with 12 GB of memory)


That's pretty much why we use vm.panic_on_oom = 1 as you have no idea about what's going to be killed...


I am in the camp of having systemd or your service manager of choice restart any killed service, or any service failing its health check, and emit an ERROR or higher log message.

I want my systems to heal themselves. More often than not these memory problems end up being slow leaks which can be effectively permanently resolved with periodic restarts, and the engineering time to fix them is appropriately not prioritized.

I want to know that there has been a problem but I would rather not be forced to do anything about it unless absolutely necessary.


That is probably being caused by your use of swap space, not an OOM issue. I've had multiple cases of the OOM killer kicking off on my system, all without it slowing way down.


Lacking swap space causes more severe symptoms in an OOM situation, not less, in my experience. I think this is because everything that can be evicted from RAM gets evicted before the OOM killer is invoked, which means every disk access slows to a crawl.


No. This happens even without swap; the OOM killer might as well not exist.


Yep, when Chromium ate up all the memory, it just hung the whole OS, waiting 10+ minutes for the OOM killer; then the cursor could be moved again, then it froze again...


If you want hilarious fun: make a GL shader that takes ~30 seconds to run. GPUs are only very recently preemptable (if they are at all yet? I lose track of what is “planned” vs released).

Make it run that shader in a loop.

See how well your system appears to respond.

IIRC macOS has a 60s or something watchdog that hard-resets the GPU; while the GPU is hung the screen is not updated. Everything is running fine, the CPU isn’t pinned or anything, but the GPU is blocked so no compositing, and so no screen updating.

I’m not sure what Linux does in that case, and I think Windows may be able to paint because the DirectX driver interfaces let it do ... something? I’ve always assumed some way to DMA straight to the framebuffer, but no real idea :)



I'm on 5.0.0 and I legit haven't noticed a difference. If I run out of RAM without swap the system freezes and if I have swap then the system freezes when both are full. The only reliable solution is having a RAM+swap usage graph on my screen at all times and then closing stuff manually.
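
A rough Python sketch of the number such a RAM+swap graph would plot, assuming the usual `/proc/meminfo` fields (`MemTotal`, `MemAvailable`, `SwapTotal`, `SwapFree`, all in kiB). The function names are hypothetical, and treating `MemAvailable` as "free" is a simplification.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kiB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def usage_percent(info: dict) -> float:
    """Percent of RAM+swap in use, counting MemAvailable and SwapFree as free."""
    total = info["MemTotal"] + info.get("SwapTotal", 0)
    free = info["MemAvailable"] + info.get("SwapFree", 0)
    return 100.0 * (total - free) / total


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        print(f"RAM+swap in use: {usage_percent(info):.1f}%")
```

Polling this every second or so and warning above some threshold is about as much as the manual approach needs.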


You probably won't until your distro (or you, manually) sets up something other than the default OOM killer.

You probably have too much swap. Roughly 10 sec * your I/O speed (so let's say 512M-1G) is probably the max, for the reasons you mentioned.
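
The rule of thumb above is just throughput times a time budget. A tiny Python sketch, assuming sustained disk throughput is known; the function name and the sample speeds are illustrative, not measurements.

```python
MIB = 1024 ** 2


def max_swap_bytes(io_bytes_per_sec: float, seconds: float = 10.0) -> float:
    """Swap size that could be read or written in roughly `seconds`."""
    return io_bytes_per_sec * seconds


if __name__ == "__main__":
    # Hypothetical sustained throughputs in MiB/s.
    for speed_mib in (50, 100):
        size = max_swap_bytes(speed_mib * MIB)
        print(f"{speed_mib} MiB/s -> about {size / MIB:.0f} MiB of swap")
```

By this estimate the 512M-1G range corresponds to a disk sustaining somewhere around 50-100 MiB/s.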


The system also freezes without any swap enabled, but much more suddenly (there's no slowdown before dying). It's really just that the OOM killer triggers way way way too late.


The OOM killer is a terrible hack that no self-respecting system should ever have employed.


The OOM killer is a “solution” to a very real and sensible design choice: not committing physical memory and swap whenever address space is mapped. There are very good (and noticeable) reasons for not eagerly committing, but if you commit lazily you fundamentally have to decide what to do when you end up needing more physical space than is available.

Linux went down the “if a process is trying to do this, it must be important so I’ll prioritise it and kill something else” route; the alternative is to kill that process when the commit fails.

Either is a valid option. The OOM killer ran against a regular desktop user’s idea of what the correct course of action is, but for a server it might not have.
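
To make the commit policy concrete, here is a hedged Python sketch around the `vm.overcommit_memory` sysctl. The mode descriptions paraphrase the kernel docs, and the limit formula is the simplified one for strict mode (it ignores huge pages and `vm.overcommit_kbytes`). Note that in strict mode the allocation call fails; nothing gets killed at that point. Names are made up for illustration.

```python
from pathlib import Path

OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default): obvious overcommits are refused",
    1: "always overcommit: allocations are never refused",
    2: "strict accounting: commits beyond CommitLimit fail",
}


def commit_limit_kib(ram_kib: int, swap_kib: int, overcommit_ratio: int = 50) -> int:
    """Approximate CommitLimit in strict mode: swap + RAM * ratio / 100."""
    return swap_kib + ram_kib * overcommit_ratio // 100


if __name__ == "__main__":
    mode_file = Path("/proc/sys/vm/overcommit_memory")
    if mode_file.exists():
        mode = int(mode_file.read_text().strip())
        print(f"vm.overcommit_memory = {mode}: {OVERCOMMIT_MODES.get(mode, 'unknown')}")
    # Hypothetical box: 12 GiB RAM, no swap, default ratio of 50.
    print(commit_limit_kib(12 * 1024 * 1024, 0), "kiB committable in strict mode")
```

With the default ratio and no swap, strict mode only lets about half of RAM be committed, which is part of why eager commit is rarely the default.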


The problem is that the something else might actually be performing a critical task.


The assumption being made is that the app going mad for memory is the critical one. Something has to die, and deciding which is hard.
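
If you want to see which process the kernel currently considers the best candidate, each process exposes a badness value in `/proc/<pid>/oom_score`. A minimal Python sketch, assuming a Linux `/proc`; the helper names are hypothetical.

```python
from pathlib import Path


def read_oom_scores(proc: Path = Path("/proc")) -> dict:
    """Map pid -> (oom_score, command name) for every readable process."""
    scores = {}
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            score = int((entry / "oom_score").read_text())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited or is not readable
        scores[int(entry.name)] = (score, name)
    return scores


def likely_victims(scores: dict, top: int = 5) -> list:
    """Return the `top` (pid, score, name) tuples with the highest oom_score."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(pid, score, name) for pid, (score, name) in ranked[:top]]


if __name__ == "__main__":
    if Path("/proc").is_dir():
        for pid, score, name in likely_victims(read_oom_scores()):
            print(f"{score:5d}  {pid:7d}  {name}")
```

The score can be nudged per process through `/proc/<pid>/oom_score_adj` (range -1000 to 1000), which is one way to keep the critical task off the top of this list.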


It triggered on one of my systems yesterday, and killed the runaway process.

When we were running tests of a new distributed system on our development (slightly underspecced) cluster, it would kill the distributed system processes when they took too much RAM.

As others write, having slow or "too much" swap can keep the OOM killer from running in a reasonable time.


Sounds like you're starting to swap heavily. Adjusting swappiness to 0 may help there.


This also happens without swap.



