←back to thread

804 points jryio | 1 comments | | HN request time: 0s | source
Show context
speedgoose ◴[] No.45661785[source]
Looking at the htop screenshot, I notice the lack of swap. You may want to enable earlyoom, so your whole server doesn't go down when a service goes bananas. The Linux Kernel OOM killer is often a bit too late to trigger.

You can also enable zram to compress ram, so you can over-provision like the pros'. A lot of long-running software leaks memory that compresses pretty well.

Here is how I do it on my Hetzner bare-metal servers using Ansible: https://gist.github.com/fungiboletus/794a265cc186e79cd5eb2fe... It also works on VMs.

replies(15): >>45661833 #>>45662183 #>>45662569 #>>45662628 #>>45662841 #>>45662895 #>>45663091 #>>45664508 #>>45665044 #>>45665086 #>>45665226 #>>45666389 #>>45666833 #>>45673327 #>>45677907 #
TheDong ◴[] No.45664508[source]
Even better than earlyoom is systemd-oomd[0] or oomd[1].

systemd-oomd and oomd use the kernel's PSI[2] information which makes them more efficient and responsive, while earlyoom is just polling.

earlyoom keeps getting suggested, even though we have PSI now, just because people are used to using it and recommending it from back before the kernel had cgroups v2.

[0]: https://www.freedesktop.org/software/systemd/man/latest/syst...

[1]: https://github.com/facebookincubator/oomd

[2]: https://docs.kernel.org/accounting/psi.html

replies(3): >>45664907 #>>45665996 #>>45666462 #
CGamesPlay ◴[] No.45664907[source]
"earlyoom is just polling"?

> systemd-oomd periodically polls PSI statistics for the system and those cgroups to decide when to take action.

It's unclear if the docs for systemd-oomd are incorrect or misleading; I do see from the kernel.org link that the recommended usage pattern is to use the `poll` system call, which in this context would mean "not polling", if I understand correctly.

replies(2): >>45664986 #>>45665579 #
100721 ◴[] No.45664986[source]
Unrelated to the topic, it seems awfully unintuitive to name a function ‘poll’ if the result is ‘not polling.’ I’m guessing there’s some history and maybe backwards-compatible rewrites?
replies(3): >>45665384 #>>45665391 #>>45665401 #
1. unilynx ◴[] No.45665384[source]
Poll takes a timeout parameter. ‘Not polling’ is just a really long timeout