
804 points by jryio | 1 comment
speedgoose ◴[] No.45661785[source]
Looking at the htop screenshot, I notice the lack of swap. You may want to enable earlyoom so your whole server doesn't go down when a service goes bananas; the Linux kernel's OOM killer often triggers a bit too late.
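
Something like this is usually enough on a systemd distro (Debian/Ubuntu package and config path assumed here; -m and -s are the free-RAM and free-swap thresholds in percent):

    apt install earlyoom
    # start killing the biggest process when free RAM < 5% and free swap < 10%
    echo 'EARLYOOM_ARGS="-m 5 -s 10"' > /etc/default/earlyoom
    systemctl enable --now earlyoom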

You can also enable zram to compress RAM, so you can over-provision like the pros. A lot of long-running software leaks memory that compresses pretty well.
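
If you'd rather set it up by hand than through a helper like zram-generator, a rough sketch (the 8G size and zstd choice are just examples):

    modprobe zram
    echo zstd > /sys/block/zram0/comp_algorithm   # must be set before disksize
    echo 8G > /sys/block/zram0/disksize           # uncompressed capacity, not reserved RAM
    mkswap /dev/zram0
    swapon -p 100 /dev/zram0                      # higher priority than any disk swap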

Here is how I do it on my Hetzner bare-metal servers using Ansible: https://gist.github.com/fungiboletus/794a265cc186e79cd5eb2fe... It also works on VMs.

levkk ◴[] No.45662183[source]
Yeah, no way. As soon as you hit swap, _most_ apps are going to have a bad, bad time. This is well known, so much so that all EC2 instances in AWS disable it by default. Sure, they want to sell you more RAM, but it's also just true that swap doesn't fit today's expectations.

Maybe back in the 90s, it was okay to wait 2-3 seconds for a button click, but today we just assume the thing is dead and reboot.

bayindirh ◴[] No.45662411[source]
This is a wrong belief, because a) SSDs make swap almost invisible, so you can keep that escape ramp for when something goes wrong, and b) swap space is no longer solely an escape ramp that RAM overflows into.

In the age of microservices and cattle servers, reboot/reinstall might be cheap, but in the long run it is not. A long-running server, even if it's cattle, is usually the better option: especially with some excess RAM, the server "warms up" with all hot data cached and becomes a low-latency unit in your fleet, provided you pay the required attention to your software and service configuration.

Secondly, the kernel swaps out unused pages, relieving pressure on RAM, so swap gets used even if you only fill 1% of your RAM. This frees room for more hot data to be cached, giving better resource utilization and performance in the long run.
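
You can watch this happen, and nudge it, with the usual knobs (values here are only illustrative; higher swappiness tells the kernel to prefer swapping out anon pages over dropping page cache):

    # how much is swapped out vs. cached right now
    grep -E 'SwapTotal|SwapFree|Cached' /proc/meminfo
    # prefer reclaiming anon pages to keep more page cache warm; the default is 60
    sysctl vm.swappiness=100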

So "eff it, we ball" is never a good system administration strategy. Even if everything is ephemeral and can be rebooted in three seconds.

Sure, some things like Kubernetes enforce "no swap, period" policies because they kill pods when memory pressure exceeds some threshold, but for more traditional setups swap is still valuable.

db48x ◴[] No.45664705[source]
This is not really true of most SSDs. When Linux is really thrashing the swap it’ll be essentially unusable unless the disk is _really_ fast. Fast enough SSDs are available though. Note that when it’s really thrashing the swap the workload is 100% random 4KB reads and writes in equal quantities. Many SSDs have high read speeds and high write speeds but have much worse performance under mixed workloads.
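
You can approximate that worst case with fio and see how a given drive holds up (parameters are illustrative; the point is 50/50 random 4 KiB reads and writes at some queue depth):

    fio --name=swap-thrash --filename=/tmp/fio.test --size=4G --direct=1 \
        --rw=randrw --rwmixread=50 --bs=4k --iodepth=32 \
        --runtime=60 --time_based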

I once used an Intel Optane drive as swap for a job that needed hundreds of gigabytes of RAM (in a computer that maxed out at 64 GB). The latency was so low that the machine was almost perfectly usable even while the task was running; in fact I could almost watch videos at the same time without dropping frames.

ValdikSS ◴[] No.45668647{3}[source]
It's been fixed since kernel 6.1 with MGLRU; see above, or read this: https://notes.valdikss.org.ru/linux-for-old-pc-from-2007/en/...
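
A quick way to check whether your kernel has it and turn it on (paths are from the kernel's multi-gen LRU docs):

    cat /sys/kernel/mm/lru_gen/enabled      # non-zero bitmask (e.g. 0x0007) means MGLRU is active
    echo y > /sys/kernel/mm/lru_gen/enabled
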
webstrand ◴[] No.45670562{4}[source]
Do you know how the le9 patch compares to MGLRU? The latter applies to all memory, not just file-backed pages, as far as I can tell. The former might still be useful for preventing eager OOM while keeping executable file-backed pages in memory?
ValdikSS ◴[] No.45675774{5}[source]
le9 is a 'simple' method that keeps a fixed amount of page cache resident. It works exceptionally well for what it is, but it requires manually tuning the amount of cache in MB.
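
Roughly like this, though the knob names have changed between le9 revisions (it's an out-of-tree patch), so treat these as illustrative rather than exact:

    # soft-protect ~256 MB of clean file-backed pages from eviction under pressure
    sysctl vm.clean_low_kbytes=262144
    # hard floor below which clean file pages are never reclaimed
    sysctl vm.clean_min_kbytes=131072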

MGLRU is basically a smarter version of the existing eviction algorithm: it evicts (or keeps) both page cache and anon pages, and combined with min_ttl_ms it tries to keep the currently active page cache for a specified amount of time. It still takes swappiness into account and, unlike le9, does not operate on a fixed amount of page cache.
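
For example (the value is illustrative; it asks MGLRU to protect the working set touched within the last second):

    # don't evict pages used in the last 1000 ms; 0 disables the guarantee
    echo 1000 > /sys/kernel/mm/lru_gen/min_ttl_ms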

Both are effective at thrashing prevention, and both are different. MGLRU, especially with a higher min_ttl_ms, can trigger the OOM killer more frequently than you'd like. I find le9 more effective for desktop use on old low-end machines, but that's only because it simply keeps larger amounts of page cache resident. It's not ideal for embedded systems, for example.