804 points jryio | 4 comments
speedgoose ◴[] No.45661785[source]
Looking at the htop screenshot, I notice the lack of swap. You may want to enable earlyoom so your whole server doesn't go down when a service goes bananas; the Linux kernel's OOM killer often triggers a bit too late.

You can also enable zram to compress RAM, so you can over-provision like the pros do. A lot of long-running software leaks memory that compresses pretty well.

Here is how I do it on my Hetzner bare-metal servers using Ansible: https://gist.github.com/fungiboletus/794a265cc186e79cd5eb2fe... It also works on VMs.

replies(15): >>45661833 #>>45662183 #>>45662569 #>>45662628 #>>45662841 #>>45662895 #>>45663091 #>>45664508 #>>45665044 #>>45665086 #>>45665226 #>>45666389 #>>45666833 #>>45673327 #>>45677907 #
levkk ◴[] No.45662183[source]
Yeah, no way. As soon as you hit swap, _most_ apps are going to have a bad, bad time. This is well known, so much so that all EC2 instances in AWS disable it by default. Sure, they want to sell you more RAM, but it's also just true that swap doesn't work for today's expectations.

Maybe back in the 90s, it was okay to wait 2-3 seconds for a button click, but today we just assume the thing is dead and reboot.

replies(16): >>45662314 #>>45662349 #>>45662398 #>>45662411 #>>45662419 #>>45662472 #>>45662588 #>>45663055 #>>45663460 #>>45664054 #>>45664170 #>>45664389 #>>45664461 #>>45666199 #>>45667250 #>>45668533 #
bayindirh ◴[] No.45662411[source]
This is a mistaken belief, because a) SSDs make swap almost invisible, so you keep that escape ramp if something goes wrong, and b) swap space is no longer solely an escape ramp that RAM overflows into.

In the age of microservices and cattle servers, a reboot/reinstall might look cheap, but in the long run it is not. A long-running server, even if it is cattle, is usually the better option: especially with some excess RAM, the server "warms up" with all the hot data cached and becomes a low-latency unit in your fleet, provided you pay the required attention to your software development and service configuration.

Secondly, the kernel swaps out unused pages, relieving pressure on RAM. So swap is often used even if you have filled only 1% of your RAM. This allows more hot data to be cached, giving better resource utilization and performance in the long run.

So "eff it, we ball" is never a good system administration strategy, even if everything is ephemeral and can be rebooted in three seconds.

Sure, something like Kubernetes enforces a "no swap, period" policy, because it kills pods when memory pressure exceeds some threshold, but for more traditional setups swap is still valuable.

replies(8): >>45662537 #>>45662599 #>>45662646 #>>45662687 #>>45663237 #>>45663354 #>>45664553 #>>45664705 #
adastra22 ◴[] No.45662646[source]
What pressure? If your ram is underutilized, what pressure are you talking about?

If the slowest drive on the machine is the SSD, how does caching to swap help?

replies(2): >>45662707 #>>45662734 #
bayindirh ◴[] No.45662707[source]
A long-running Linux system uses 100% of its RAM. Every byte not used by applications will be used as disk cache, provided you read more data than your total amount of RAM.

This cache is evictable, but it'll be there eventually.

In the old days, Linux wouldn't touch unused pages in RAM as long as there was no memory pressure, but now it swaps out pages that haven't been used for a long time. This frees up more room for cache in RAM.

> how does caching to swap help?

I think I failed to convey what I tried to say. Let me retry:

The kernel doesn't cache to the SSD. It swaps out pages that are unused (not accessed) but not evictable, assuming those pages will stay stale for a very long time; that frees more RAM to be used as cache.

Looking at my desktop system: over 12 days, the kernel moved 2592 MB of my RAM to swap despite there being ~20 GB of free space, and ~15 GB of that free space is used as disk cache.

So, to gain roughly 2.5 GB more disk cache, the kernel moved 2592 MB of non-accessed pages to swap.
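
Those figures come from /proc/meminfo. If you want to check the same numbers on your own box, here is a minimal C sketch that prints the relevant fields (the field names are the kernel's own; the parsing is just illustrative):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }

        char line[256];
        while (fgets(line, sizeof line, f) != NULL) {
            /* Swap usage and page-cache size, as reported by the kernel. */
            if (strncmp(line, "SwapTotal:", 10) == 0 ||
                strncmp(line, "SwapFree:", 9) == 0 ||
                strncmp(line, "Cached:", 7) == 0) {
                fputs(line, stdout);
            }
        }

        fclose(f);
        return 0;
    }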

replies(3): >>45662776 #>>45663196 #>>45667848 #
adastra22 ◴[] No.45663196{4}[source]
Yes, and if I am writing an API service, for example, I don’t want to suddenly add latency because I hit pages that have been swapped out. I want guarantees about my API call latency variance, at least when the server isn’t overloaded.

I DON’T WANT THE KERNEL PRIORITIZING CACHE OVER NRU PAGES.

The easiest way to do this is to disable swap.

replies(6): >>45663291 #>>45663295 #>>45664809 #>>45665015 #>>45667197 #>>45667278 #
1. dwattttt ◴[] No.45667278{5}[source]
If you're getting this far into the details of your memory usage, shouldn't you use mlock to actually lock in the parts of memory you need to stay there? Then you get to have three tiers of priority: pages you never want swapped, cache, then pages that haven't been used recently.
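
For what it's worth, here is a minimal C sketch of that idea (assuming Linux/glibc; the buffer size and names are purely illustrative): pin one latency-critical buffer with mlock(2) so the kernel never swaps those pages out, while the rest of the process stays subject to normal reclaim.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 16 * 1024 * 1024;   /* 16 MiB of "hot" data (illustrative) */
        void *hot = malloc(len);
        if (hot == NULL) {
            perror("malloc");
            return 1;
        }

        /* Lock the pages backing this range into RAM; may fail with ENOMEM
           or EPERM if RLIMIT_MEMLOCK is too low for the size requested. */
        if (mlock(hot, len) != 0) {
            perror("mlock");
            return 1;
        }

        memset(hot, 0, len);             /* touch the pages: resident and unswappable */
        printf("locked %zu bytes in RAM\n", len);

        munlock(hot, len);               /* release the lock before freeing */
        free(hot);
        return 0;
    }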
replies(1): >>45669131 #
2. pdimitar ◴[] No.45669131[source]
Can mlock be instructed to, e.g., "never swap pages from this pid"?
replies(1): >>45669173 #
3. bayindirh ◴[] No.45669173[source]
The application requests this itself from the kernel. See https://man7.org/linux/man-pages/man2/mlock.2.html
replies(1): >>45674932 #
4. dwattttt ◴[] No.45674932{3}[source]
From the link, mlockall with MCL_CURRENT | MCL_FUTURE

> Lock all pages which are currently mapped into the address space of the process.

> Lock all pages which will become mapped into the address space of the process in the future.
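
A minimal sketch of that call, assuming a Linux/glibc build (the service body is just a placeholder): lock everything the process has mapped now and everything it maps later, so none of its pages are ever swapped out. This usually needs CAP_IPC_LOCK or a sufficiently high RLIMIT_MEMLOCK.

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        /* MCL_CURRENT: lock all pages currently mapped into the process.
           MCL_FUTURE:  lock all pages mapped into the process from now on. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return 1;
        }
        printf("all current and future mappings are locked in RAM\n");

        /* ... run the latency-sensitive work here ... */

        munlockall();
        return 0;
    }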