150 points shaunpud | 39 comments
1. nrdvana ◴[] No.45060203[source]
The third mitigating feature the article forgot to mention is that tmpfs can get paged out to the swap partition. If you drop a large file there and forget it, it will all end up in the swap partition if applications are demanding more memory.
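
To see how much of this is happening on a given box, a minimal sketch follows. It assumes the usual `/proc/meminfo` layout, where `Shmem` covers shared memory including tmpfs and `SwapTotal`/`SwapFree` describe swap. The function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def tmpfs_swap_summary(text):
    """Return shared-memory (which includes tmpfs) and swap usage in kB."""
    info = parse_meminfo(text)
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    return {
        "shmem_kb": info.get("Shmem", 0),
        "swap_used_kb": swap_total - swap_free,
        "swap_total_kb": swap_total,
    }
```

On a Linux machine you would feed it `open("/proc/meminfo").read()`. It cannot tell you which swapped pages belong to tmpfs, only the totals.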
replies(3): >>45060224 #>>45060756 #>>45061403 #
2. buckle8017 ◴[] No.45060224[source]
Which is a great reason to have a big swap file now.
replies(2): >>45060524 #>>45060578 #
3. gnyman ◴[] No.45060524[source]
Note though that if you don't have swap now, and enable it, you introduce the risk of thrashing [1]

If you have swap already it doesn't matter, but I've encountered enough thrashing that I now disable swap on almost all servers I work with.

It's rare, but when it happens the server usually becomes completely unresponsive, so you have to hard reset it. I'd rather have the OOM killer kill the application that is trying to use too much memory, so that I can ssh in and fix that.

[1] https://docs.redhat.com/en/documentation/red_hat_enterprise_...

replies(4): >>45060599 #>>45060656 #>>45060800 #>>45061646 #
4. ◴[] No.45060578[source]
5. baq ◴[] No.45060599{3}[source]
This is why I’m running with overcommit 2 and a different ratio per server purpose.

…though I’m not sure why we have to think about this in 2025 at all.
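
For anyone wondering what the ratio changes: assuming "overcommit 2" here means `vm.overcommit_memory=2`, the kernel documentation gives the commit limit as RAM (minus huge pages) times `vm.overcommit_ratio` percent, plus swap. A rough sketch of that arithmetic, with a made-up function name:

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """Approximate CommitLimit in kB under strict overcommit accounting.

    Mirrors the documented formula:
    (RAM - huge pages) * overcommit_ratio / 100 + swap.
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb
```

So a 16 GB box with no swap and a ratio of 50 can only commit about 8 GB, which is why the ratio has to be picked per server purpose.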

replies(1): >>45060864 #
6. k_bx ◴[] No.45060656{3}[source]
Disabling swap on servers is de-facto standard for serious deployments.

The swap story needs a serious upgrade. I think /tmp in memory is a great idea, but I also think that particular /tmp needs swap support (ideally with compression, i.e. zswap), while the main system should have none.

replies(3): >>45060682 #>>45062059 #>>45062962 #
7. ravetcofx ◴[] No.45060682{4}[source]
Swap always seemed more meant for desktop use. On servers, you need to provide the real memory that the application stack is expected to use.
replies(2): >>45060812 #>>45060866 #
8. m463 ◴[] No.45060756[source]
what swap partition?

I meant this sort of jokingly. I think I have a few Linux systems that were never configured with swap partitions or swapfiles.

replies(1): >>45060793 #
9. edoceo ◴[] No.45060793[source]
I'm with you. I don't swap. Processes die. OOM. Linux can recover and not lose data. Just unavailable for a moment.
replies(3): >>45060941 #>>45061182 #>>45062028 #
10. ◴[] No.45060800{3}[source]
11. someothherguyy ◴[] No.45060812{5}[source]
plenty of footguns in that general advice, local in memory storage services with default config, etc
12. worthless-trash ◴[] No.45060864{4}[source]
I'm assuming that you monitor the service closely for OOM then adjust with demand ?
replies(1): >>45061818 #
13. finaard ◴[] No.45060866{5}[source]
Pretty much all the guidelines about swap partitions out there reference old allocator behaviour from way over a decade ago - where you'd indeed typically run into weird issues without having a swap partition, even if you had enough RAM.

Short (and inaccurate) summary was that it'd try to use some swap even if it didn't need it yet, which made sense in the world of enough memory being too expensive, and got fixed at the cost of making the allocator way more complicated when we started having enough memory in most cases.

Nowadays typically you don't need swap unless you work on a product with some constraints, in which case you'd hand tune low memory performance anyway. Just don't buy anything with less than 32GB, and you should be good.

14. Balinares ◴[] No.45060941{3}[source]
Swapping still occurs regardless. If there is no swap space the kernel swaps out code pages instead. So, running programs. The code pages then need to be loaded again from disk when the corresponding process is next scheduled and needs them.

This is not very efficient and is why a bit of actual swap space is generally recommended.

replies(1): >>45062376 #
15. marginalia_nu ◴[] No.45061182{3}[source]
The Linux OOM killer is kinda sketchy to rely on. It likes to freeze up your system for long periods of time as it works out how to resolve the issue. Then it starts killing random PIDs to try to reclaim RAM like a system wide russian roulette.

It's especially janky when you don't have swap. I've found adding a small swap file of ~500 MB makes it work so much better, even for systems with half a terabyte of RAM this helps reduce the freezing issues.

replies(2): >>45061255 #>>45061577 #
16. wahern ◴[] No.45061255{4}[source]
Yeah. I always disable overcommit (notwithstanding that Linux cannot provide perfectly accurate strict memory accounting), and I'd prefer not to use swap, but Linux VM maintainers have consistently stated that they've designed and tuned the VM subsystem with swap in mind. Is swap necessary in the abstract? No. Is swap necessary on Linux? No. But don't be surprised if Linux doesn't do what you'd expect in the absence of swap, and don't expect Linux to put much if any effort into improving performance in the absence of swap.

I've never run into trouble on my personal servers, but I've worked at places that have, especially when running applications that tax the VM subsystem, e.g. the JVM and big Java apps. If one wonders why swap would be useful even if applications never allocate, even in the aggregate, more anonymous memory than system RAM, one of the reasons is the interaction with the buffer cache and eviction under pressure.

replies(1): >>45062908 #
17. guappa ◴[] No.45061403[source]
Fedora did this long before debian. I remember doing wget of an .iso file on /tmp and my entire wayland session being killed by the OOM killer.

I still think it's a terrible idea.

replies(1): >>45061462 #
18. nolist_policy ◴[] No.45061462[source]
Use `/var/tmp` if you want a disk-backed tmp.
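
If you are not sure which kind of /tmp you have, a small sketch like this can tell you. It assumes the standard `/proc/mounts` column order (device, mount point, filesystem type), and the helper name is hypothetical.

```python
def fs_type_for(mount_point, mounts_text):
    """Return the filesystem type of the last mount at mount_point, or None.

    None means the path is not a separate mount, i.e. it lives on
    whatever filesystem contains it.
    """
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fs_type = fields[2]  # later entries shadow earlier ones
    return fs_type
```

Called as `fs_type_for("/tmp", open("/proc/mounts").read())`, a result of `"tmpfs"` means downloads to /tmp land in RAM (and possibly swap).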
replies(1): >>45061925 #
19. mnw21cam ◴[] No.45061577{4}[source]
Install earlyoom or one of its near-equivalents. That mostly solves the problem of it freezing up the system for long periods of time.

I haven't personally seen the OOM killer kill unproductively - usually it kills either a runaway culprit or something that will actually free up enough space to help.

For your "even for systems with half a terabyte of RAM", it is logical that the larger the system, the worse this behaviour is, because when things go sideways there is a lot more stuff to sort out and that takes longer. My work server has 1.5TB of RAM, and an OOM event before I installed earlyoom was not pretty at all.

replies(4): >>45061814 #>>45062169 #>>45062516 #>>45064246 #
20. mnw21cam ◴[] No.45061646{3}[source]
That's not true. Without swap, you already have the risk of thrashing. This is because Linux views all segments of code which your processes are running as clean and evictable from the cache, and therefore basically equivalent to swap, even when you have no swap. Under low-memory conditions, Linux will happily evict all clean pages, including the ones that the next process to be scheduled needs to execute from, causing thrashing. You can still get an unresponsive server under low memory conditions due to thrashing with no swap.

Setting swappiness to zero doesn't fix this. Disabling swap doesn't fix this. Disabling overcommit does fix this, but that might have unacceptable disadvantages if some of the processes you are running allocate much more RAM than they use. Installing earlyoom to prevent real low memory conditions does fix this, and is probably the best solution.
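
As a rough illustration of the earlyoom idea (not its actual code): act before the kernel is truly out of memory, once both available memory and free swap fall below a percentage. The 10% defaults below are an assumption for this sketch, and the function name is made up.

```python
def low_memory(mem_available_kb, mem_total_kb, swap_free_kb, swap_total_kb,
               mem_min_pct=10.0, swap_min_pct=10.0):
    """Return True when both available memory and free swap are below their thresholds.

    A machine without swap is treated as having 0% free swap, so only the
    memory threshold matters there.
    """
    mem_pct = 100.0 * mem_available_kb / mem_total_kb
    swap_pct = 100.0 * swap_free_kb / swap_total_kb if swap_total_kb else 0.0
    return mem_pct <= mem_min_pct and swap_pct <= swap_min_pct
```

A real daemon would poll this several times per second and then signal the process with the highest OOM score, so that the victim dies before the page cache is squeezed into thrashing.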

21. marginalia_nu ◴[] No.45061814{5}[source]
> For your "even for systems with half a terabyte of RAM", it is logical that the larger the system, the worse this behaviour is, because when things go sideways there is a lot more stuff to sort out and that takes longer. My work server has 1.5TB of RAM, and an OOM event before I installed earlyoom was not pretty at all.

I meant it more in the sense that it doesn't have to be more than a few hundred MB even for large RAM. It's not the size of the swap file that makes the difference, but its presence, and the advice to make it proportional to RAM is largely outdated.

22. baq ◴[] No.45061818{5}[source]
yeah pretty much, also configuring memory limits everywhere where apps allow it. some software also handles malloc failures relatively gracefully, which helps a whole lot (thank you postgres devs)
replies(1): >>45080968 #
23. 1718627440 ◴[] No.45061925{3}[source]
I thought /var/tmp is for applications while /tmp is for the user.
replies(3): >>45062090 #>>45062856 #>>45062943 #
24. TiredOfLife ◴[] No.45062028{3}[source]
Using Desktop mode on SteamDeck before they increased the swap was fun. Launch a game, everything freezes, go for an hour-long walk, see that the game has finally been killed, make and drink coffee while the system becomes usable again.
25. bmacho ◴[] No.45062059{4}[source]
So the ideal behaviour would be:

  - for most processes, no swap
  - for tmpfs, use RAM up to a quota
  - for tmpfs, start using a swapfile above that quota

ChatGPT doesn't think it is achievable, though it thinks cgroup2 can achieve something similar.
26. Hendrikto ◴[] No.45062090{4}[source]
> /tmp/

> The place for small temporary files. This directory is usually mounted as a tmpfs instance, and should hence not be used for larger files. (Use /var/tmp/ for larger files.) This directory is usually flushed at boot-up. Also, files that are not accessed within a certain time may be automatically deleted.

Source: https://uapi-group.org/specifications/specs/linux_file_syste...
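
In application code, following that guidance could look like the sketch below. The 64 MiB cutoff is an arbitrary assumption (the spec only says "small" and "larger"), and both helper names are hypothetical.

```python
import tempfile

# Hypothetical cutoff: anything expected to exceed this goes to disk-backed /var/tmp.
LARGE_FILE_BYTES = 64 * 1024 * 1024


def pick_tmp_dir(expected_bytes, threshold=LARGE_FILE_BYTES):
    """Return /var/tmp for large files, /tmp otherwise."""
    return "/var/tmp" if expected_bytes > threshold else "/tmp"


def open_scratch_file(expected_bytes):
    """Open a temporary file in the directory suited to its expected size."""
    return tempfile.NamedTemporaryFile(dir=pick_tmp_dir(expected_bytes))
```

`tempfile.NamedTemporaryFile` accepts a `dir` argument, so the only real decision is the size estimate.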

replies(1): >>45062912 #
27. eMPee584 ◴[] No.45062169{5}[source]
nohang also has been a good one for desktops, with friendly notifications under memory stress and sane defaults.

Aside from these complementary tools, the number of systemd traps (OOM adjustment score defaults & restrictions, tmux user sessions killed by default etc etc) associated with OOM has really been taking a toll on my nerves over the years.. And kernel progress on this also has been underwhelming.

Also, why has firefox switched off automatic tab unloading when memory is low ONLY FOR LINUX? Much better ux since I turned on browser.tabs.unloadOnLowMemory ...

28. adrian_b ◴[] No.45062376{4}[source]
Unlike swapping, freeing code pages does not write anything to the HDD/SSD; the pages only need to be reloaded when they are needed again, so it is more efficient than swapping.

I stopped using swap on all my Linux servers, desktops and laptops more than 20 years ago. At that time it was a great improvement and since then it has never caused any problems. However, I have been generous with the amount of RAM I install: for any computer of at least NUC size, it has been many years since I used less than 32 GB, while for new computers I do not intend to use less than 64 GB.

With recent enough Linux kernels, using tmpfs for /tmp is perfectly fine. Nevertheless, for decades using tmpfs for /tmp had been dangerous, because copying a file through /tmp would lose metadata, e.g. by truncating file timestamps and by stripping the extended file attributes.

Copying files through /tmp was common among users of multi-user computers where no other directory was writable by all users, and the former behavior of Linux tmpfs was very surprising to them.

29. natebc ◴[] No.45062516{5}[source]
it's anecdata but I've had the linux OOM Killer take out OVS (Open vSwitch) on a kubernetes node several times.

Made me really not mind having a little swap space set up just in case.

replies(1): >>45062620 #
30. marginalia_nu ◴[] No.45062620{6}[source]
OOMKiller, as far as I understand it, will just pick a random page, figure out who owns it, and then kill that process, repeating until enough memory is available. This will bias toward processes with larger memory allocations, but may kill any process.
replies(1): >>45062752 #
31. menaerus ◴[] No.45062752{7}[source]
> If it ever becomes necessary for the OOM Killer to kill processes, the decision of which processes to kill will be made based on something called the OOM score. Each process has an OOM score associated with it.

> Every running process in Linux has an OOM score. The operating system calculates the OOM score for a process, based on several criteria - the criteria are mainly influenced by the amount of memory the process is using. Typically, the OOM score varies between -1000 and 1000. When the OOM Killer needs to kill a process, again, due to the system running low on memory, the process with the highest OOM score will be killed first!

https://learn.redhat.com/t5/Platform-Linux/Out-of-Memory-Kil...
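
You can check this on your own machine: each process exposes its current score in `/proc/<pid>/oom_score`. A minimal sketch (function names are made up, and the real kernel selection has more rules than "pick the max") that lists the scores and names the most likely victim:

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Map each PID under proc_root to its current oom_score."""
    scores = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "oom_score")) as f:
                scores[int(entry)] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # process exited or file unreadable
    return scores


def likely_victim(scores):
    """Return the PID with the highest score, or None if there are none."""
    if not scores:
        return None
    return max(scores, key=scores.get)
```

Running `likely_victim(read_oom_scores())` before an incident shows whether the database or sshd is at the top of the list, which is the time to adjust `oom_score_adj` for it.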

32. styanax ◴[] No.45062856{4}[source]
Trivia: CIS Guidelines (security tasks applied to a server to pass an enhanced security audit to be compliant with a standard, in a soundbite) has an item requiring /var/tmp to be a bind mount to /tmp (as well as setting specific security options on /tmp). A server attempting to pass CIS audits (very common in my work-related experience w/Enterprises) may well not have a unique /var/tmp.
33. throw0101c ◴[] No.45062908{5}[source]
> […] but Linux VM maintainers have consistently stated that they've designed and tuned the VM subsystem with swap in mind.

Is there a citation for this that can be shown to skeptics?

34. guappa ◴[] No.45062912{5}[source]
But that was written after the change was made :D
35. throw0101c ◴[] No.45062943{4}[source]
> I thought /var/tmp is for applications while /tmp is for the user.

/tmp is for stuff that is 'absolutely' temporary, in that on many/most systems it is nuked between reboots. /var/tmp is 'relatively' temporary in that applications can put stuff there that they're working on, but if there is a crash, the contents are not deleted and can be recovered across reboots.

36. throw0101c ◴[] No.45062962{4}[source]
> Disabling swap on servers is de-facto standard for serious deployments.

I guess I have not been deploying seriously over the last couple of decades because the (hardware) systems that I deploy all had some swap, even if it was only a file.

replies(1): >>45063271 #
37. k_bx ◴[] No.45063271{5}[source]
What's your swappiness ?
38. justsomehnguy ◴[] No.45064246{5}[source]
> I haven't personally seen the OOM killer kill unproductively

Ah, the classical linux fan adage: "never happened to me means never happens ever to anyone".

My favourite things to see with OOM:

killing mysql on the machine which hosts only mysql and is THE production;

and the best one - killing sshd. Of course I can report on that only after seeing it on the tty0 through the BMC/IPMI console or KVM console of a VM.

39. worthless-trash ◴[] No.45080968{6}[source]
I've spent the last day thinking about that, and I really can't see any big negative side effects. The only issue I'd have is being notified of OOM conditions, and that would just be a syslog regex match. Great plan.
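
A sketch of what that match could look like, assuming the kernel log line contains `Killed process <pid> (<name>)`. The exact wording varies between kernel versions, so treat the pattern and the sample line in the note below as illustrative only.

```python
import re

# Assumed shape of the kernel's OOM kill message; adjust for your kernel.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, process name) for every OOM kill line found in log_text."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

For a line such as `kernel: Out of memory: Killed process 4321 (mysqld) total-vm:8123456kB`, this yields the PID and process name, which is enough to drive an alert.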