A deep dive into Debian 13 /tmp: What's new, and what to do if you don't like it

1. fh973 ◴[29 Aug 25 05:37 UTC] No.45060597[source]▶

Swap on servers somewhat defeats the purpose of ECC memory: your program state is now subject to a complex I/O path that is not end-to-end checksum protected. You also get unpredictable performance.

So typically: swap off on servers. Do they have a server story?

replies(6): >>45060665 #>>45060768 #>>45062143 #>>45062478 #>>45062741 #>>45110791 #

2. abrookewood ◴[29 Aug 25 05:50 UTC] No.45060665[source]▶

>>45060597 (TP) #

That's a really good point that had never occurred to me.

Edit: I think that using ZFS for your /tmp would solve this. You get error-corrected memory writing to a checksummed file system.

replies(1): >>45060900 #

3. m463 ◴[29 Aug 25 06:12 UTC] No.45060768[source]▶

>>45060597 (TP) #

I can see it now: pro ecc sata and m.2 ssds

replies(1): >>45064472 #

4. yjftsjthsd-h ◴[29 Aug 25 06:34 UTC] No.45060900[source]▶

>>45060665 #

ZFS /tmp is probably fine, but swapping to ZFS on Linux is dicey AIUI; there's an unfortunate possibility of deadlock https://github.com/openzfs/zfs/issues/7734

replies(2): >>45061284 #>>45061712 #

5. abrookewood ◴[29 Aug 25 07:39 UTC] No.45061284{3}[source]▶

>>45060900 #

Ah, thanks for pointing that out - wasn't aware.

6. cromka ◴[29 Aug 25 08:46 UTC] No.45061712{3}[source]▶

>>45060900 #

So maybe another filesystem with heavy checksums could be used? Btrfs or dm-crypt with integrity over ext4?

replies(2): >>45061804 #>>45062874 #

7. tatref ◴[29 Aug 25 08:58 UTC] No.45061804{4}[source]▶

>>45061712 #

Why not dm-integrity?

replies(1): >>45065608 #

8. goodpoint ◴[29 Aug 25 09:54 UTC] No.45062143[source]▶

>>45060597 (TP) #

That's not how swap is meant to be used on servers.

9. blueflow ◴[29 Aug 25 10:55 UTC] No.45062478[source]▶

>>45060597 (TP) #

First, having no swap means anonymous pages cannot be evicted, so named (file-backed) pages must be evicted instead.

Second, the binaries of your processes are mapped in as named pages (because they come from the ELF file).

Named pages are generally not counted as "used" memory because they can be evicted and reclaimed, but if you have a service with a 150MB binary running, those 150MB of seemingly "free" memory are absolutely crucial for performance.

Running out of this 150MB of disk cache will result in the machine using up all its I/O capacity to re-fetch the ELF from disk, and it will likely become unresponsive. Having swap significantly delays this lock-up by allowing anonymous pages to be evicted, so the same memory pressure will cause fewer stalls.

So until the OOM management on Linux gets fixed, you need swap.
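To make the anonymous vs. named split concrete, here is a minimal Python sketch that sums the two categories from /proc/meminfo-style text. The field names are the ones Linux reports there; the helper names and the sample numbers are made up for illustration, and this only approximates what the kernel's reclaim logic actually looks at.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def anon_vs_file(fields):
    """Return (anonymous_kb, file_backed_kb) from parsed meminfo fields."""
    anon = fields.get("Active(anon)", 0) + fields.get("Inactive(anon)", 0)
    named = fields.get("Active(file)", 0) + fields.get("Inactive(file)", 0)
    return anon, named


# Hypothetical sample; on a Linux box you would pass in the contents of
# /proc/meminfo instead.
SAMPLE = """\
MemTotal:        8000000 kB
Active(anon):    5000000 kB
Inactive(anon):  1000000 kB
Active(file):     100000 kB
Inactive(file):    50000 kB
"""

anon_kb, file_kb = anon_vs_file(parse_meminfo(SAMPLE))
print(f"anonymous:   {anon_kb} kB (can only be evicted to swap)")
print(f"file-backed: {file_kb} kB (can be dropped and re-read from disk)")
```

With no swap, only the second number is available for reclaim, and that is where the mapped binaries live.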

replies(1): >>45063510 #

10. dooglius ◴[29 Aug 25 11:39 UTC] No.45062741[source]▶

>>45060597 (TP) #

The purpose of ECC has nothing to do with being "end-to-end". A typical CPU path to/from DRAM will not be end-to-end either, since caches will use different encodings. This is generally considered fine since each I/O segment has error detection in one form or another, both in the CPU-to-memory case and the memory-to-disk case. ECC in general is not like cryptographic authentication where it protects against any possible alteration; it's probabilistic in nature against the most common failure modes.
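As a rough illustration of "probabilistic against the most common failure modes", here is a toy Python sketch using a single parity bit. This is a deliberately simple stand-in and much weaker than the codes ECC memory actually uses; the word value and bit positions are arbitrary.

```python
def parity(word: int) -> int:
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") % 2


def flip(word: int, *bits: int) -> int:
    """Return the word with the given bit positions inverted."""
    for b in bits:
        word ^= 1 << b
    return word


word = 0b10110010
stored_parity = parity(word)

one_flip = flip(word, 3)       # the common case: a single bit error
two_flips = flip(word, 3, 5)   # a rarer case: two bit errors

print(parity(one_flip) != stored_parity)   # True: the error is detected
print(parity(two_flips) != stored_parity)  # False: the error slips through
```

Stronger codes push the undetected cases further out, but the principle is the same: the check covers likely errors, not every possible alteration.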

11. Vogtinator ◴[29 Aug 25 11:58 UTC] No.45062874{4}[source]▶

>>45061712 #

A swapfile on Linux must be directly mapped, bypassing any filesystem-level checksums (see https://btrfs.readthedocs.io/en/latest/Swapfile.html)

12. Scaevolus ◴[29 Aug 25 12:56 UTC] No.45063510[source]▶

>>45062478 #

Swapping anonymous pages can bring the system to a crawl too. High memory pressure makes things very slow with swap, while with swap off, high memory pressure is likely to invoke the OOM killer, which lets the system recover violently.

replies(1): >>45063942 #

13. blueflow ◴[29 Aug 25 13:35 UTC] No.45063942{3}[source]▶

>>45063510 #

The "bug" with the OOM killer that I implied is that what you describe does not happen. That is not surprising, because disk cache thrashing is the normal mode of operation for serving big files to the network. An OOM killer acting on that alone would be problematic, but without swap, that's where the slowdown will happen for other workloads, too.

It's less a bug than an understood problem, and there aren't any good solutions around yet.

replies(1): >>45066248 #

14. justsomehnguy ◴[29 Aug 25 14:18 UTC] No.45064472[source]▶

>>45060768 #

Well, SATA does have a basic CRC, and you would see an increase in CRC transfer errors in SMART if the path (usually the cables) isn't good.
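For anyone who hasn't seen a link-level CRC in action, a small Python sketch of the idea: compute a checksum before "transfer", flip one bit in transit, and compare. This uses zlib's CRC-32 purely as an illustration and is not the exact framing or CRC parameters SATA uses; the payload and helper name are made up.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


payload = b"swap page contents " * 4
sent_crc = zlib.crc32(payload)          # computed by the sender

received = flip_bit(payload, 13)        # a single bit flipped on the cable
received_crc = zlib.crc32(received)     # recomputed by the receiver

if received_crc != sent_crc:
    print("CRC mismatch: transfer error detected, retry the transfer")
```

The mismatch is what ends up counted as a CRC transfer error, and the data gets re-sent instead of silently corrupted.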

15. yjftsjthsd-h ◴[29 Aug 25 15:47 UTC] No.45065608{5}[source]▶

>>45061804 #

https://wiki.archlinux.org/title/Dm-integrity

> It uses journaling for guaranteeing write atomicity by default, which effectively halves the write speed.

That seems like a poor fit for swap IMO.

https://www.kernel.org/doc/html/latest/admin-guide/device-ma... says,

> There’s an alternate mode of operation where dm-integrity uses a bitmap instead of a journal. If a bit in the bitmap is 1, the corresponding region’s data and integrity tags are not synchronized - if the machine crashes, the unsynchronized regions will be recalculated. The bitmap mode is faster than the journal mode, because we don’t have to write the data twice, but it is also less reliable, because if data corruption happens when the machine crashes, it may not be detected.

It's not clear to me if that would be okay for swap (as long as you don't hibernate, maybe) or if it's sufficiently protected from corruption.

16. Trixter ◴[29 Aug 25 16:34 UTC] No.45066248{4}[source]▶

>>45063942 #

earlyoom is what we use to address this. We can't tolerate any kind of swapping at all in our workloads; it is better for the system to kill one process to save the others than for the system to slow down or lock up.
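The general idea behind a userspace early-OOM daemon can be sketched in a few lines of Python. This is not earlyoom's actual code: the function names, the 10% threshold used here, and the sample numbers are assumptions for illustration, and a real daemon would read /proc/meminfo and per-process OOM scores in a loop and then send a signal.

```python
def percent_available(meminfo_kb):
    """Share of total memory that is still available, in percent."""
    return 100.0 * meminfo_kb["MemAvailable"] / meminfo_kb["MemTotal"]


def should_kill(meminfo_kb, min_available_percent=10.0):
    """Decide whether to act before the kernel's own OOM killer would."""
    return percent_available(meminfo_kb) < min_available_percent


def pick_victim(oom_scores):
    """Pick the pid with the highest score from a {pid: score} dict."""
    return max(oom_scores, key=oom_scores.get)


# Hypothetical snapshot of a machine under memory pressure.
meminfo = {"MemTotal": 16_000_000, "MemAvailable": 800_000}
scores = {101: 12, 202: 870, 303: 45}

if should_kill(meminfo):
    print(f"would kill pid {pick_victim(scores)} to save the others")
```

The point is to act while the system is still responsive enough to do so, instead of waiting for the thrashing stage.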

17. ars ◴[03 Sep 25 00:14 UTC] No.45110791[source]▶

>>45060597 (TP) #

If you have checksum errors reading data from disk, you have much worse issues than RAM corruption. Any program you launch will probably be corrupted.

Although if you do swap on a server (and you should), the swap needs to be on RAID, otherwise your server will crash on a disk error.

Swap on a server is not meant for handling low-memory issues. Instead, there's tons of data on a server that's almost never used, so swap that out and make more room for cache.