    150 points shaunpud | 17 comments
    1. fh973 ◴[] No.45060597[source]
    Swap on servers somewhat defeats the purpose of ECC memory: your program state is now subject to a complex I/O path that is not end-to-end checksum-protected. You also get unpredictable performance.

    So typically: swap off on servers. Do they have a server story?

    replies(6): >>45060665 #>>45060768 #>>45062143 #>>45062478 #>>45062741 #>>45110791 #
    2. abrookewood ◴[] No.45060665[source]
    That's a really good point that had never occurred to me.

    Edit: I think that using ZFS for your /tmp would solve this. You get error-corrected memory writing to a checksummed file system.

    replies(1): >>45060900 #
    3. m463 ◴[] No.45060768[source]
    I can see it now: pro ecc sata and m.2 ssds
    replies(1): >>45064472 #
    4. yjftsjthsd-h ◴[] No.45060900[source]
    ZFS /tmp is probably fine, but swapping to ZFS on Linux is dicey AIUI; there's an unfortunate possibility of deadlock https://github.com/openzfs/zfs/issues/7734
    replies(2): >>45061284 #>>45061712 #
    5. abrookewood ◴[] No.45061284{3}[source]
    Ah, thanks for pointing that out - wasn't aware.
    6. cromka ◴[] No.45061712{3}[source]
    So maybe another filesystem with heavy checksums could be used? Btrfs or dm-crypt with integrity over ext4?
    replies(2): >>45061804 #>>45062874 #
    7. tatref ◴[] No.45061804{4}[source]
    Why not dm-integrity?
    replies(1): >>45065608 #
    8. goodpoint ◴[] No.45062143[source]
    That's not how swap is meant to be used on servers.
    9. blueflow ◴[] No.45062478[source]
    First, having no swap means anonymous pages cannot be evicted, so named (file-backed) pages must be evicted instead.

    Second, the binaries of your processes are mapped in as named pages (because they come from the ELF file).

    Named pages are generally not counted as "used" memory because they can be evicted and reclaimed, but if you have a service with a 150MB binary running, those 150MB of seemingly "free" memory are absolutely crucial for performance.

    Running out of this 150MB of disk cache will result in the machine using up all its I/O capacity to re-fetch the ELF from disk, and it will likely become unresponsive. Having swap significantly delays this lock-up by allowing anonymous pages to be evicted, so the same memory pressure will cause fewer stalls.

    So until the OOM management on Linux gets fixed, you need swap.
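
    The split between named (file-backed) and anonymous pages described above can be read off /proc/meminfo. The following is a minimal sketch, not a kernel-accurate accounting: the helper names `parse_meminfo` and `eviction_candidates` are hypothetical, and capping evictable anonymous memory at `SwapFree` is a rough simplification made for illustration.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (kB for most fields)."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def eviction_candidates(fields: dict[str, int]) -> dict[str, int]:
    """Summarize which pages the kernel could evict under memory pressure."""
    file_backed = fields.get("Active(file)", 0) + fields.get("Inactive(file)", 0)
    anonymous = fields.get("Active(anon)", 0) + fields.get("Inactive(anon)", 0)
    swap_free = fields.get("SwapFree", 0)
    return {
        "file_backed_kb": file_backed,
        "anonymous_kb": anonymous,
        # Assumption of this sketch: anonymous pages can only leave RAM via
        # swap, so free swap is an upper bound on what can be evicted.
        "anonymous_evictable_kb": min(anonymous, swap_free),
    }
```

    On a Linux machine, passing the contents of /proc/meminfo to `parse_meminfo` and the result to `eviction_candidates` gives the summary. With swap off, `anonymous_evictable_kb` is 0, so all reclaim pressure falls on the file-backed pages, which include the mapped binaries.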

    replies(1): >>45063510 #
    10. dooglius ◴[] No.45062741[source]
    The purpose of ECC has nothing to do with being "end-to-end". A typical CPU path to/from DRAM will not be end-to-end either, since caches will use different encodings. This is generally considered fine since each I/O segment has error detection in one form or another, both in the CPU-to-memory case and the memory-to-disk case. ECC in general is not like cryptographic authentication where it protects against any possible alteration; it's probabilistic in nature against the most common failure modes.
    11. Vogtinator ◴[] No.45062874{4}[source]
    A swapfile on Linux must be directly mapped, bypassing any filesystem-level checksums (see https://btrfs.readthedocs.io/en/latest/Swapfile.html)
    12. Scaevolus ◴[] No.45063510[source]
    Swapping anonymous pages can bring the system to a crawl too. High memory pressure makes things very slow with swap, while with swap off, high memory pressure is likely to invoke the OOM killer and let the system violently repair itself.
    replies(1): >>45063942 #
    13. blueflow ◴[] No.45063942{3}[source]
    The "bug" with the OOM killer that I implied is that what you describe does not happen. That is not surprising, because disk cache thrashing is the normal mode of operation for serving big files to the network. An OOM killer acting on that alone would be problematic, but without swap, that's where the slowdown will happen for other workloads, too.

    It's less a bug than an understood problem, and there aren't any good solutions around yet.

    replies(1): >>45066248 #
    14. justsomehnguy ◴[] No.45064472[source]
    Well, SATA does have a basic CRC, and you would see an increase in CRC transfer errors in SMART if the path (usually the cables) isn't good.
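
    To illustrate what a link-level CRC buys you, here is a minimal sketch that uses `zlib.crc32` as a stand-in checksum. It does not model SATA's actual framing or polynomial, and the helper names `flip_bit` and `transfer_ok` are hypothetical. The point is only that a CRC detects corruption in transit (so the error can be counted or the transfer retried) but does not correct it.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a bad cable."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def transfer_ok(payload: bytes, sent_crc: int) -> bool:
    """Receiver-side check: recompute the CRC and compare with the sender's."""
    return zlib.crc32(payload) == sent_crc


payload = b"swapped-out page contents " * 64
sent_crc = zlib.crc32(payload)           # computed by the sender
damaged = flip_bit(payload, 5)           # one bit flips on the wire

clean_result = transfer_ok(payload, sent_crc)    # intact transfer passes
damaged_result = transfer_ok(damaged, sent_crc)  # single-bit error is caught
```

    A failed check tells the receiver that something went wrong, not which bit, which is the difference between this kind of detection and ECC's correction.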
    15. yjftsjthsd-h ◴[] No.45065608{5}[source]
    https://wiki.archlinux.org/title/Dm-integrity

    > It uses journaling for guaranteeing write atomicity by default, which effectively halves the write speed.

    That seems like a poor fit for swap IMO.

    https://www.kernel.org/doc/html/latest/admin-guide/device-ma... says,

    > There’s an alternate mode of operation where dm-integrity uses a bitmap instead of a journal. If a bit in the bitmap is 1, the corresponding region’s data and integrity tags are not synchronized - if the machine crashes, the unsynchronized regions will be recalculated. The bitmap mode is faster than the journal mode, because we don’t have to write the data twice, but it is also less reliable, because if data corruption happens when the machine crashes, it may not be detected.

    It's not clear to me if that would be okay for swap (as long as you don't hibernate, maybe) or if it's sufficiently protected from corruption.

    16. Trixter ◴[] No.45066248{4}[source]
    earlyoom is what we use to address this. We can't tolerate any kind of swapping at all in our workloads; it is better for the system to kill one process to save the others than to slow down or lock up.
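
    The idea behind a userspace early-OOM daemon can be sketched roughly as follows. This is not earlyoom's code: the function names and the 10% threshold are assumptions made for illustration, and the real tool's defaults and options should be taken from its own documentation. The sketch kills nothing; it only decides whether to act and which process would be chosen.

```python
def percent(part_kb: int, total_kb: int) -> float:
    """Return part as a percentage of total; a missing resource counts as 0%."""
    return 100.0 * part_kb / total_kb if total_kb > 0 else 0.0


def should_kill(mem_available_kb: int, mem_total_kb: int,
                swap_free_kb: int, swap_total_kb: int,
                min_percent: float = 10.0) -> bool:
    """Act only when both available memory and free swap are below the threshold."""
    mem_low = percent(mem_available_kb, mem_total_kb) <= min_percent
    # With swap off (swap_total_kb == 0) free swap is treated as exhausted,
    # so the decision depends on available memory alone.
    swap_low = percent(swap_free_kb, swap_total_kb) <= min_percent
    return mem_low and swap_low


def pick_victim(oom_scores: dict[int, int]) -> int:
    """Choose the PID with the highest oom_score (hypothetical input mapping)."""
    return max(oom_scores, key=lambda pid: oom_scores[pid])
```

    Acting before memory is fully exhausted is what lets one process be sacrificed while the rest of the system is still responsive, instead of waiting for the kernel OOM killer after the machine has already stalled.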
    17. ars ◴[] No.45110791[source]
    If you have checksum errors reading data from disk, you have much worse issues than RAM corruption. Any program you launch will probably be corrupted.

    Although if you do swap on a server (and you should), the swap needs to be on RAID, otherwise your server will crash on a disk error.

    Swap on a server is not meant for handling low-memory issues. Instead, there's tons of data on a server that's almost never used, so swap that out and make more room for cache.