
63 points trelane | 6 comments
1. dusted ◴[] No.42166600[source]
It just dawned on me how trivially simple it would be for memory controllers to implement ECC on UDIMMs: for every N words, reserve 1 word for parity. You gain ECC for a small decrease in capacity. Since the memory controller is on the CPU, it can easily abstract this away.
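A minimal sketch of the idea, not how any real controller does it. It assumes N data words are followed by one reserved check word, and it uses plain XOR parity, which can only detect a flipped bit within the group; correcting it would need a proper SECDED code. All function names here are made up for illustration. With N = 8 the ratio matches the 64 data bits plus 8 check bits of a conventional ECC DIMM.

```python
from functools import reduce
from operator import xor


def usable_fraction(n_data_words: int) -> float:
    """Share of raw capacity left for data when 1 check word is reserved per N data words."""
    return n_data_words / (n_data_words + 1)


def check_word(words: list[int]) -> int:
    """XOR of all data words in the group (detects, but cannot locate, a single bit flip)."""
    return reduce(xor, words, 0)


def is_consistent(words: list[int], stored_check: int) -> bool:
    """True if the group still matches the check word that was stored alongside it."""
    return check_word(words) == stored_check


# One group of N = 8 data words and its reserved check word.
group = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]
stored = check_word(group)

# Simulate a single bit flip in one of the data words.
corrupted = group.copy()
corrupted[3] ^= 0x04

print(f"usable capacity with N=8: {usable_fraction(8):.1%}")
print("clean group consistent:", is_consistent(group, stored))
print("corrupted group consistent:", is_consistent(corrupted, stored))
```

The capacity cost shrinks as N grows, but a larger group also means more data has to be read to verify any one word.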
replies(2): >>42166695 #>>42173924 #
2. kvemkon ◴[] No.42166695[source]
Indeed. Intel has recently implemented it in a low-cost CPU SoC: "in-band ECC".

https://news.ycombinator.com/item?id=41090956

But you don't only lose some capacity. Some bandwidth is also lost, and perhaps even some CPU cycles, since in-band ECC likely hasn't been implemented purely in a hard IP block.

replies(1): >>42167425 #
3. wtallis ◴[] No.42167425[source]
I think the bigger performance problem is that a read burst from one channel of RAM is no longer matched to the CPU cache line size when doing in-band ECC.
replies(2): >>42170293 #>>42194699 #
4. dusted ◴[] No.42170293{3}[source]
This is true; however, with the readahead CPUs usually do anyway, I don't think it's that bad. There is definitely a performance and capacity cost, but technically that capacity cost is also present in ECC memory: the extra memory is still there, it's just not printed on the label, and instead the stick is more expensive.

The CPU cache won't be mismatched though, since the memory controller can mask this. The performance hit will come from the memory controller having to do the extra reads for parity.

That will be a tiny mismatch, and I wonder whether the performance impact won't be more or less equal to the difference we already have between buffered and unbuffered memory: the same "extra work", simply moved from inside the DIMM to the memory controller.

5. sliken ◴[] No.42173924[source]
Nvidia GPUs that support ECC do this. I believe it's called inline ECC, and it does cost latency, bandwidth, and memory capacity.

This helps, but ideally the protection covers the entire path from CPU to DIMMs, and covers not just the data being read or written but also the address it's being written to. After all, writing the correct bits to the wrong address is a serious failure.
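To illustrate why covering the address matters, here is a toy sketch, not a description of any real memory controller. It assumes the check code is computed over the address and the data together, and it uses CRC-32 from Python's `zlib` as a stand-in for a hardware code. The `store` and `load` helpers and the dict-based memory are hypothetical.

```python
import zlib


def check_code(address: int, data: bytes) -> int:
    """Check code over the address and the data, so a misdirected write is detectable."""
    return zlib.crc32(address.to_bytes(8, "little") + data)


def store(memory: dict, intended: int, actual: int, data: bytes) -> None:
    """Write data meant for `intended`; `actual` differs only when an address fault occurs."""
    memory[actual] = (data, check_code(intended, data))


def load(memory: dict, address: int) -> bytes:
    """Read a location and verify the stored code against the address being read."""
    data, code = memory[address]
    if code != check_code(address, data):
        raise ValueError(f"check failed at {address:#x}")
    return data


memory = {}

# Normal write: the data lands where it was meant to go and verifies on read.
store(memory, intended=0x1000, actual=0x1000, data=b"good")
print(load(memory, 0x1000))

# Faulty write: correct bits, wrong address. A data-only code would pass this.
store(memory, intended=0x2000, actual=0x2040, data=b"oops")
try:
    load(memory, 0x2040)
except ValueError as err:
    print("detected:", err)
```

A code computed over the data alone would accept the second read, since the bits themselves are intact.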

6. adrian_b ◴[] No.42194699{3}[source]
The chips with in-band ECC have a separate dedicated cache for storing ECC codes, which are stored in another part of the memory chip, not inline with the corresponding cache line that stores data.

So the burst transfers have the same size as when ECC is disabled.

Without the special cache, the number of memory accesses would double, for data and for the extra ECC bits, which would not be acceptable. With the ECC cache, in many cases the reading and writing of the extra ECC bits can be avoided.

A few benchmarks for inline ECC have been published. The performance loss depends on the cache hit rates, so it varies a lot from program to program. In some cases the speed is lower by only a couple of percent, but for some applications the performance loss can be as high as 20% or 30%.
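A rough model of why the hit rate dominates, counting reads only. The sizes are invented for illustration: it assumes one ECC line covers 8 data lines and that the ECC cache holds 4 such lines with LRU replacement. Real hardware parameters are not given in this thread.

```python
from collections import OrderedDict


def extra_ecc_reads(data_lines, data_lines_per_ecc_line=8, cache_entries=4):
    """Count ECC-line fetches for a sequence of data-line reads, using a small LRU ECC cache."""
    cache = OrderedDict()
    misses = 0
    for line in data_lines:
        ecc_line = line // data_lines_per_ecc_line
        if ecc_line in cache:
            cache.move_to_end(ecc_line)  # hit: no extra memory access needed
        else:
            misses += 1  # miss: one extra read to fetch the ECC line
            cache[ecc_line] = True
            if len(cache) > cache_entries:
                cache.popitem(last=False)  # evict the least recently used entry
    return misses


def reads_per_request(data_lines):
    """Total memory reads (data plus ECC) divided by the number of data reads."""
    data_lines = list(data_lines)
    return (len(data_lines) + extra_ecc_reads(data_lines)) / len(data_lines)


sequential = range(64)                     # neighbouring lines share an ECC line
scattered = [i * 64 for i in range(64)]    # every read touches a different ECC line

print("sequential:", reads_per_request(sequential))
print("scattered: ", reads_per_request(scattered))
```

In this model the sequential pattern adds one extra read per eight data reads, while the scattered pattern misses every time and doubles the memory traffic, which is the no-cache case described above.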