https://news.ycombinator.com/item?id=41090956
But you don't only lose some capacity. Some bandwidth is also lost, and perhaps even some CPU cycles, since in-band ECC likely hasn't been implemented purely in a hard IP block.
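Back-of-envelope, with made-up numbers (the 1/32 metadata reservation and the metadata-cache miss rate below are my assumptions, not vendor figures), just to show where the capacity and bandwidth go:

    /* Rough sketch of in-band ECC overhead: the controller carves a slice of
       ordinary DRAM out for ECC metadata, and a data access that misses the
       on-die metadata cache turns one burst into two. */
    #include <stdio.h>

    int main(void) {
        const double dram_gib      = 16.0;    /* installed DRAM (assumed)            */
        const double ecc_fraction  = 1.0/32;  /* capacity reserved for ECC (assumed) */
        const double metadata_miss = 0.25;    /* metadata-cache miss rate (made up)  */

        printf("usable capacity   : %.2f GiB\n", dram_gib * (1.0 - ecc_fraction));
        printf("capacity lost     : %.2f GiB\n", dram_gib * ecc_fraction);
        /* each metadata miss adds a second DRAM access, so the extra traffic
           is roughly the miss rate */
        printf("extra DRAM traffic: ~%.0f%%\n", metadata_miss * 100.0);
        return 0;
    }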
The CPU cache won't be mismatched, though, since the memory controller can mask this. The performance hit comes from the memory controller having to do the extra reads for the parity.
That will be a tiny mismatch, and I wonder whether the performance impact won't be more or less equal to the difference we already see between buffered and unbuffered memory (essentially the same "extra work", just moved from inside the DIMM to the memory controller).
This helps, but ideally the entire path from CPU to DIMMs is widened to carry check bits that cover not just the data being read or written, but also the address it's being written to. After all, writing the correct bits to the wrong address is a serious failure.
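A toy illustration of the idea (the CRC32 here is only a stand-in for a real SEC-DED or CRC scheme; folding the address into the code word is the point):

    /* End-to-end check that covers both the payload and the address it is
       supposed to live at, so a write that lands at the wrong address fails
       verification on readback. */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t crc32(uint32_t crc, const uint8_t *p, size_t n) {
        crc = ~crc;
        while (n--) {
            crc ^= *p++;
            for (int k = 0; k < 8; k++)
                crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
        }
        return ~crc;
    }

    /* check word is computed over address || data */
    static uint32_t protect(uint64_t addr, const uint8_t *data, size_t n) {
        uint32_t c = crc32(0, (const uint8_t *)&addr, sizeof addr);
        return crc32(c, data, n);
    }

    int main(void) {
        uint8_t  line[64] = { [0] = 0xAB };
        uint64_t intended = 0x1000, wrong = 0x2000;  /* misdirected write */

        uint32_t stored = protect(intended, line, sizeof line);
        /* the reader recomputes the check with the address it actually used */
        printf("read at intended addr: %s\n",
               stored == protect(intended, line, sizeof line) ? "ok" : "FAIL");
        printf("read at wrong addr   : %s\n",
               stored == protect(wrong, line, sizeof line) ? "ok" : "FAIL");
        return 0;
    }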
So the burst transfers have the same size as when ECC is disabled.
Without the special cache, the number of memory accesses would double, one for the data and one for the extra ECC bits, which would not be acceptable. With the ECC cache, the reads and writes of the extra ECC bits can often be avoided.
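A minimal sketch of why that cache matters; the organization is assumed (one metadata line shared by a group of adjacent data lines, a tiny FIFO-replaced on-die cache), not any particular vendor's design:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define LINES_PER_META 32   /* data lines covered by one ECC metadata line (assumed) */
    #define META_CACHE_SZ  8    /* entries in the on-die metadata cache (assumed)        */

    static uint64_t meta_cache[META_CACHE_SZ];
    static int      meta_valid[META_CACHE_SZ];
    static long     dram_accesses;

    static void touch_metadata(uint64_t line_addr) {
        uint64_t meta_line = line_addr / LINES_PER_META;
        for (int i = 0; i < META_CACHE_SZ; i++)
            if (meta_valid[i] && meta_cache[i] == meta_line)
                return;                         /* hit: no extra DRAM access      */
        dram_accesses++;                        /* miss: fetch metadata from DRAM */
        static int victim;
        meta_cache[victim] = meta_line;
        meta_valid[victim] = 1;
        victim = (victim + 1) % META_CACHE_SZ;  /* simple FIFO replacement        */
    }

    static void access_line(uint64_t line_addr) {
        dram_accesses++;                        /* the data access itself         */
        touch_metadata(line_addr);              /* maybe a second access          */
    }

    int main(void) {
        for (uint64_t a = 0; a < 10000; a++)    /* sequential, cache-friendly     */
            access_line(a);
        printf("sequential: %ld DRAM accesses for 10000 line reads\n", dram_accesses);

        dram_accesses = 0;
        memset(meta_valid, 0, sizeof meta_valid);
        for (uint64_t a = 0; a < 10000; a++)    /* strided, metadata-cache hostile */
            access_line(a * LINES_PER_META);
        printf("strided   : %ld DRAM accesses for 10000 line reads\n", dram_accesses);
        return 0;
    }

With a friendly sequential pattern the metadata cache absorbs almost all the extra accesses; with a hostile stride every access pays the second trip to DRAM, which is the near-doubling mentioned above.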
A few benchmarks for inline ECC have been published. The performance loss depends on the ECC-cache hit rate, so it varies a lot from program to program. In some cases the slowdown is only a couple of percent, but for some applications it can be as high as 20% or 30%.