
65 points qvr | 3 comments
nullc No.44653420
Are any filesystems offering file level FEC yet?

If a file has a hundred thousand blocks you could tack on a thousand blocks of error correction for the cost of making it just 1% larger. If the file is a seldom- or never-written archive, it's essentially free beyond the space it takes up.
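Back-of-the-envelope version of that arithmetic (the 4 KiB block size is just an assumption for illustration):

```python
# Rough overhead math for file-level FEC (illustrative numbers only).
data_blocks = 100_000      # blocks in the archive file
parity_blocks = 1_000      # extra error-correction blocks tacked on
block_size = 4096          # assumed 4 KiB filesystem block

print(f"file size: {data_blocks * block_size / 2**20:.0f} MiB")
print(f"FEC size:  {parity_blocks * block_size / 2**20:.1f} MiB")
print(f"overhead:  {parity_blocks / data_blocks:.1%}")   # 1.0%

# With an MDS erasure code (e.g. Reed-Solomon) over those blocks, any
# 1,000 lost-and-identified blocks in the file could be rebuilt.
```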

The kind of massive data archives that you want to minimize storage costs of tend to be read-mostly affairs.

It won't save you from a disk failure, but I see bad blocks much more often than whole-disk failures these days... and RAID 5/6 has rather high costs while still being quite vulnerable to the possibility of an aligned fault on multiple disks.

Of course you could use par or similar tools, but that lacks nice transparent FS integration and, in particular, doesn't benefit from the checksums already implemented in (some) filesystems (since you only need half as much error-correction data to recover from known-position errors, and/or can use erasure-only codes).
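A minimal sketch of why known-position errors are the cheap case: if the filesystem's per-block checksums already tell you which block is bad, even a single XOR parity block per group is enough to rebuild it. (This is the degenerate one-erasure case; a real implementation would presumably use something like Reed-Solomon to survive several bad blocks per group.)

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-sized byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# Four data blocks plus one parity block (toy sizes).
group = [b"archive1", b"archive2", b"archive3", b"archive4"]
parity = xor_blocks(group)

# The checksum layer flags block 2 as corrupt: a known-position erasure.
bad_index = 2
survivors = [blk for i, blk in enumerate(group) if i != bad_index]
rebuilt = xor_blocks(survivors + [parity])

assert rebuilt == group[bad_index]
print("recovered:", rebuilt)   # b'archive3'
```

Without the checksum pointing at the bad block, the code itself has to locate the error as well as repair it, which is where the factor of two comes from: an MDS code with r parity symbols can recover r known-position erasures but only about r/2 unknown-position errors.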

replies(5): >>44653469 #>>44653485 #>>44654527 #>>44655069 #>>44655131 #
1. leptons No.44653469
RAID, or any other fault-tolerance scheme, cannot be the only way you protect your data. I have two RAID 10 arrays, one for active data and one for backup, and the backup system has an LTO tape drive, where I also use PAR parity files on the tape backups. Important stuff is backed up to multiple tape sets. Both systems are in different buildings, with tapes stored in a third.

My point is, it doesn't much matter what your FS does, so long as you have 3 or more of them.

replies(1): >>44654209 #
2. zamadatix No.44654209
There is no such thing as a guaranteed data storage system. The only thing you can choose is how reliable is reliable enough (or how much reliability you can afford). Parity or RAID can get you more granular reliability increments than straight copies can provide, or even just far greater convenience when you do have copies.
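One way to see the granularity point, with an assumed independent per-device failure probability purely for illustration (and assuming an MDS code for the parity layouts):

```python
from math import comb

p = 0.03  # assumed independent failure probability per device (illustrative)

def p_loss(n_total, tolerated):
    """Probability that more pieces fail than the layout can tolerate."""
    return sum(comb(n_total, k) * p**k * (1 - p)**(n_total - k)
               for k in range(tolerated + 1, n_total + 1))

# Straight copies only come in 100%-overhead steps...
for copies in (2, 3):
    print(f"{copies} copies:     overhead {100 * (copies - 1)}%  "
          f"loss {p_loss(copies, copies - 1):.1e}")

# ...while parity lets you dial overhead and fault tolerance independently.
for data, parity in ((10, 1), (10, 2), (10, 4)):
    print(f"{data}+{parity} parity: overhead {100 * parity // data}%  "
          f"loss {p_loss(data + parity, parity):.1e}")
```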
replies(1): >>44657545 #
3. nullc No.44657545
Without error coding, only a perfect channel can give lossless performance. But with error coding, even a fairly lossy channel can give performance that is arbitrarily close to lossless, depending only on how much capacity you're willing to waste.

As the number of blocks on our storage devices grows, the probability that at least one of them has an error goes up. Even with RAID 5, the probability of two errors in one stripe at the same time can become non-negligible.
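Rough numbers (the per-block fault probability and stripe width are assumptions, just to show the shape of it):

```python
from math import comb, expm1, log1p

p = 1e-9   # assumed probability that any given block is unreadable (illustrative)

# Probability of at least one bad block grows with device size.
for n_blocks in (10**6, 10**8, 10**10):
    p_any = -expm1(n_blocks * log1p(-p))   # 1 - (1 - p)**n, computed stably
    print(f"{n_blocks:>12} blocks: P(at least one bad) = {p_any:.4f}")

# RAID 5 keeps one parity block per stripe, so two faults landing in the
# *same* stripe are unrecoverable.
width = 8                                   # assumed blocks per stripe
p_stripe = sum(comb(width, k) * p**k * (1 - p)**(width - k)
               for k in range(2, width + 1))
stripes = 10**10 // width
print(f"P(some stripe has >= 2 faults) = {-expm1(stripes * log1p(-p_stripe)):.1e}")
```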

Worse, RAID 5/6 normally depends on the device detecting corruption. When it doesn't, the RAID will not only fail to fix the corruption but can potentially propagate it to other data. (I understand that ZFS at least can use its internal checksums to handle this case.)