Also see https://www.phoronix.com/news/Josef-Bacik-Leaves-Meta
It's important to note that striping and mirroring work just fine. It's only the 5/6 modes that are unstable: https://btrfs.readthedocs.io/en/stable/Status.html#block-gro...
How can this be a stable filesystem if parity is unstable and risks data loss?
How has this been allowed to happen?
It just seems so profoundly unserious to me.
I also have had to deal with thousands of nodes kernel-panicking due to a btrfs bug in Linux kernel 6.8 (a stable Ubuntu release).
*Caveat: I'm using RAID 10, not a parity RAID. Btrfs could well have problems with parity RAID. So what? If you really, really want RAID 5, just use md to build the RAID 5 device and put btrfs on top, as sketched below.
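(Not the commenter's exact commands; a minimal sketch of the layered setup with hypothetical device paths. It shells out to mdadm and mkfs.btrfs, so it needs root and will destroy data on the listed disks.)

```python
#!/usr/bin/env python3
"""Sketch: build an md RAID 5 array and put btrfs on top of it.

Device paths are hypothetical placeholders. Running this destroys
data on the listed disks; it needs root, mdadm, and btrfs-progs.
"""
import subprocess

DISKS = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]  # hypothetical spare disks
MD_DEV = "/dev/md0"

def run(*cmd: str) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. md owns the parity RAID, so the btrfs raid5/6 code is never used.
run("mdadm", "--create", MD_DEV, "--level=5",
    f"--raid-devices={len(DISKS)}", *DISKS)

# 2. btrfs sits on top as an ordinary single-device filesystem.
run("mkfs.btrfs", MD_DEV)

# 3. Mount it (the mount point is an assumption).
run("mount", MD_DEV, "/mnt/data")
```

The trade-off, raised downthread, is that btrfs then sees a single device: its checksums can still detect corruption, but there is no second copy to heal from, and redundancy is entirely md's job.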
Now you've piqued my curiosity; what uses that many filesystems/subvolumes? (Not an attack; I believe you, I'm just trying to figure out where it comes up)
The main criticism in this thread about btrfs involves multidisk setups, which aren't relevant for me, since I'm working on cloud systems and disk storage is abstracted away as a single block device.
Also, the out-of-band deduplication for btrfs using https://github.com/Zygo/bees is very impressive and flexible, in a way that ZFS just doesn't match.
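(For the curious, a minimal launcher sketch: bees attaches to a filesystem by UUID and dedups in the background. The mount point is hypothetical and the beesd invocation follows the bees README; your distro's packaging may differ.)

```python
#!/usr/bin/env python3
"""Sketch: launch bees against a mounted btrfs filesystem.

The mount point is hypothetical, and the beesd invocation follows
the bees README; it may differ under your distro's packaging.
"""
import subprocess

MOUNT_POINT = "/mnt/data"  # hypothetical btrfs mount

# bees identifies the filesystem by UUID rather than by mount point.
uuid = subprocess.run(
    ["findmnt", "-n", "-o", "UUID", MOUNT_POINT],
    check=True, capture_output=True, text=True,
).stdout.strip()

# "Out-of-band" means the daemon scans and dedups in the background
# while the filesystem stays fully usable.
subprocess.run(["beesd", uuid], check=True)
```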
The man page for mkfs.btrfs says:
> Warning: RAID5/6 has known problems and should not be used in production.
When you actually tell it to use raid5 or raid6, mkfs.btrfs will also print a large warning:
> WARNING: RAID5/6 support has known problems is strongly discouraged to be used besides testing or evaluation.
All the anecdotes I see tend to be “my drive didn’t mount, and I tried nothing before giving up because everyone knows BTRFS sux lol”. My professional experience, meanwhile, is that I’ve never once failed to recover (very easily!) a BTRFS drive someone else had given up for dead… just by running its standard recovery tools.
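(For concreteness, a rough sketch of the escalation ladder those standard tools usually imply, least to most invasive. Device and mount paths are hypothetical, and this is not an official procedure; read the btrfs docs before touching a real failure.)

```python
#!/usr/bin/env python3
"""Sketch: the usual btrfs recovery ladder, least to most invasive.

Device and mount paths are hypothetical. This mirrors commonly
documented steps, not an official procedure.
"""
import subprocess

DEV, MNT = "/dev/sdb1", "/mnt/rescue"  # hypothetical

def attempt(*cmd: str) -> bool:
    print("+", " ".join(cmd))
    return subprocess.run(cmd).returncode == 0

# 1. Try a read-only mount that falls back to an older tree root.
if attempt("mount", "-o", "ro,usebackuproot", DEV, MNT):
    raise SystemExit("mounted read-only: copy your data off now")

# 2. Repair the superblock from its on-disk backup copies.
attempt("btrfs", "rescue", "super-recover", DEV)

# 3. Still nothing? Scrape files out without mounting at all.
attempt("btrfs", "restore", DEV, "/mnt/recovered")

# ('btrfs check --repair' is the documented last resort and is
# deliberately left out of this sketch.)
```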
The md metadata is not adequately protected. On top of md, btrfs checksums can tell you when a file has gone bad, but with only one copy of each block btrfs can't self-heal it. And I'm sure there are caching/perf benefits left on the table by not having btrfs manage all the block storage itself.
Disaster recovery isn't obvious on any setup I've worked with. I have to RTFM to understand each system's idiosyncrasies.
The idea that some filesystems have no bugs is absurd. The idea that filesystems can mitigate all bugs in drive firmware or elsewhere in the storage stack is also absurd.
My anecdata: hundreds of intentional sabotages of btrfs mid-write, in single-drive and raid1 configurations, including physically disconnecting a drive. Not one time have I encountered an inconsistent filesystem or data loss once the data was on stable media. Not one. It always mounted without needing a filesystem check. This is on consumer hardware.
There's always some data loss in the single-drive case no matter the filesystem: whatever data or fs metadata hadn't yet been persisted is simply gone. Raid1 helps with that, because as long as the hardware problem that hit the first drive is isolated, the data still reaches stable media on the second.
Of course, I'm no more a scientific sample than you are, and my tests are rudimentary compared to the many thousands of synthetic tests fstests runs against Linux filesystems, both generic and fs-specific, every release cycle. But it is a real-world test (a rough sketch follows below), and it suggests no per se problem that inevitably means data loss as you describe.
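(A crude sketch of that kind of test, my construction rather than the commenter's actual harness. Paths are hypothetical, and the "crash" itself is you pulling the plug or the drive mid-run.)

```python
#!/usr/bin/env python3
"""Crude crash-consistency check in the spirit described above:
fsync files and record their hashes, induce a crash, then verify
after remount that everything fsync'd survived.

Usage: run with 'write', pull the plug mid-run, remount, then run
it again with no argument to verify. Paths are hypothetical.
"""
import hashlib
import json
import os
import sys

DATA_DIR = "/mnt/data/crashtest"    # on the filesystem under test
LEDGER = "/var/tmp/crashtest.json"  # on a different, surviving fs

def write_phase(n: int = 1000) -> None:
    os.makedirs(DATA_DIR, exist_ok=True)
    ledger = {}
    for i in range(n):
        payload = os.urandom(4096)
        path = os.path.join(DATA_DIR, f"f{i:05d}")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # data must hit stable media...
        ledger[path] = hashlib.sha256(payload).hexdigest()
        with open(LEDGER, "w") as f:  # ...before we promise it exists
            json.dump(ledger, f)
            f.flush()
            os.fsync(f.fileno())
    # A stricter harness would also fsync DATA_DIR itself to pin the
    # directory entries, and randomize write sizes and patterns.

def verify_phase() -> None:
    with open(LEDGER) as f:
        ledger = json.load(f)
    for path, digest in ledger.items():
        with open(path, "rb") as f:
            assert hashlib.sha256(f.read()).hexdigest() == digest, path
    print(f"{len(ledger)} fsync'd files intact")

if __name__ == "__main__":
    write_phase() if sys.argv[1:] == ["write"] else verify_phase()
```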
We were using a fairly simple config, but under certain heavy load patterns the kernel would panic: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...
I hear people say all the time that btrfs is stable now and that the complaints date from when btrfs was new, but please explain to me how the bug I linked is acceptable in a stable release of the most popular Linux distro?