https://news.ycombinator.com/item?id=42221564 - 2024-11-23, 103 comments
I don’t doubt that people on all sides have made mis-steps, but from the outside it mostly just seems like Kent doesn’t want to play by the rules (despite having been given years of patience).
Also see https://www.phoronix.com/news/Josef-Bacik-Leaves-Meta
There is no 'modern' ZFS-like fs in Linux nowadays.
-- Kent Overstreet, bcachefs maintainer: https://lore.kernel.org/all/citv2v6f33hoidq75xd2spaqxf7nl5wb...
It's important to note that striping and mirroring work just fine. It's only the 5/6 modes that are unstable: https://btrfs.readthedocs.io/en/stable/Status.html#block-gro...
Your distro could very easily include bcachefs if it wished. Although I think the ZFS + Linux situation is mostly Linux religiosity gone wild, that very particular problem doesn't exist for bcachefs.
The problem with bcachefs is the problem with btrfs. It mostly still doesn't work to solve the problems ZFS already solves.
How can this be a stable filesystem if parity is unstable and risks data loss?
How has this been allowed to happen?
It just seems so profoundly unserious to me.
I also have had to deal with thousands of nodes kernel panicking due to a btrfs bug in Linux kernel 6.8 (stable Ubuntu release).
zfs is out of tree, leaving it as an unviable option for many people. This news means that bcachefs is going to be in a very weird state in-kernel, which leaves btrfs as the only other in-tree ‘modern’ filesystem.
This news about bcachefs has ramifications for the state of ‘modern’ FSes in Linux, and I’d say the news about the btrfs maintainer taking a step back is related.
1. The dm layer gives you cow/snapshots for any filesystem you want already and has for more than a decade. Some implementations actually use it for clever trickery like updates, even. Anyone who has software requirements in this space (as distinct from "wants to yell on the internet about it") is very well served.
2. Compression seems silly in the modern world. Virtually everything is already compressed. To a first approximation, every byte in persistent storage anywhere in the world is in a lossy media format. And the ones that aren't are in some other cooked format. The only workloads where you see significant use of losslessly-compressible data are in situations (databases) where you have app-managed storage performance (and which see little value from filesystem choice) or ones (software building, data science, ML training) where there's lots of ephemeral intermediate files being produced. And again, those are usages where fancy filesystems are poorly deployed; you're going to throw it all away within hours to days anyway.
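The compressibility claim is easy to sanity-check. A minimal Python sketch, with zlib standing in for a filesystem's lz4/zstd and synthetic data standing in for real files:

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original (lower = more compressible)."""
    return len(zlib.compress(data)) / len(data)

# Text-like data (logs, source, config) compresses very well.
text = b"GET /index.html HTTP/1.1 200 OK\n" * 4096

# Already-compressed media is effectively incompressible; random
# bytes are a stand-in for JPEG/MP4/zstd payloads of the same size.
media = os.urandom(len(text))

print(f"text-like:  {ratio(text):.3f}")   # far below 1.0
print(f"media-like: {ratio(media):.3f}")  # ~1.0, sometimes slightly above
```

Which of the two cases dominates your disk is exactly the question that decides whether filesystem compression is worth anything to you.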
Filesystems are a solved problem. If ZFS disappeared from the world today... really who would even care? Only those of us still around trying to shout on the internet.
*caveat: I’m using RAID 10, not a parity RAID. It could have problems with parity RAID. So? If you really really want RAID 5, then just use md to make your RAID 5 device and put btrfs on top.
Yeah nah, have you tried processing terabytes of data every day and storing them? It gets better now with DDR5 but bit flips do actually happen.
O_o
Apparently I've been living under a rock, can you please show us a link about this? I was just recently (casually) looking into bolting ZFS/BTRFS-like partial snapshot features to simulate my own atomic distro where I am able to freely roll back if an update goes bad. Think Linux's Timeshift with something little extra.
https://docs.kernel.org/admin-guide/device-mapper/snapshot.h...
As for the snapshots, things like LVM snapshots are pretty coarse, especially for someone like me where I run dm-crypt on top of LVM
I’d say zfs would be pretty well missed with its data integrity features. I’ve heard that btrfs is worse in that aspect, so given that btrfs saved my bacon with a dying ssd, I can only imagine what zfs does.
* Another kernel dev takes over maintenance and treats it as a fork (highly unlikely, according to their estimate)
* Kent hires someone to upstream the changes for him, and Kent stops complaining about when it gets merged
* Bcachefs gets no maintenance and will likely be removed in the next major release
I do not know him personally, but most interactions I've read online by him sounded grounded and not particularly offensive, so I'm abstaining from making any kind of judgement on it.
But while I have no stake in this, drama really does seem to follow Kent around for one reason or another. And it's never his fault if you go by his public statements - which, I want to repeat: he sounds very grounded and not offensive to me whatsoever.
I think the Linux Kernel just doesn't want to be potentially in violation of Oracle's copyrights. That really doesn't seem that unreasonable to me, even if it feels pointless to you.
Both Linus and Kent drive a hard bargain, and it's not as simple as finding someone else to blindly forward bcachefs patches. At the first sign of conflict, the poor person in the middle would have no power, no way to make anyone back down, and we'd be back to square one.
It's in limbo, and there is still time, but if left to bitrot it will be removed eventually.
[1] Due to an unrelated bug in latest Fennec, I currently have all my extensions disabled or else all pages stop loading at all. Normally use uBlock Origin, Tampermonkey, etc.
IMHO, what his communications show is an unwillingness to acknowledge that other projects that include his work have focus, priorities, and policies that are not the same as that of his project. Also, expecting exceptions to be made for his case, since exceptions have been made in other cases.
Again IMHO, I think he would be better off developing apart with an announcement mailing list. When urgent changes are made, send to the announcement list. Let other interested parties sort out the process of getting those changes into the kernel and distributions.
If people come with bug reports from old versions distributed by others, let them know how to get the most up to date version from his repository, and maybe gently poke the distributors.
Yes, that means users will have older versions and not get fixes immediately. But what he's doing isn't working to get fixes to users immediately either.
You can certainly add verification above and below your filesystem, but the filesystem seems like a good layer to have verification. Capturing a checksum while writing and verifying it while reading seems appropriate; zfs scrub is a convenient way to check everything on a regular basis. Personally, my data feels important enough to make that level of effort, but not important enough to do anything else.
*Coming from the extremely well thought out and documented zfs utilities to btrfs will have you wondering wtf fairly frequently while you learn your way around.
The issue is that I have never seen Kent back down a single time. Kent will explain in details why the rules are bullshit and don't apply in this particular case, every single time, without any room for compromise.
If the only problem was when to send patches, that would be one thing. But disagreements over patches aren't just a timing problem that can be routed around.
I agree that the kernel community can be a hostile environment.
Though I’d argue that people _have_ tried to explain things to Kent, multiple times. At least a few have been calm, respectful attempts.
Sadly, Kent responds to everything in an email except the key part that is being pointed out to him (usually his behavior). Or deflects by going on the attack. And generally refuses to apologise.
Kent seems very patient in explaining his position (and frustrations arising from other people introducing bugs to his code), and the kernel & Debian folks are running a smear campaign instead of replying to what I see as genuine problems in the process. As an example, the quotes referenced by user paravoid are, imho, taken out of context (judging by reading the provided links).
There probably is a lot more history to it, but judging from that thread it's not Kent who looks like a bad guy.
The dm stuff is one key for the entire partition and you can't check it for bitrot or repair it without the key.
He’s not super offensive, but he will tell a Debian package maintainer that their process sucks, that they should change it, and that they are being stupid by following it. Overall, he seems a bit entitled and unwilling to compromise with others. It’s not just Kent, though; the areas that seem most problematic for him are where an unstoppable force (Kent) meets an immovable wall (Linux / Debian).
Working in the Linux kernel is well known for its frustrations and the personal conflict it creates, to the point that there are almost no Linux kernel devs/maintainers who aren’t paid to do the work. You can see a similar set of events happen with the Rust4Linux people, the Asahi Linux project and their R4L drivers, etc.
> Personally, my data feels important enough to make that level of effort, but not important enough to do anything else.
OMG. Backups! You need backups! Worry about polishing your geek cred once your data is on physically separate storage. Seriously, this is not a technology choice problem. Go to Amazon and buy an exfat stick, whatever. By far the most important thing you're ever going to do for your data is Back. It. Up.
Filesystem choice is, and I repeat, very much a yell-on-the-internet kind of thing. It makes you feel smart on HN. Backups to junky Chinese flash sticks are what are going to save you from losing data.
I just think that while, yes, the kernel folks have tried to explain, they didn't explain well. The "why" of it is a people thing. Linus needs to be able to trust that people he's delegated some authority will respect its limits. The maintainers need to be able to trust that each other maintainer will respect the area that they have been delegated authority over. I think that Kent genuinely doesn't get this.
Kent just does not listen. Every time the discussion starts from the top. Even if you do agree on some compromise, in a month or two he'll just do the same thing again and all the same arguments start again.
You can't expect people to detail about four or five years of context in every single engagement for the benefit of interested 3rd parties like you or me.
I wouldn't comment but I feel like I'm naturally on your side of the argument and want to see it articulated well.
Behaviour sounds like the least important part of code contributions. I smell overpowered, should've-been-a-kindergarten-teacher code of conduct person overreach.
A block level cache like bcache (not fs) and dm-cache handles it less ideally, and doesn't leave the SSD space as usable space. As a home user, 2TB of SSDs is 2TB of space I'd rather have. ZFS's ZIL is similar, not leaving it as usable space. Btrfs has some recent work in differentiating drives to store metadata on the faster drives (allocator hints), but that only does metadata as there is no handling of moving data to HDDs over time. Even Microsoft's ReFS does tiered storage I believe.
I just want to have 1 or 2 SSDs, with 1 or 2 HDDs in a single filesystem that gets the advantages of SSDs with recently used files and new writes, and moves all the LRU files to the HDDs. And probably keep all the metadata on the SSDs too.
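The promote/demote policy being asked for can be modeled in a few lines. This is a toy sketch of the policy only, with invented names; it is not how bcachefs or any real tiering implementation is structured:

```python
from collections import OrderedDict

class TieredFS:
    """Toy model of SSD+HDD tiering: new writes and recently read files
    live on the SSD tier; when the SSD fills, the least-recently-used
    files are demoted to the HDD tier."""

    def __init__(self, ssd_capacity: int):
        self.ssd = OrderedDict()  # path -> size, in LRU order
        self.hdd = {}             # path -> size
        self.cap = ssd_capacity

    def _ssd_used(self) -> int:
        return sum(self.ssd.values())

    def write(self, path: str, size: int) -> None:
        self.hdd.pop(path, None)       # new writes land on the SSD tier
        self.ssd[path] = size
        self.ssd.move_to_end(path)
        while self._ssd_used() > self.cap:
            victim, vsize = self.ssd.popitem(last=False)  # demote LRU
            self.hdd[victim] = vsize

    def read(self, path: str) -> str:
        if path in self.ssd:           # hot: refresh LRU position
            self.ssd.move_to_end(path)
            return "ssd"
        self.write(path, self.hdd[path])  # cold: promote back to SSD
        return "hdd"

fs = TieredFS(ssd_capacity=100)
fs.write("a", 60)
fs.write("b", 60)      # over capacity: "a" is demoted to the HDD tier
print(sorted(fs.hdd))  # -> ['a']
```

The hard parts in a real filesystem (write-back consistency, crash safety, not thrashing on scans) are exactly what this sketch leaves out, which is presumably why so few filesystems ship it.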
This is one of the problems: Kent is frequently unable to accept that things don't go his way. He will keep bringing it up again and again and he just grinds people down with it. If you see just one bit of it then it may seem somewhat reasonable, but it's really not because this is the umpteenth time this exact discussion is happening and it's groundhog day once again.
This is a major reason why people burn out on Kent. You can't just have a disagreement/conflict and resolve it. Everything is a discussion with Kent. He can't just shrug and say "well, I think that's a bit silly, but okay, I can work with it, I guess". The options are 1) Kent gets his way, or 2) he will keep pushing it (not infrequently ignoring previous compromises, restarting the discussion from square one). Here too, the Debian people have this entire discussion (again) forced upon them by Kent's comments in a way that's just completely unnecessary and does nothing to resolve anything.
Even as an interested onlooker who is otherwise uninvolved and generally more willing to accept difficult behaviour than most people, I've rather soured on Kent over time.
So either Kent is on a righteous crusade against unreasonable processes within the Kernel, Debian, and every other large software project he interacts with. Or there's something about the way Kent interacts with these projects that causes friction.
I like Bcachefs, I think Kent is a very talented developer, but I'm not going to pretend that he is innocent in all this.
My goal was actually the same though: to try to short-circuit the inevitable platform flame by calling it out explicitly and pointing out that the technical details are sort of a solved problem.
ZFS argumentation gets exhausting, and has ever since it was released. It ends up as a proxy for Sun vs. Linux, GNU vs. BSD, Apple vs. Google, hippy free software vs. corporate open source, pick your side. Everyone has an opinion, everyone thinks it's crucially important, and as a result of that hyperbole everyone ends up thinking that ZFS (dtrace gets a lot of the same treatment) is some kind of magically irreplaceable technology.
And... it's really not. Like I said above if it disappeared from the universe and everyone had to use dm/lvm for the actual problems they need to solve with storage management[1], no one would really care.
[1] Itself an increasingly vanishing problem area! I mean, at scale and at the performance limit, virtually everything lives behind a cloud-adjacent API barrier these days, and the backends there worry much more about driver and hardware complexity than they do about mere "filesystems". Dithering about individual files on individual systems in the professional world is mostly limited to optimizing boot and update time on client OSes. And outside the professional world it's a bunch of us nerds trying to optimize our movie collections on local networks; realistically we could be doing that on something as awful as NTFS if we had to.
I spent some time researching this topic, and in all benchmarks I've seen and my personal tests btrfs is faster or much faster: https://www.reddit.com/r/zfs/comments/1i3yjpt/very_poor_perf...
https://wiki.archlinux.org/title/Dm-integrity
> It uses journaling for guaranteeing write atomicity by default, which effectively halves the write speed
I'd really rather not do that, thanks.
IIRC my laptop's zpool has a 1.2x compression ratio; it's worth doing. At a previous job, we had over a petabyte of postgres on ZFS and saved real money with compression. Hilariously, on some servers we also improved performance because ZFS could decompress reads faster than the disk could read.
1. This is misunderstanding how device corruption works. It's not and can't ever be limited to "files". (Among other things: you can lose whole trees if a directory gets clobbered, you'd never even be able to enumerate the "corrupted files" at all!). All you know (all you can know) is that you got a success and that means the relevant data and metadata matched the checksums computed at write time. And that property is no different with dm. But if you want to know a subset of the damage just read the stderr from tar, or your kernel logs, etc...
2. Metadata robustness in the face of inconsistent updates (e.g. power loss!) is a feature provided by all modern filesystems, and ZFS is no more or less robust than ext4 et al. But all such filesystems (ZFS included) will "lose data" that hadn't been fully flushed. Applications that are sensitive to that sort of thing must (!) handle this by having some level of "transaction" checkpointing (i.e. a fsync call). ZFS does absolutely nothing to fix this for you. What is true is that an unsynchronized snapshot looks like "power loss" at the dm level where it doesn't in ZFS. But... that's not useful for anyone who actually cares about data integrity, because you still have to solve the power loss problem. And solving the power loss problem obviates the need for ZFS.
https://lore.kernel.org/lkml/CAHk-=wiLE9BkSiq8F-mFW5NOtPzYrt...
https://lore.kernel.org/all/citv2v6f33hoidq75xd2spaqxf7nl5wb...
Now you've piqued my curiosity; what uses that many filesystems/subvolumes? (Not an attack; I believe you, I'm just trying to figure out where it comes up)
But I don't usually verify the backups, so there's that. And everything is in the same zip code for the most part, so one big disaster and I'll lose everything. C'est la vie.
CoC isn't even the issue, he constantly breaks kernel development rules relating to the actual code, then starts arguments with everyone up to and including Linus when he gets called out, and aggressively misses the point every time. Then starts the same argument all over again 6 weeks later.
And, like, if you don't like some rules, then you can have that discussion, but submitting patches you know will be rejected and then re-litigating your dislike of the rules is a waste of everyone's time.
The second has one offensive remark:
> Get your head examined. And get the fuck out of here with this shit.
which I thought he admitted was out of line and said sorry for. Or do I misremember? I admit once again, I'm still completely uninvolved and merely saw it play out on the internet.
The patch that kicked off the current conflict was the 'journal_rewind' patch; we recently (6.15) had the worst bug in our entire upstream history - it was taking out entire subvolumes.
The third report got me a metadata dump with everything I needed to debug the issue, thank god, and now we have a great deal of hardening to ensure a bug like this can never happen again. Subsequently, I wrote new repair code, which fully restored the filesystem of the 3rd user hit by the bug (first two had backups).
Linus then flipped out because it was listed as a 'feature' in the pull request; it was only listed that way to make sure that users would know about it if they were affected by the original bug and needed it. Failure to maintain your data is always a bug for a filesystem, and repair code is a bugfix.
In the private maintainer thread, and even in public, things went completely off the rails, with Linus and Ted basically asserting that they knew better than I do which bcachefs patches are regression risks (seriously), and a page and a half rant from Linus on how he doesn't trust my judgement, and a whole lot more.
There have been many repeated arguments like this over bugfixes.
The thing is, since then I started perusing pull requests from other subsystems, and it looks like I've actually been more conservative with what I consider a critical bugfix (and send outside the merge window) than other subsystems. The _only_ thing that's been out of the ordinary with bcachefs has been the volume of bugfixes - but that's exactly what you'd expect to see from a new filesystem that's stabilizing rapidly and closing out user bug reports - high volume of pure bugfixing is exactly what you want to see.
So given that, I don't think having a go-between would solve anything.
In the future bcachefs will be rolling out auxiliary dirent indices for a variety of purposes, and one of those will be to give you a list of files that have had errors detected by e.g. scrub (we already generally tell you the affected filename in error messages)
2 - No, metadata robustness absolutely varies across filesystems.
From what I've seen, ext4 and bcachefs are the gold standard here; both can recover from basically arbitrary corruption and have no single points of failure.
Other filesystems do have single points of failure (notably btree roots), and btrfs and I believe ZFS are painfully vulnerable to devices with broken flush handling. You can (and should) blame the device and the shitty manufacturers, but from the perspective of a filesystem developer, we should be able to cope with that without losing the entire filesystem.
XFS is quite a bit better than btrfs, and I believe ZFS, because they have a ton of ways to reconstruct from redundant metadata if they lose a btree root, but it's still possible to lose the entire filesystem if you're very, very unlucky.
On a modern filesystem that uses b-trees, you really need a way of repairing from lost b-tree roots if you want your filesystem to be bulletproof. btrfs has 'dup' mode, but that doesn't mean much on SSDs given that you have no control over whether your replicas get written to the same erase unit.
Reiserfs actually had the right idea - btree node scan, and reconstruct your interior nodes if necessary. But they gave that approach a bad name; for a long time it was a crutch for a buggy b-tree implementation, and they didn't seed a filesystem-specific UUID into the btree node magic number like bcachefs does, so it could famously merge a filesystem from a disk image with the host filesystem.
bcachefs got that part right, and also has per-device bitmaps in the superblock for 'this range of the device has btree nodes' so it's actually practical even if you've got a massive filesystem on spinning rust - and it was introduced long after the b-tree implementation was widely deployed and bulletproof.
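Why seeding a filesystem-specific UUID into the node magic matters can be shown with a toy scan. Everything here (layout, sizes, names) is invented for illustration and bears no relation to bcachefs's real on-disk format; it only demonstrates that a scan keyed on a per-filesystem magic cannot accidentally merge a foreign image's nodes:

```python
import hashlib
import os

NODE_SIZE = 64  # toy fixed-size nodes

def _magic(fs_uuid: bytes) -> bytes:
    # Magic number derived from the filesystem's UUID, so nodes from
    # a *different* filesystem never match during recovery.
    return hashlib.sha256(b"BTREE_NODE" + fs_uuid).digest()[:8]

def make_node(fs_uuid: bytes, payload: bytes) -> bytes:
    """A toy btree node: per-fs magic, then payload, zero-padded."""
    return (_magic(fs_uuid) + payload).ljust(NODE_SIZE, b"\0")

def node_scan(device: bytes, fs_uuid: bytes):
    """Walk the raw device; recover only nodes belonging to fs_uuid."""
    magic = _magic(fs_uuid)
    for off in range(0, len(device), NODE_SIZE):
        node = device[off:off + NODE_SIZE]
        if node[:8] == magic:
            yield node[8:].rstrip(b"\0")

host_uuid, foreign_uuid = os.urandom(16), os.urandom(16)

# A device holding the host fs plus a raw image of some *other* fs
# (the reiserfs failure mode: its scan would have merged both):
device = (make_node(host_uuid, b"host-root")
          + make_node(foreign_uuid, b"foreign-root")
          + make_node(host_uuid, b"host-leaf"))

print(list(node_scan(device, host_uuid)))  # -> [b'host-root', b'host-leaf']
```

With a magic number shared by every filesystem of the same type (as in old reiserfs), the scan above would have returned all three nodes, including the foreign root.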
The point of contention here was a patch within fs/bcachefs/, which was repair code to make sure users didn't lose data.
If we can't have clear boundaries and delineations of responsibility, there really is no future for bcachefs in the kernel; my core mission is a rock solid commitment to reliability and robustness, including being responsive to issues users hit, and we've seen repeatedly that the kernel process does not share those priorities.
1. Regardless of whether correct or not, it's Linus that decides what's a feature and what's not in Linux. Like he has for the last however many decades. Repair code is a feature if Linus says it is a feature.
2. Being correct comes second to being agreeable in human-human interactions. For example, dunking on the x filesystem does not work as a defense when the person opposite you is an x filesystem maintainer.
3. rules are rules, and generally don't have to be "correct" to be enforced in an organization
I think your perceived "unfairness" might make sense if you just thought of these things as un-workaroundable constraints, just like the fact that SSDs wear out over time.
And there's a real connection to the issue that sparked all this drama in the kernel and the Debian drama: critical system components (the kernel, the filesystem, and others) absolutely need to be able to get bugfixes in a timely manner. That's not optional.
With Debian, we had a package maintainer who decided that unbundling Rust dependencies was more important than getting out updates, and then we couldn't get a bugfix out for mount option handling. This was a non-issue for every other distro with working processes because the bug was fixed in a few days, but a lot of Debian users weren't able to mount in degraded mode and lost access to their filesystems.
In the kernel drama, Linus threw a fit over a repair code to recover from a serious bug and make sure users didn't lose data, and he's repeatedly picked fights over bugfixes (and even called pushing for getting bugfixes out "whining" in the past).
There are a lot of issues that there can be give and take on, but getting fixes out in a timely manner is just part of the baseline set of expectations for any serious project.
To some extent drawing clear boundaries is good as a last resort when people cannot agree, but it can't be the main way to resolve disagreements. Thinking in terms of who owns what and has the final say is not the same as trying to understand the requirements from the other side to find a solution that works for everyone.
I don't think the right answer is to blindly follow whatever Linus or other people say. I don't mean you should automatically back down without technical reasons, because authority says so. But I notice I can't remember an email where concessions were made, or attempts to find a middle ground by understanding the other side. Maybe someone can find counterexamples.
But this idea of using ownership to decide who has more authority and can impose their vision, that can't be the only way to collaborate. It really is uncompromising.
I had high hopes for bcachefs. sigh
Agreed 100%. In an ideal world, we'd be sitting down together, figuring out what our shared priorities are, and working from there.
Unfortunately, that hasn't been possible, and I have no idea what Linus's priorities are, except that they definitely aren't a bulletproof filesystem and safeguarding user data; his response to journal_rewind demonstrated that quite definitively.
So that's where we're at, and given the history with other local filesystems I think I have good reason not to concede. I don't want to see bcachefs run off the rails, but given all the times I've talked about process and the way I'm doing things I think that's exactly what would happen if I started conceding on these points. It's my life's work, after all.
You'd think bcachefs's track record (e.g. bug tracker, syzbot) and the response it gets from users would be enough, but apparently not, sadly. But given the way the kernel burns people out and outright ejects them, not too surprising.
based on my own testing, dm has a lot of footguns and, with some kernels, as little as 100 bytes of corruption to the underlying disk could render a dm-integrity volume completely unusable (requiring a full rebuild) https://github.com/khimaros/raid-explorations
As I understand it, ZFS also has a lot of redundant metadata (copies=3 on anything important), and also previous uberblocks[1].
In what way is XFS better? Genuine question, not really familiar with XFS.
[1]: https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSMetadata...
I think there's room to have your cake and eat it too, but I certainly can't blame you for caring about quality, that much is sure.
The main criticism in this thread about btrfs involves multidisk setups, which aren't relevant for me, since I'm working on cloud systems and disk storage is abstracted away as a single block device.
Also, the out-of-band deduplication for btrfs using https://github.com/Zygo/bees is very impressive and flexible, in a way that ZFS just doesn't match.
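Out-of-band dedup of the kind bees does boils down to hashing existing blocks after the fact and finding duplicate ranges that could be turned into shared extents. A toy Python sketch - bees itself works on btrfs extents and uses the FIDEDUPERANGE ioctl to actually remap; this version only finds candidates, and all names are invented:

```python
import hashlib

BLOCK = 4096  # dedup granularity for this toy

def dedup_scan(extents: dict) -> list:
    """Hash every block across all files; report duplicate ranges as
    (src_file, src_off, dst_file, dst_off) tuples that a real tool
    would pass to the kernel for extent sharing."""
    seen = {}   # block hash -> (file, offset) of first occurrence
    dupes = []
    for name, data in extents.items():
        for off in range(0, len(data), BLOCK):
            h = hashlib.sha256(data[off:off + BLOCK]).digest()
            if h in seen:
                dupes.append((*seen[h], name, off))
            else:
                seen[h] = (name, off)
    return dupes

file_a = b"A" * BLOCK + b"B" * BLOCK
file_b = b"C" * BLOCK + b"A" * BLOCK  # 2nd block duplicates file_a's 1st

print(dedup_scan({"a": file_a, "b": file_b}))  # -> [('a', 0, 'b', 4096)]
```

The "out-of-band" part is the key design choice: because the scan runs after the fact rather than on the write path, it costs nothing at write time and can be throttled or rerun at will.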
I do a ton of reading through forums gathering user input, and lots of people chime in with stories of lost filesystems. I've seen reports of lost filesystems with ZFS and I want to say I've seen them at around the same frequency of XFS; both are very rare.
My concern with ZFS is that they seem to have taken the same "no traditional fsck" approach as btrfs, favoring entirely online repair. That's obviously where we all want to be, but that's very hard to get right, and it's been my experience that if you prioritize that too much you miss the "disaster recovery" scenarios, and that seems to be what's happened with ZFS; I've read that if your ZFS filesystem is toast you need to send it to a data recovery service.
That's not something I would consider acceptable, fsck ought to be able to do anything a data recovery service would do, and for bcachefs it does.
I know the XFS folks have put a ton of outright paranoia into repair, including full on disaster recovery scenarios. It can't repair in scenarios where bcachefs can - but on the other hand, XFS has tricks that bcachefs doesn't, so I can't call bcachefs unequivocally better; we'd need to wait for more widespread usage and a lot more data.
Remarks like this come across as extremely patronizing, as you completely ignore what the other party says and instead project your own conclusions about the other persons motives and beliefs.
> his response to journal_rewind demonstrated that quite definitively
No, it did not do that in any way, shape, or form. You had multiple other perfectly valid options to help the affected users besides getting that code shipped in the kernel there and then. Getting it shipped in the kernel was merely a convenience.
If bcachefs was established and stable it would be a different matter. But it's an experimental file system. Per definition data loss is to be expected, even if recovery is preferable.
If we had the fuse driver done, that would have worked, though. Still not completely ideal, because we're at the mercy of distros to make sure they're getting -tools updates out in a timely manner, and they're not always as consistent with that as the kernel (most are good, though).
Just making it available in a git repo was not an option because lots of bcachefs users are getting it from their distro kernel and have never built a kernel before (yes, I've had to help users with building kernels for the first time; it's slow and we always look for other options), and even if you know how, if your primary machine is offline the last thing you want to have to do is build a custom rescue image with a custom kernel.
And there was really nothing special about this than any other bugfix, besides needing to use a new option (which is also something that occasionally happens with hotfixes).
Bugs are just a fact of life, every filesystem has bugs and occasionally has to get hotfixes out quickly. It's just not remotely feasible or sane to be coming up with our own parallel release process for hotfixes.
But there are also reasons why things are the way they are, and that is also not unreasonable. And at the end of the day: Linus is the boss. It really does come down to that. He has dozens of other subsystem maintainers to deal with and this is the process that works for him.
Similar stuff applies to Debian. Personally, I deeply dislike Debian's inflexible and outmoded policy and lack of pragmatism. But you know, the policy is the policy, and at some point you just need to accept that and work with it the best you can.
It's okay to make all the arguments you've made. It's okay to make them forcefully (within some limits of reason). It's not okay to keep repeating them again and again until everyone gets tired of it, while seemingly failing to listen to what people are saying. This is where you are being unreasonable.
I mean, you *can* do that, I guess, but look at where things are now. No one is happy with this – certainly not you. And it's really not a surprise, I already said this in November last year: "I wouldn't be surprised to see bcachefs removed from the kernel at some point".[1] To be clear: I didn't want that to happen – I think you've done great work with bcachefs and I really want it to succeed every which way. But everyone could see this coming from miles.
XFS has burned through maintainers, citing "upstream burnout". It's not just bcachefs that things are broken for.
And it was burning me out, too. We need a functioning release process, and we haven't had that; instead I've been getting a ton of drama that's boiled over into the bcachefs community, oftentimes completely drowning out all the calmer, more technical conversations that we want.
It's not great. It would have been much better if this could have been worked out. But at this point, cutting ties with the kernel community and shipping as a DKMS module is really the only path forwards.
It's not the end of the world. Same with Debian; we haven't had those issues in any other distros, so eventually we'll get a better package maintainer who can work the process, or they'll figure out, as Rust adoption goes up, that their Rust policy isn't as smart as they think it is.
I'm just going to push for doing things right, and if one route or option fails there's always others.
The rules were clear about the right time to merge things so they get into the next version; if you miss it, they have to go into the version after that. I don't know the specific timing since I'm not a kernel developer, but there was one.
Linus is trying to run the release cycle on a strict schedule, like a train station. You are trying to delay the train so that you can load more luggage on, instead of just waiting for the next train. You are not doing this once or twice in an emergency, but you are trying to delay every single train. Every single train, *you* have some "emergency" which requires the train to wait just for you. And the station master has gotten fed up and kicked you out of the station.
How can it be an emergency if it happens every single time? You need to plan better, so you will be ready before the train arrives. No, the train won't wait for you just because you forgot your hairbrush, and it won't even wait for you to go home and turn your oven off, even though that's really important. You have to get on the next train instead, but you don't understand that other people have their own schedules instead of running according to yours.
If it happened once, okay - shit happens. But it happens every time. Why is that? They aren't mad at you because of this specific feature. They are mad at you because it happens every time. It seems like bcachefs is not stable. Perhaps it really was an emergency just the one time you're talking about, but that means it either was an emergency all the other times and your filesystem is too broken to be in the kernel, or it wasn't an emergency all the other times and you chose to become the boy who cried wolf. In either case Linus's reaction is valid.
Do you argue with your school teachers that your book report shouldn't be due on Friday because it's not perfect yet?
I read several of your response threads across different websites. The most interesting to me was LWN, about the debian tools, where an actual psychologist got involved.
All the discussions seem to show the same issue: You disagree with policies held by people higher up than you, and you struggle with respecting their decisions and moving on.
Instead you keep arguing about things you can't change, and that leads people to getting frustrated and walking away from you.
It really doesn't matter how "right" you may be... not your circus, not your monkeys.
I will absolutely agree with you that merging that repair code would be vastly preferable to you and the users. And again, if bcachefs was mature and stable, I absolutely think users should get a way to repair ASAP.
But bcachefs is currently experimental and thus one can reasonably expect users to be prepared to deal with the consequences of that. And hence the kernel team, with Linus at the top, should be able to assume this when making decisions.
If you have users who are not prepared for this, you have a different problem and should seek how to fix that ASAP. Best would probably be to figure out how to dissuade them from installing. In any case, not doing something to prevent that scenario would be a disservice to those users.
Edit since you expanded your post:
>The most interesting to me was LWN, about the debian tools, where an actual psychologist got involved.
To me the comment was patronizing: it implied the problem was purely bad communication on Kent's end, and it shows how immature the people running these operating systems are. Putting priority on processes over the end user.
>respecting their decisions and moving on.
When this causes real pain for end users, it validates that the decision was wrong.
> really doesn't matter how "right" you may be... not your circus
It does, because it causes reputational damage for bcachefs. Even beyond reputational damage, delivering a good product to end users should be a priority. In my opinion, projects as big as Debian causing harm to users should be called out instead of ignored. Otherwise, practices like replacing dependencies out from underneath programs become standard.
A lot of the bcachefs users are using it explicitly because they've been burned by btrfs and need something more reliable.
I am being much, much more conservative with removing the experimental label than past practice, but I have been very explicit that while it may not be perfect yet and users should expect some hiccups, I support it like any other stable production filesystem.
That's been key to getting it stabilized: setting high expectations. Users know that if they find a critical bug, it's going to be top priority.
Almost makes me think the distros light-forking it to just change the name (Iceweasel style), so the support requests don't get to him, would help… probably not, though, because people will still go there because they want to recover their data.
But there's a ton of room for improvement beyond what ZFS did. ZFS was a very conservative design in a lot of ways (rightly so! so many ambitious projects die because of second system syndrome); notably, it's block based and doesn't do extents - extents and snapshots are a painfully difficult combination.
Took me years to figure that one out.
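To make the extents-vs-blocks point concrete, here's a toy Python sketch (purely illustrative; this is not how bcachefs or ZFS actually lay out data) of why an overwrite is simple with fixed-size blocks but forces extent splitting once a snapshot shares the extent:

```python
# Toy model of why extents + snapshots are harder than blocks + snapshots.
# (Illustration only -- not any real filesystem's on-disk structures.)

def overwrite_block_based(blocks, start, end):
    """Block-based CoW (ZFS-style): each fixed-size block is rewritten
    independently; a shared block just gets a new copy. No splitting."""
    return [("new" if start <= i < end else owner)
            for i, owner in enumerate(blocks)]

def overwrite_extent_based(extents, start, end):
    """Extent-based CoW: an overwrite landing inside a shared extent must
    split it into up to three pieces (before / overwritten / after), and
    the still-shared pieces must keep their old ownership intact."""
    out = []
    for (e_start, e_end, owner) in extents:
        if e_end <= start or e_start >= end:       # untouched extent
            out.append((e_start, e_end, owner))
            continue
        if e_start < start:                        # leading piece stays shared
            out.append((e_start, start, owner))
        out.append((max(e_start, start), min(e_end, end), "new"))
        if e_end > end:                            # trailing piece stays shared
            out.append((end, e_end, owner))
    return out

# One 8-block extent shared with a snapshot; overwrite blocks 3..5.
print(overwrite_block_based(["snap"] * 8, 3, 5))
print(overwrite_extent_based([(0, 8, "snap")], 3, 5))
```

Every overwrite can multiply the number of extents, and each new piece has to be tracked consistently across all snapshots that reference it – that bookkeeping is where the pain lives.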
My hope for bcachefs has always been to be a real successor to ZFS, with better and more flexible management, better performance, and even better robustness and reliability.
Long road, but the work continues.
Regardless of differing points of view on the situation, I think everyone can agree that bcachefs being actively updated in Linus' tree is a good thing, right?
If you were able to work at your own pace, and someone else took the responsibility of pulling your changes at a pace that satisfies Linus, wouldn't that solve the problem of Linux having a good modern/CoW filesystem?
You can't win a rules-lawyer argument with the rulemaker.
For many humans to work together over time on something very complex is hard. Structure and process are required. And sometimes they come at the expense of what some might call “pure” engineering. But they are the right trade offs to optimize for the actual goal.
If you can’t accept that, stick to solo projects.
Btrfs is constantly eating people's data; it's a bad joke nowadays. Right now on Linux you're basically forced to either constantly deal with out-of-tree ZFS or accept that thinly provisioned XFS over LVM2 will inevitably cause you to lose data.
We were never able to get any sane and consistent policy on bugfixes, and I don't have high hopes that anyone else will have better luck. The XFS folks have had their own issues with interference, leading to burnout - they're on their third maintainer, and it's really not good for a project to be cycling through maintainers and burning people out, losing consistency of leadership and institutional knowledge.
And I'm still seeing Linus lashing out at people on practically a weekly basis. I could never ask anyone else to have to deal with that.
I think the kernel community has some things they need to figure out before bcachefs can go back in.
Yeah, that's in place. If nothing else, the decades of successful releases indicate that the process, at worst, functions. Whether that process fits your process is irrelevant.
> You have to consider the bigger picture.
Right back at you. Buddy, you need to learn how to lose.
It's widely used and the default filesystem of several distributions. Most of the problems are, as with any other filesystem, caused by the hardware.
I've been using it for more than 10 years without any problem and enjoy the experience. And like for any filesystem, I backup my data frequently (with btrbk, thanks for asking).
However, it was put in the kernel as experimental. That carries with it implications.
As such, while it's very commendable that you wish to support the experimental bcachefs as-if it was production ready, you cannot reasonably impose that wish upon the rest of the kernel.
That said, I think you and your small team are doing a commendable job, and I strongly wish you success in making bcachefs feature complete and production ready. And I say that as someone who really, really likes ZFS and runs it on my Linux boxes.
Say more? I can't say I've really thought that much about filesystems and I'm curious in what direction you think they could be taken if time and budget weren't an issue.
It's probably mostly stable now, but it's silly to act like it's a paragon of stability in the kernel.
And it's dishonest to act like bugs from 15 years ago justify present-tense claims that it is constantly eating people's data and is a bad joke. Nobody's arguing that btrfs doesn't have a past history of data loss, more than a decade ago; that's not what's being questioned here.
Tell it to my data then. I was 100% invested in Btrfs before 2017, the year I lost a whole filesystem due to some random metadata corruption. I then started to move all of my storage to ZFS, which has never lost me a single byte of data, despite being out of tree and all that. My last Btrfs filesystem died randomly a few days ago (it was a disk in cold storage; once again random metadata corruption, and the disk is 100% healthy). I do not trust Btrfs in any shape or form nowadays. I also vastly prefer ZFS tooling, but that's irrelevant to the argument here. The point is that I've had nothing but pain from btrfs in more than a decade.
This is my favorite side effect of compression in the right scenarios. I remember getting a huge speed up in a proprietary in-memory data structure by using LZO (or one of those fast algorithms) which outperformed memcpy, and this was already in memory so no disk io involved! And used less than a third of the memory.
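As a rough illustration of the trade-off (using stdlib zlib as a stand-in, since LZO/LZ4 aren't in the standard library – zlib is considerably slower, but the ratio side of the argument is the same):

```python
# Sketch of the bandwidth/space trade-off behind fast compression:
# structured, repetitive data shrinks enough that moving the compressed
# form can beat moving the raw bytes.
import zlib

# Repetitive record-style payload -- the kind of data that compresses well.
record = b"id=%08d;status=OK;balance=0000.00\n"
data = b"".join(record % i for i in range(10_000))

compressed = zlib.compress(data, level=1)  # level 1 ~ "fast" mode
ratio = len(compressed) / len(data)
print(f"{len(data)} -> {len(compressed)} bytes ({ratio:.1%})")
assert len(compressed) < len(data) // 3    # well under a third of the size
```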
I don't get why folks feel the need to come out and cheer for a tool like this, do you have skin in the game on whether or not btrfs is considered stable? Are you a contributor?
I don't get it.
But since you asked - let me find some recent bugs.
5.15.37 - fixes data corruption in database reads using btrfs https://www.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.15....
5.15.65 - fixes double allocation and cache corruption https://www.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.15....
6.1.105 - fixes O_APPEND with direct I/O writing corrupted files https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.1...
6.1.110 - fixes fsync race and corruption https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.1...
6.2.16 - fixes truncation of files causing data corruption https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.2.1...
btrfs-progs 6.2 fixes corruption on zstd extent read https://btrfs.readthedocs.io/en/latest/CHANGES.html
6.15.3, 4: possible data corruption, seems to be reparable: https://www.phoronix.com/news/Btrfs-Log-Tree-Corruption-Fix
Are people that encountered these also dishonest?
This is the difference between being smart and being wise. If the goal of all this grandstanding was that, it's so incredibly and vitally important for these patches to get into the kernel, well guess what, now due to all this drama this part of the kernel is going to go unmaintained entirely. Is that good for the users? Did that help our stated goal in any way? No.
Maybe, if you never create anything. I make a lot of game art source and much of that is in uncompressed formats. Like blend files, obj files, even DDS can compress, depending on the format and data, due to the mip maps inside them. Without FS compression it would be using GBs more space.
I'm not going to individually go through and micromanage file compression even with a tool. What a waste of time, let the FS do it.
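For what it's worth, the difference is easy to demonstrate: plain-text asset formats like OBJ shrink dramatically under a general-purpose compressor, while already-compressed media (approximated here by random bytes) doesn't shrink at all. A small stdlib-only sketch:

```python
# Why filesystem-level compression pays off for uncompressed asset
# formats (OBJ, blend, ...) but not for already-compressed media.
import os
import zlib

# OBJ files are plain text: vertex lines like "v 0.100000 0.200000 0.300000".
obj_like = b"".join(b"v %.6f %.6f %.6f\n" % (i * 0.1, i * 0.2, i * 0.3)
                    for i in range(20_000))
# Random bytes stand in for already-compressed media (JPEG, MP4, ...).
media_like = os.urandom(len(obj_like))

for name, blob in [("obj-like text", obj_like), ("compressed media", media_like)]:
    ratio = len(zlib.compress(blob)) / len(blob)
    print(f"{name}: {ratio:.0%} of original size")
```

The filesystem can make exactly this decision per extent, which is why letting it handle compression beats micromanaging files by hand.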
The company behind it, iXsystems, pays for ZFS developers as well.
But the problem with comparisons is that even if you're better than nuclear waste being dumped into the aquifer, you still might be bad enough to light a river on fire.
The adult thing is to do best by the users. Critical file system bugs are worth blocking the release of any serious operating system in the real world as there is serious user impact.
>Is that good for the users?
I think it's complicated. It could allow for a faster release schedule for bug fixes which can allow for addressing file system issues faster.
It's an entirely clean slate design, and I spent years taking my time on the core planning out the design; it's as close to perfect as I can make it.
The only things I can think of that I would change or add given unlimited time and budget:
- It should be written in Rust; even better, Rust plus dependent types (which I suspect could be done with proc macros) for formal verification. And Cap'n Proto for on-disk data structures (which still needs Rust improvements to be as ergonomic as it should be) would also be a really nice improvement.
- More hardening; the only other thing we're lacking is comprehensive fault injection testing of on disk errors. It's sufficiently battle hardened that it's not a major gap, but it really should happen at some point.
- There's more work to be done in bitrot prevention: data checksums really need to be plumbed all the way into the pagecache
I'm sure we'll keep discovering new small ways to harden, but nothing huge at this point.
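The checksum point is about catching bitrot end to end rather than trusting whatever bytes the device returns. A minimal toy model of the idea (not bcachefs's actual implementation; names are made up for illustration):

```python
# Toy end-to-end data checksumming for bitrot detection: a checksum is
# recorded at write time and verified on every read, so silent storage
# corruption surfaces as an error instead of bad data.
import zlib

class ChecksummedStore:
    def __init__(self):
        self.blocks = {}          # block number -> (crc32, payload)

    def write(self, blockno, payload):
        self.blocks[blockno] = (zlib.crc32(payload), payload)

    def read(self, blockno):
        crc, payload = self.blocks[blockno]
        if zlib.crc32(payload) != crc:
            raise IOError(f"checksum mismatch on block {blockno} (bitrot?)")
        return payload

store = ChecksummedStore()
store.write(0, b"important data")
assert store.read(0) == b"important data"

# Simulate silent corruption in storage: the read is caught, not served.
crc, payload = store.blocks[0]
store.blocks[0] = (crc, b"importent data")
try:
    store.read(0)
except IOError as e:
    print("detected:", e)
```

Plumbing this all the way into the page cache means even data that is already cached in memory keeps its integrity guarantee.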
Some highlights:
- It has more defense in depth than any filesystem I know of. It's as close to impossible to have unrecoverable data loss as I think can really be done in a practical production filesystem - short of going full immutable/append only.
- Closest realization of "filesystem as a database" that I know of
- IO path options (replication level, compression, etc.) can be set on a per file or directory basis: I'm midway through a project extending this to do some really cool stuff, basically data management is purely declarative.
- Erasure coding is much more performant than ZFS's
- Data layout is fully dynamic, meaning you can add/remove devices at will, it just does the right thing - meaning smoother device management than ZFS
- The way the repair code works, and the tracking of errors we've seen, is fantastic for debuggability
- Debuggability and introspection are second to none: long bug hunts really aren't a thing in bcachefs development because you can just see anything the system is doing
There's still lots of work to do before we're fully at parity with ZFS. Over the next year or two I should be finishing erasure coding, online fsck, failure domains, lots more management stuff... there will always be more cool projects just over the horizon
What's best for users in the long term is predictable processes. "RC = pure bug fixes" is a battle-tested, dependable rule, the absence of which causes chaos.
> Critical file system bugs are worth blocking the release
"Experimental" label EXACTLY to prevent this stuff from blocking release. Do you not know that bcachefs is experimental? This is an example of another rule which helps predictability.