Most active commenters

koverstreet(7)
charcircuit(4)
motorest(3)

Bcachefs Goes to "Externally Maintained"

(lwn.net)

Show context

tarruda ◴[30 Aug 25 19:01 UTC] No.45077138[source]▶

Since the existing bcachefs driver will not be removed, and the problem is the bcachefs developer not following the rules, I wonder if someone else could take on the role of pulling bcachefs changes into the mainline, while also following the merge window rules.

replies(1): >>45078845 #

koverstreet ◴[30 Aug 25 23:14 UTC] No.45078845[source]▶

>>45077138 #

No, the problem wasn't following the rules.

The patch that kicked off the current conflict was the 'journal_rewind' patch; we recently (6.15) had the worst bug in the entire history upstream - it was taking out entire subvolumes.

The third report got me a metadata dump with everything I needed to debug the issue, thank god, and now we have a great deal of hardening to ensure a bug like this can never happen again. Subsequently, I wrote new repair code, which fully restored the filesystem of the 3rd user hit by the bug (first two had backups).

Linus then flipped out because it was listed as a 'feature' in the pull request; it was only listed that way to make sure that users would know about it if they were affected by the original bug and needed it. Failure to maintain your data is always a bug for a filesystem, and repair code is a bugfix.

In the private maintainer thread, and even in public, things went completely off the rails, with Linus and Ted basically asserting that they knew better than I do which bcachefs patches are regression risks (seriously), and a page and a half rant from Linus on how he doesn't trust my judgement, and a whole lot more.

There have been many repeated arguments like this over bugfixes.

The thing is, since then I started perusing pull requests from other subsystems, and it looks like I've actually been more conservative with what I consider a critical bugfix (and send outside the merge window) than other subsystems. The _only_ thing that's been out of the ordinary with bcachefs has been the volume of bugfixes - but that's exactly what you'd expect to see from a new filesystem that's stabilizing rapidly and closing out user bug reports - high volume of pure bugfixing is exactly what you want to see.

So given that, I don't think having a go-between would solve anything.

replies(6): >>45079059 #>>45079670 #>>45080227 #>>45081254 #>>45082752 #>>45083951 #

nirava ◴[30 Aug 25 23:52 UTC] No.45079059[source]▶

>>45078845 #

To list down the current state of things:

1. Regardless of whether correct or not, it's Linus that decides what's a feature and what's not in Linux. Like he has for the last however many decades. Repair code is a feature if Linus says it is a feature.

2. Being correct comes second to being agreeable in human-human interactions. For example, dunking on x file system does not work as a defense when the person opposite you is a x file system maintainer.

3. rules are rules, and generally don't have to be "correct" to be enforced in an organization

I think your perceived "unfairness" might make sense if you just thought of these things as un-workaroundable constraints, Just like the fact that SSDs wear out over time.

replies(2): >>45079083 #>>45081078 #

koverstreet ◴[30 Aug 25 23:57 UTC] No.45079083[source]▶

>>45079059 #

When rules and authority start to take precedence over making sure things work, things have gone off the rails and we're not doing engineering anymore.

replies(4): >>45079758 #>>45080373 #>>45081150 #>>45083228 #

ranger_danger ◴[31 Aug 25 02:17 UTC] No.45079758[source]▶

>>45079083 #

I think this attitude is exactly why this happened. I would have done the same thing.

Do you argue with your school teachers that your book report shouldn't be due on Friday because it's not perfect yet?

I read several of your response threads across different websites. The most interesting to me was LWN, about the debian tools, where an actual psychologist got involved.

All the discussions seem to show the same issue: You disagree with policies held by people higher up than you, and you struggle with respecting their decisions and moving on.

Instead you keep arguing about things you can't change, and that leads people to getting frustrated and walking away from you.

It really doesn't matter how "right" you may be... not your circus, not your monkeys.

replies(2): >>45080026 #>>45081188 #

charcircuit ◴[31 Aug 25 03:08 UTC] No.45080026[source]▶

>>45079758 #

Your analogy fails to account that after "Friday" bug fixes are still allowed. A file system losing your files sounds like a bug to me.

Edit since you expanded your post:

>The most interesting to me was LWN, about the debian tools, where an actual psychologist got involved.

To me the comment was patronizing implying it was purely due to bad communication from Kent's end and shows how immature people are with running these operating system are. Putting priority on processes over the end user.

>respecting their decisions and moving on.

When this causes real pain for end users. It's validating that the decision was wrong.

> really doesn't matter how "right" you may be... not your circus

It does because it causes reputational damage for bcachefs. Even beyond reputational damage, delivering a good product to end users should be a priority. In my opinion projects as big as Debian causing harm to users should be called out instead of ignored. Else it can lead to practices like replacing dependencies out from underneath programs to become standard practice.

replies(2): >>45080576 #>>45084011 #

wavemode ◴[31 Aug 25 05:22 UTC] No.45080576[source]▶

>>45080026 #

You still seem to be arguing that, shipping the change was the "right" thing to do. But that's not what's in dispute. Rather it is that, if what you think is right and what the person who makes the rules thinks is right are in disagreement, the adult thing to do is not to simply disregard the rules (and certainly not repeatedly, after being warned not to).

This is the difference between being smart and being wise. If the goal of all this grandstanding was that, it's so incredibly and vitally important for these patches to get into the kernel, well guess what, now due to all this drama this part of the kernel is going to go unmaintained entirely. Is that good for the users? Did that help our stated goal in any way? No.

replies(1): >>45080659 #

charcircuit ◴[31 Aug 25 05:48 UTC] No.45080659[source]▶

>>45080576 #

>the adult thing to do is not to simply disregard the rules

The adult thing is to do best by the users. Critical file system bugs are worth blocking the release of any serious operating system in the real world as there is serious user impact.

>Is that good for the users?

I think it's complicated. It could allow for a faster release schedule for bug fixes which can allow for addressing file system issues faster.

replies(2): >>45080980 #>>45081134 #

nirava ◴[31 Aug 25 06:58 UTC] No.45080980[source]▶

>>45080659 #

> The adult thing is to do best by the users

Best by users in the long term is predictable processes. "RC = pure bug fixes" is a battle tested, dependable rule, absence of which causes chaos.

> Critical file system bugs are worth blocking the release

"Experimental" label EXACTLY to prevent this stuff from blocking release. Do you not know that bcachefs is experimental? This is an example of another rule which helps predictability.

replies(1): >>45081206 #

1. charcircuit ◴[31 Aug 25 07:39 UTC] No.45081206[source]▶

>>45080980 #

This was a bug fix. My point is that there will always be bugs in the kernel so not all bugs are worth blocking a release, but losing data is worth blocking the release for.

>"Experimental" label EXACTLY to prevent this stuff from blocking release

In practice bcachefs is used in production with real users. If the experimental label prevents critical bug fixes from making it into the kernel then it would be better to just remove that label.

replies(2): >>45082368 #>>45082462 #

2. motorest ◴[31 Aug 25 11:30 UTC] No.45082368[source]▶

>>45081206 (TP) #

> This was a bug fix.

I'm not sure exactly what you are talking about, and I'm not sure you do either. The discussion that preceded bcachefs to be dropped from the Linux kernel mainline involved an attempt to sneak a new features in RC, sidestepping testing and QA work, which was followed up by yet more egregious behavior from the mantainer.

https://www.phoronix.com/news/Linux-616-Bcachefs-Late-Featur...

replies(1): >>45082428 #

3. charcircuit ◴[31 Aug 25 11:42 UTC] No.45082428[source]▶

>>45082368 #

>sneak a new features in RC

Too solve a bug with the filesystem that people in the wild were hitting. Like how Linus has said in the past with how there is a blurry line between security fixes and bug fixes. There is a blurry line between filesystem bugs and recovery features.

If you read the email it is clear that the full feature has more work needed and this is more of a basic implementation to address bugs that people hit in the wild.

replies(1): >>45082954 #

4. dijksterhuis ◴[31 Aug 25 11:51 UTC] No.45082462[source]▶

>>45081206 (TP) #

> In practice bcachefs is used in production with real users. If the experimental label prevents critical bug fixes from making it into the kernel then it would be better to just remove that label.

alternative perspective: those users have knowingly and willingly put experimental software into production. it was their choice, they were informed of the risk and so the consequences and responsibility are their’s.

it’s like signing up to take some experimental medicine, and then complaining no-one told me about the side-effect of persistent headaches.

that doesn’t stop anyone from being user-centric in their approach, e.g. call me if you notice any symptoms and i’ll come round your house to examine you.

… as long as everyone is clear about the fact it is experimental and the boundaries/limitations that apply, e.g. there will be certain persistent headache medicines that cannot be prescribed to you, or it might take longer for them to work because you’re on an experimental medicine.

replies(1): >>45083163 #

5. motorest ◴[31 Aug 25 13:19 UTC] No.45082954{3}[source]▶

>>45082428 #

> Too solve a bug with the filesystem that people in the wild were hitting.

So you acknowledge that this last episode involved trying to push new features into a RC.

As it was made abundantly clear, not only is the point of RC branches to only get tiny bugfixes after testing, the feature work that was presented was also untested and risked introducing major regressions.

All these red flags were repeatedly raised in the mailing list by multiple kernel maintainers. Somehow you're ignoring all the feedback and warnings and complains raised by people from Linux kernel maintainers, and instead you've opted to try to gaslight the thread.

replies(1): >>45083260 #

6. koverstreet ◴[31 Aug 25 13:54 UTC] No.45083163[source]▶

>>45082462 #

Again: the elephant in the room is that a lot of bcachefs users are using it explicitly because they have lost a lot of data on btrfs, and they've found it to be more trustworthy.

This puts us all in a shitty situation. I want the experimental label to come off at the right time - when every critical bug is fixed and it's as trustworthy as I can reasonably make it, when I know according to the data I have that everyone is going to have a good experience - but I have real users who need this thing and need to be supported.

There is _no reason_ to interpret the experimental label in the way that you're saying, you're advocating that reliability for the end user be deprioritized versus every other filesystem.

But deprioritizing reliability is what got us into this mess.

replies(1): >>45083763 #

7. koverstreet ◴[31 Aug 25 14:10 UTC] No.45083260{4}[source]▶

>>45082954 #

No, I'm sorry but you're simply wrong.

bcachefs has a ton of QA, both automated testing and a lot of testers that run my latest and I work with on a daily basis. The patch was well tested; it was for codepaths that we have good regression tests for, it was algorithmically simple, and it worked perfectly to recover a filesystem from the original bug report, and it performed flawlessly again not long after.

I've explained my testing and QA on the lists multiple times.

You, like the other kernel maintainers in that thread, are making wild assertions despite having no involvement with the project.

replies(2): >>45083367 #>>45085121 #

8. motorest ◴[31 Aug 25 14:24 UTC] No.45083367{5}[source]▶

>>45083260 #

> No, I'm sorry but you're simply wrong.

It sounds like you have a hard time coping with reality.

https://www.phoronix.com/news/Linux-616-Bcachefs-Late-Featur...

I repeat: it sounds an awful lot like you are trying to gaslight this thread. Not cool.

When this fact was again explicitly pointed out to you by Linus himself, you even tried to bullshit Linus and try to move the goalpost with absurd claims about how somehow it was ok to force untested and unreviewed features into a RC because somehow you know better about what users want or need as if it was some kind of justification for you to skip testing and proper release processes.

You need to set aside some time for introspection because you sound like you are your own worst enemy. And those you interact with seem to be fed up and had enough of these stunts.

replies(1): >>45084654 #

9. rob_c ◴[31 Aug 25 15:10 UTC] No.45083763{3}[source]▶

>>45083163 #

>users are using it explicitly because they have lost a lot of data on btrfs

PLEASE, honestly, EDUCATE THESE USERS. This is still marked experimental for numerous reasons regardless of the 'planned work for 6.18'. Users who can't suffer any data loss and are repeating their mistake of using btrfs shouldn't be using a none default/standard/hardened filesystem period.

replies(1): >>45084241 #

10. koverstreet ◴[31 Aug 25 16:08 UTC] No.45084241{4}[source]▶

>>45083763 #

No, really. People aren't losing data on bcachefs. We still have minor hiccups that do affect usability, and I put a lot of effort into educating users about where we're at and what to expect.

In the past I've often told people who wanted to migrate off of btrfs "check back in six months", but I'm not now because 6.16 is looking amazingly solid; all the data I have says that your data really is safer on bcachefs than btrfs.

I'm not advocating for people to jump from ext4/xfs/zfs, that needs more time.

replies(1): >>45084861 #

11. koverstreet ◴[31 Aug 25 16:49 UTC] No.45084654{6}[source]▶

>>45083367 #

The changes weren't untested or unreviewed, and they've performed flawlessly on quite a few occasions since then.

Sorry, the only person gaslighting here is you.

12. trueismywork ◴[31 Aug 25 17:10 UTC] No.45084861{5}[source]▶

>>45084241 #

You're arguing in circles. Either bcachefs is experimental and hence needs a lot of changes and tools to make sure that users dont lose data and hence the fixes are not critical/users can use a custom branch. Or it is stable and the only thing users need is actual big fixes. Not new tools in an RC3.

Don't compare bcachefs with btrfs for stability. Compare it with ext4. (And dont care anecdotal data, compare the process).

replies(1): >>45087245 #

13. majorchord ◴[31 Aug 25 17:38 UTC] No.45085121{5}[source]▶

>>45083260 #

As a rule, strong feelings about issues do not emerge from deep understanding.

14. koverstreet ◴[31 Aug 25 21:25 UTC] No.45087245{6}[source]▶

>>45084861 #

So, are we agreeing that btrfs isn't fit for purpose, then?

replies(1): >>45088872 #

15. trueismywork ◴[01 Sep 25 02:24 UTC] No.45088872{7}[source]▶

>>45087245 #

I don't understand your question. Are you going somewhere with this?

↑