
216 points ksec | 3 comments
tarruda ◴[] No.45077138[source]
Since the existing bcachefs driver will not be removed, and the problem is the bcachefs developer not following the rules, I wonder if someone else could take on the role of pulling bcachefs changes into the mainline, while also following the merge window rules.
replies(1): >>45078845 #
koverstreet ◴[] No.45078845[source]
No, the problem wasn't following the rules.

The patch that kicked off the current conflict was the 'journal_rewind' patch; we recently (6.15) had the worst bug in the entire history upstream - it was taking out entire subvolumes.

The third report got me a metadata dump with everything I needed to debug the issue, thank god, and now we have a great deal of hardening to ensure a bug like this can never happen again. Subsequently, I wrote new repair code, which fully restored the filesystem of the 3rd user hit by the bug (first two had backups).

Linus then flipped out because it was listed as a 'feature' in the pull request; it was only listed that way to make sure that users would know about it if they were affected by the original bug and needed it. Failure to maintain your data is always a bug for a filesystem, and repair code is a bugfix.

In the private maintainer thread, and even in public, things went completely off the rails, with Linus and Ted basically asserting that they knew better than I do which bcachefs patches are regression risks (seriously), and a page and a half rant from Linus on how he doesn't trust my judgement, and a whole lot more.

There have been many repeated arguments like this over bugfixes.

The thing is, since then I started perusing pull requests from other subsystems, and it looks like I've actually been more conservative with what I consider a critical bugfix (and send outside the merge window) than other subsystems. The _only_ thing that's been out of the ordinary with bcachefs has been the volume of bugfixes - but that's exactly what you'd expect to see from a new filesystem that's stabilizing rapidly and closing out user bug reports - high volume of pure bugfixing is exactly what you want to see.

So given that, I don't think having a go-between would solve anything.

replies(6): >>45079059 #>>45079670 #>>45080227 #>>45081254 #>>45082752 #>>45083951 #
nirava ◴[] No.45079059[source]
To list the current state of things:

1. Regardless of whether correct or not, it's Linus that decides what's a feature and what's not in Linux. Like he has for the last however many decades. Repair code is a feature if Linus says it is a feature.

2. Being correct comes second to being agreeable in human-human interactions. For example, dunking on x file system does not work as a defense when the person opposite you is an x file system maintainer.

3. Rules are rules, and generally don't have to be "correct" to be enforced in an organization.

I think your perceived "unfairness" might make sense if you just thought of these things as un-workaroundable constraints, just like the fact that SSDs wear out over time.

replies(2): >>45079083 #>>45081078 #
koverstreet ◴[] No.45079083[source]
When rules and authority start to take precedence over making sure things work, things have gone off the rails and we're not doing engineering anymore.
replies(4): >>45079758 #>>45080373 #>>45081150 #>>45083228 #
motorest ◴[] No.45081150[source]
> When rules and authority start to take precedence over making sure things work, (...)

Didn't Linus lambast you for "lack of testing and collaboration before submitting patches", to the point the patches you were trying to push weren't even building?

https://ostechnix.com/linus-torvalds-expresses-frustration-w...

replies(1): >>45082917 #
koverstreet ◴[] No.45082917[source]
Linus has broken the build more recently than I have. (In the time since bcachefs went upstream, we've both done that once, that I've seen).

Linus doesn't seem to believe in automated testing. He just seems to think that there's no way I could QA code as quickly as I do, but that's because I've invested heavily in automated testing and building up a community of people doing very good testing and QA work; bcachefs's automated testing is the best of any upstream filesystem that I've seen (there's a whole cluster of machines dedicated to this), and I have people running my latest branch on a daily basis.

Nearly all of the collaboration just happens on IRC.

For big changes I wait for explicit acks from testers that they've run it and things look good; a lot of people read and review my code too, it's just typically less formal than the rest of the kernel.

replies(2): >>45083105 #>>45083428 #
1. motorest ◴[] No.45083428[source]
> Linus has broken the build more recently than I have.

Even taking your claims at face value (which from this thread alone is a heck of a leap) I'm baffled by the way you believe this holds any relevance.

I mean, the kernel project has in place a quality assurance process designed to minimize the odds of introducing problems when preparing a release. You were caught purposely ignoring that process, trying to circumvent it and sneak untested and unverified features into an RC.

There is a QA process, and you purposely decided to ignore it and plow away. And then your best argument for purposely ignoring any semblance of QA is that others may or may not have broken a build before?

Come on, man. You know better than this. How desperate are you to avoid any accountability that you pull these gaslighting stunts?

replies(2): >>45083771 #>>45091388 #
2. koverstreet ◴[] No.45083771[source]
Please, tell us about these wonderful QA processes the kernel has.
3. rcxdude ◴[] No.45091388[source]
I would also like to know what the QA process is, because all I can see is basically 'Linus pulls in changes in the merge window, checks that the basic stuff builds, then releases the RCs, and some people do some checks in some way (users on the bleeding edge, some people doing manual verification on specific hardware and use cases, and maybe some automatic tests and analysis that are not really documented anywhere), and the end result is some bug reports'. Is there anything more coordinated than that? Like some description of what is tested and how, or an explicit green indication that those tests have actually happened, and a policy on what would hold up a release?