
215 points ksec | 3 comments
tarruda ◴[] No.45077138[source]
Since the existing bcachefs driver will not be removed, and the problem is the bcachefs developer not following the rules, I wonder if someone else could take on the role of pulling bcachefs changes into the mainline, while also following the merge window rules.
replies(1): >>45078845 #
koverstreet ◴[] No.45078845[source]
No, the problem wasn't following the rules.

The patch that kicked off the current conflict was the 'journal_rewind' patch; we recently (6.15) had the worst bug in the entire history upstream - it was taking out entire subvolumes.

The third report got me a metadata dump with everything I needed to debug the issue, thank god, and now we have a great deal of hardening to ensure a bug like this can never happen again. Subsequently, I wrote new repair code, which fully restored the filesystem of the 3rd user hit by the bug (first two had backups).

Linus then flipped out because it was listed as a 'feature' in the pull request; it was only listed that way to make sure that users would know about it if they were affected by the original bug and needed it. Failure to maintain your data is always a bug for a filesystem, and repair code is a bugfix.

In the private maintainer thread, and even in public, things went completely off the rails, with Linus and Ted basically asserting that they knew better than I do which bcachefs patches are regression risks (seriously), and a page-and-a-half rant from Linus on how he doesn't trust my judgement, and a whole lot more.

There have been many repeated arguments like this over bugfixes.

The thing is, since then I started perusing pull requests from other subsystems, and it looks like I've actually been more conservative with what I consider a critical bugfix (and send outside the merge window) than other subsystems. The _only_ thing that's been out of the ordinary with bcachefs has been the volume of bugfixes - but that's exactly what you'd expect to see from a new filesystem that's stabilizing rapidly and closing out user bug reports - high volume of pure bugfixing is exactly what you want to see.

So given that, I don't think having a go-between would solve anything.

replies(6): >>45079059 #>>45079670 #>>45080227 #>>45081254 #>>45082752 #>>45083951 #
nirava ◴[] No.45079059[source]
To list down the current state of things:

1. Regardless of whether correct or not, it's Linus that decides what's a feature and what's not in Linux. Like he has for the last however many decades. Repair code is a feature if Linus says it is a feature.

2. Being correct comes second to being agreeable in human-human interactions. For example, dunking on x file system does not work as a defense when the person opposite you is an x file system maintainer.

3. Rules are rules, and generally don't have to be "correct" to be enforced in an organization.

I think your perceived "unfairness" might make sense if you just thought of these things as un-workaroundable constraints, just like the fact that SSDs wear out over time.

replies(2): >>45079083 #>>45081078 #
quotemstr ◴[] No.45081078[source]
> Being correct comes second to being agreeable in human-human interactions

Prioritizing agreeableness above correctness is the reason the space shuttle Challenger blew up.

The bcachefs fracas is interesting and important because it's like a stain making some damn germ's organelles visible: it highlights a psychological division in tech and humanity in general between people who prioritize

1) deferring to authority, reading the room, knowing your place

and people who prioritize

2) insisting on your concept of excellence, standing up against a crowd, and speaking truth to power.

I am disturbed to see the weight position #1 has accumulated over the past decade or two. These people argue that Linus could be arbitrarily wrong and Overstreet arbitrarily right and it still wouldn't matter because being nice is critical to the success of a large scale project or something.

They get angry because they feel comfort in understanding their place in a social hierarchy. Attempts to upend that hierarchy in the name of what's right create cognitive dissonance. The rule-followers feel a tension they can relieve only by ganging up and asserting "rules are rules and you need to follow them!" --- whether or not, at the object level, a) there are rules, b) the rules are beneficial, and c) the rules are applied consistently. a, b, and c are exactly those object-level does-the-o-ring-actually-work-when-cold considerations that the rule-following, rule-enforcing kind of person rejects in favor of a reality built out of words and feelings, not works and facts.

They know it, too. They need Overstreet and other upstarts to fail: the failure legitimizes their own timid acquiescence to rules that make no sense. If other people are able to challenge rules and win, the #1 kind of person would have to ask himself serious and uncomfortable questions about what he's doing with his life.

It's easier and psychologically safer to just tear down anyone trying to do something new or different.

The thing is, all technological progress depends on the #2 people winning in the end. As Feynman talked about when diagnosing this exact phenomenon as the root cause of the Challenger disaster, mother nature (who appears to have taken on corrupting filesystems as a personal hobby of hers) does not care one bit about these word games or how nice someone is. The only thing that matters when solving a problem of technology is whether something works.

I think a lot of people in tech have entirely lost sight of this reality. I can't emphasize enough how absurd it is to state "[b]eing correct comes second to being agreeable in human-human interactions" and how dangerously anti-technology, anti-science, anti-civilization, and anti-human this poison mindset is.

replies(4): >>45081112 #>>45083278 #>>45083304 #>>45083792 #
koverstreet ◴[] No.45083278[source]
Thanks, I've been struggling to put this into words.

When you're working on the core technology we all depend on, correctness is not optional.

replies(1): >>45083355 #
nullc ◴[] No.45083355[source]
Linux is not correct. Linux has never been correct. Linux will never be correct. An incorrect belief that it is correct can only make it less correct.

You must know this when it comes to your own work. Why isn't bcachefs written in augmented Rust with dependent types and formal correctness proofs for every line of code? How could there ever be a data-losing bug if you had a formal proof that the file system could never lose data? Wouldn't that be more correct?

Turns out when some strong/broad notion of correctness isn't (practically) possible it is, in fact, very optional.

Good project management is all about managing resources and balancing tradeoffs. Sometimes this means making or allowing some things to be worse for the benefit of something else or in adherence to a process with a proven track record. Almost every choice makes something less correct than it could be-- with a goal of slowly inching towards a more perfect state overall in the long run.

It's also beneficial to rock the boat a bit at times; people can be wrong, processes can need improvement-- but there is a correct level, timing, and approach to achieve the best benefit. I expect that the kind of absolute approach you seem to have adopted in comments is unlikely to be successful at effecting beneficial change.

replies(1): >>45083642 #
1. quotemstr ◴[] No.45083642[source]
You're staking out quite the postmodernist position there. All models are wrong, so who's to say that Alice's data corruption is worse than Bob's man page typo? The important thing is we stick to process with a proven track record, right?

I don't buy it. Object level considerations do matter. Alice's bug really is worse than Bob's. That "proven track record" shouldn't apply to Alice, and insisting that it does for the sake of process, in a way indifferent to the facts of the situation, is just a pretext for doing primate social hierarchy deference rituals in a situation in which they're producing a worse outcome and everyone knows it.

replies(1): >>45083778 #
2. nullc ◴[] No.45083778[source]
> Object level considerations do matter.

They do. Kent expressed them, and the Linux kernel maintainers are amply qualified to hear them out and make a call. I don't see a reason to think they were indifferent to the facts; they just weren't convinced by them. If they were, they could have just said, "okay we think that this does qualify as a bugfix".

My understanding is the change in dispute wasn't over fixing the corruption-introducing bug, but rather adding automated repair for cases where the corruption had already happened. I could easily see taking a position of "sad for people who are already corrupt, they can get their workaround out of tree for now" (or heck, even forever depending on the scale of the impact).

Anyone who has been around for a while has seen their share of the 'ate the horse to catch the spider to catch the fly to...' dance. Of course the patch author is convinced that their repair is correct. They're almost always convinced of that or they don't submit it, so that carries little information. Because of this there is a strong preference for obviously minimal code in any kind of fix. Minimizing user suffering is important, but we also know every line of code comes with risk. The fact that the risk is not measurable on a case by case basis doesn't make it any less real.

replies(1): >>45084453 #
3. quotemstr ◴[] No.45084453[source]
Thanks for the thoughtful reply.

> I don't see a reason to think they were indifferent to the facts

I don't think the Linux people thought of themselves as indifferent to facts. Nor do I think they were, not at first. Most people imagine themselves as fair-minded truth-seekers. When stakes are low, they usually act like it. It's only under pressure that people reveal whether they're more committed to PR or progress.

The shitty thing about this situation is that as the dispute escalated, the technical merits of the change faded from relevance. (Linus even pulled the corruption repair work in the end!) The argument transformed into a dispute over power, pride, and personalities. Linus's commitment to technical excellence was tested. It failed. Consequently, Linux will lack a cutting-edge filesystem.

I don't even object to Linus being BDFL of Linux. Somebody has to make decisions. I think Linus was wrong to reject the corruption fix patch, but he could plausibly have been right. He had an opportunity to explain his patch rejection in such a way that Overstreet would have understood it as final but also felt heard and valued. Overstreet would have been upset, and justifiably so, but by the next merge window both sides would have cooled down and progress would have resumed.

It's when Linus banned Overstreet and bcachefs from the project that he departed irrecoverably from defensibility. Linus might think he's punishing Overstreet for his intransigence by blocking his work, but Linus is actually taking his frustration out on every Linux user instead. Overstreet's ban is rooted in primate power psychology, not technical trade-offs, and it makes everyone lose.

Technical leaders who ostracize brilliant but difficult people forever cap the amount of progress we can make in the fight against the limits of nature. They're neglecting their responsibilities as leaders to harness difficult people. It's not an easy job, but being a leader shouldn't be.

Linus took the easy way out and banned the brilliant troublemaker. He should be ashamed.

> the risk is not measurable on a case by case basis

It often is. That's why when I'm on the Linus side of a case like this, I try to avoid saying "no" and instead say "yes, if". Sometimes my counterparty pulls out an "if" that convinces me.