144 points ksec | 93 comments
1. criticalfault ◴[] No.44466573[source]
I've been following this for a while now.

Kent is in the wrong. Having a lead position in development, I would kick Kent off the team.

Challenging things is one thing. What Kent is doing is something completely different. It is obvious he introduced a feature, not only a bugfix.

If the rules say that rc1 and later get only bugfixes, then it is absolutely clear what happens with the feature. Tolerating this once or twice is OK, but Kent is doing this all the time, testing Linus.

Linus is absolutely in the right to kick this out, and it's Kent's fault if he does so.

replies(8): >>44466668 #>>44467387 #>>44467968 #>>44468790 #>>44468966 #>>44469158 #>>44470642 #>>44470736 #
2. Pet_Ant ◴[] No.44466668[source]
Why take it out of the kernel? Why not just make someone responsible the maintainer so they can say "no, next release" to his shenanigans? It can't be the license.
replies(3): >>44466729 #>>44466794 #>>44467109 #
3. nolist_policy ◴[] No.44466729[source]
Kent can appoint a suitable maintainer if he wishes. That's his job, not Linus'.
4. criticalfault ◴[] No.44466794[source]
This is unclear to me as well, but I'm saying I wouldn't hold it against Linus if he did this. And based on Kent's behavior he has every right to do so.

A way to handle this would be with one person (or more) in between Kent and Linus. And maybe a separate tree only for changes and fixes from bcachefs that those people in between would forward to Linus. A staging of sorts.

5. tliltocatl ◴[] No.44467109[source]
Maintainers aren't getting paid and so cannot be "appointed". Someone must volunteer - and most people qualified and motivated enough are already doing something else.
replies(1): >>44467301 #
6. timewizard ◴[] No.44467301{3}[source]
Presumably there would be an open call where people would nominate themselves for consideration. These are problems that have come up and been solved in human organizations for hundreds of years before the kernel even existed.
replies(1): >>44467788 #
7. pmarreck ◴[] No.44467387[source]
This can happen with prima donna devs who haven't had to collaborate in a team environment for a long time.

It's a damn shame too, because bcachefs has some unique features and potential.

replies(1): >>44471352 #
8. xorcist ◴[] No.44467788{4}[source]
There is no call. Anyone can volunteer at any time.

Software takes up no space and there is no scarcity. Theoretically there could be any number of maintainers and what gets uptake is the de facto upstream. That's what people refer to when they talk about free software development in terms of meritocracy.

replies(1): >>44469688 #
9. bgwalter ◴[] No.44467968[source]
bcachefs is experimental and Kent writes in the LWN comments that nothing would get done if he didn't develop it this way. Filesystems are a massive undertaking and you can have all the rules you want. It doesn't help if nothing gets developed.

It would be interesting to know how strict the rules are in the Linux kernel for other people. Other projects have nepotistic structures where some developers can do what they want but others cannot.

Anyway, if Linus had developed the kernel with this kind of strictness from the beginning, maybe it wouldn't have taken off. I don't see why experimental features should follow the rules for stable features.

replies(3): >>44468097 #>>44471052 #>>44471394 #
10. yjftsjthsd-h ◴[] No.44468097[source]
If it's an experimental feature, then why not let changes go into the next version?
replies(1): >>44468133 #
11. bgwalter ◴[] No.44468133{3}[source]
That is a valid objection, but I still think that for some huge and difficult features the month long pauses imposed by release cycles are absolutely detrimental.

Ideally they'd be developed outside the kernel until they are perfect, but Kent addresses this in his LWN comment: There is no funding/time to make that ideal scenario possible.

replies(3): >>44468166 #>>44468709 #>>44473730 #
12. jethro_tell ◴[] No.44468166{4}[source]
He could release a patch that can be pulled by the people that need it.

If you’re using experimental file systems, I’d expect you to be pretty competent in being able to hold your own in a storage emergency, like compiling a kernel if that’s the way out.

This is a made up emergency, to break the rules.

replies(1): >>44470658 #
13. Analemma_ ◴[] No.44468709{4}[source]
This position seems so incoherent. If it’s so experimental, why is it in the mainline kernel? And why are fixes so critical they can’t wait for a merge window? Who is using an “experimental” filesystem for mission-critical work that also has to be on untested bleeding-edge code?

Like the sibling commenter, I suspect the word “experimental” is being used here to try and evade the rules that, somehow, every other part of the kernel manages to do just fine with.

replies(2): >>44468887 #>>44470216 #
14. redeeman ◴[] No.44468790[source]
there have been several examples of other exceptions. we are talking data corruption here. Kent may not be the best communicator, but he cares about what matters. you'd rather see people lose their data than bend the rules.
15. koverstreet ◴[] No.44468887{5}[source]
No, you have to understand that filesystems are massive (decade+) projects, and one of the key things you have to do with anything that big that has to work that perfectly is a very gradual rollout, starting with the more risk tolerant users and gradually increasing to a wider and wider set of users.

We're very far along in that process now, but it's still marked as experimental because it is not quite ready for widespread deployment by just anyone. 6.16 is getting damn close, though.

That means a lot of our users now are people getting it from distro kernels, who often have never compiled a kernel before - nevertheless, they can and do report bugs.

And no matter where you are in the rollout, when you get bug reports you have to fix them and get the fixes out to users in a timely manner so that they can keep running, keep testing and reporting bugs.

It's a big loss if a user has to wait 3 months for a bugfix - they'll get frustrated and leave, and a big part of what I do is building a community that knows how the system works, how to help debug, and how to report those bugs.

A very common refrain I get is "it's experimental <expletive deleted>, why does it matter?" - and, well, the answer is getting fixes out in a timely manner matters just as much if not more if we want to get this thing done in a timely manner.

replies(7): >>44469116 #>>44469832 #>>44470468 #>>44471432 #>>44472307 #>>44472645 #>>44476867 #
16. alphazard ◴[] No.44468966[source]
> Having a lead position in development, I would kick Kent off the team.

I've seen this sentiment a lot lately. That disagreeable top performers have to be disposed of because they are "toxic" or "problematic".

You aren't doing your job as a leader if this is your attitude to good engineers. Engineering is a field where a small number of people create a large share of the value. You can either understand that, and take it upon yourself to integrate disagreeable yet high-performing people into the team, paving over the rough patches yourself. Or you can oust them, and quite literally take a >50% productivity hit on your team.

A disagreeable person will take up more of your time as a manager, but a high performer is worth significantly more of your time. When these traits co-occur in the same person, the cost-benefit is complicated. The reason we talk about this problem a lot in tech is because it is legitimately a tough call, with errors in both directions. Wishing that the right move was always as simple as kicking someone off the team doesn't make it true, although it may relieve you from having to contend with the decision.

replies(11): >>44469067 #>>44469102 #>>44471563 #>>44472193 #>>44472275 #>>44472301 #>>44472325 #>>44472355 #>>44472556 #>>44473093 #>>44473839 #
17. koverstreet ◴[] No.44469067[source]
It's not one or the other.

Ideally, you teach people how to get along better together; I think of my job as manager (and I effectively do manage a large team these days) as one of teaching and fostering good communication.

replies(1): >>44475007 #
18. viraptor ◴[] No.44469102[source]
> Or you can oust them, and quite literally take a >50% productivity hit on your team.

In the short term, possibly. But do you think bcachefs is better in the current situation than if it moved at half the speed, but without conflict? By being out of kernel it will get less testing, fewer contributions, the main developer will get some time wasted on rebasing the patch set with every release, and distros are unlikely to expose bcachefs to the user any time soon. When you're working with an ecosystem / teams, a single person's performance really doesn't mean that much in the end. And occasionally Kent will still have to upstream some changes to interfaces - how likely is anyone to review/merge them quickly?

And now, what are the chances this will ever become more than a single person project really?

replies(1): >>44469255 #
19. orbisvicis ◴[] No.44469116{6}[source]
Isn't this the point of DKMS, to decouple module code from kernel code?
replies(2): >>44469218 #>>44469533 #
20. monkeyelite ◴[] No.44469158[source]
What you’re saying makes sense, but do we know the social convention about bugfix classification in Linux?

My job matches what you’re describing, but “bug fix” is broadly interpreted. It basically means “managers don’t do anything stupid”.

If someone got in trouble under the language you want (“the rules are clear and were broken”), I would feel that someone was being singled out.

21. koverstreet ◴[] No.44469218{7}[source]
Well, my hope when bcachefs was merged was for it to be a real kernel community project.

At the time it looked like that could happen - there was real interest from Red Hat prior to merging. Sadly Red Hat's involvement never translated into much code, and while upstreaming did get me a large influx of users - many of whom have helped enormously with the QA and stabilization effort - the drama and controversies have kept developers away, so on the whole it's meant more work, pressure and stress for me.

So DKMS wouldn't be the worst route, at this point. It would be a real shame though, this close to taking the experimental label off, and an enormous hassle for users and distributions.

But it's not my call to make, of course. I just write code...

22. alphazard ◴[] No.44469255{3}[source]
It would be worse for bcachefs and the kernel if they parted ways. The Linux kernel does not have a feature complete alternative to APFS. Apple, of all companies, is beating the Linux kernel at filesystems. That hasn't happened before.

> When you're working with an ecosystem / teams, a single person's performance really doesn't mean that much in the end.

This is demonstrably not true. Kent brought bcachefs to fruition and got it upstreamed. WireGuard was also one guy. The cryptography used in both, also one guy. There's an argument to be made that given an elegant, well designed system, we should assume it came from a single or a few minds. But given a system that's been around for a while, you would be right to assume that a lot of people were/are involved in keeping it around.

replies(2): >>44470668 #>>44471369 #
23. webstrand ◴[] No.44469533{7}[source]
DKMS is an awful user experience, it's an easy way to render a system unbootable. I hope Linus doesn't force existing users, like me, down that path. It's why I avoid zfs, however good it may be.
replies(3): >>44470382 #>>44470523 #>>44472183 #
24. timewizard ◴[] No.44469688{5}[source]
How would they know to volunteer? Are you saying I can perform a hostile volunteering to take over for a maintainer who does not want to give up the project? I don't think you understood what was meant.
replies(1): >>44471083 #
25. hamandcheese ◴[] No.44469832{6}[source]
I am sympathetic to your plight. I work on internal dev tools and being able to get changes out to users quickly is an incredible tool to be able to serve them well and earn (or keep) their trust.

It seems like you want that kind of fast turnaround time in the kernel though, which strikes me as an impossible goal.

26. dataflow ◴[] No.44470216{5}[source]
> evade the rules that, somehow, every other part of the kernel manages to do just fine with

I have no context on the situation or understanding of what the right set of rules here is, but the difference between filesystems and other code is that bugs in the filesystem can cause silent, persistent corruption for user data, on top of all the other failure modes. Most other parts of the kernel don't have such a large persistence capability in case of failure. So I can understand if filesystems feel the need to be special.

replies(1): >>44470381 #
27. samus ◴[] No.44470381{6}[source]
Yet the other filesystems seem fine with the rules. And the value proposition of Bcachefs precisely is that it doesn't eat your data. So, either the marketing is off, or it is far from ready to live with the quite predictable release pace of the Linux kernel.
replies(1): >>44471308 #
28. mroche ◴[] No.44470382{8}[source]
DKMS isn't a "fire and forget" kind of tool, but it comes reasonably close most of the time. I would say it's a far cry from awful, though.
replies(1): >>44473979 #
29. samus ◴[] No.44470468{6}[source]
IMHO, anybody willing to use a file system marked as experimental from a downstream kernel should be able to wait for the fix. If they need it faster, they should be ready to compile their own kernel or seriously reevaluate their decision to adopt the FS.

The kernel's pace is predictable, even billion-dollar corporations can live with it, and it's not like Linus hasn't accommodated you in the past. But continuing to do so will make people stop believing you are acting in good faith, and the outcome of that will be predictable.

This is simply what the development model is like in the Linux kernel. Exceptions happen, but they sometimes backfire and highlight why the rules matter in the first place, and therefore they are unlikely to change.

30. yjftsjthsd-h ◴[] No.44470523{8}[source]
One of my machines runs root on ZFS via DKMS. I will grant that it is annoying, and it used to be worse, but I don't think it's been quite as bad as all that for a very long time. I would also argue that it's more acceptable for testing actively developed stuff that's getting the bugs worked out in order to work towards mainlining.

That said, I vaguely seem to recall that bcachefs ended up involving changes to other parts of the kernel to better support it; if that's true then DKMS is likely to be much more painful if not outright impossible. It's fine to compile one module (or even several) against a normal kernel, but the moment you have to patch parts of the "main" kernel it's gonna get messy.

31. eviks ◴[] No.44470642[source]
> One thing is to challenge things.
>
> If the rules are set in a way that rc1+ gets only Bugfixes

So it's not ok to challenge things like the substance of rules...

replies(2): >>44470715 #>>44471323 #
32. eviks ◴[] No.44470658{5}[source]
The inconvenience of this process is also addressed by the dev, as is the different definition of experimental that you're using (though your expectation re kernel doesn't follow even without the mismatch in definitions)
replies(2): >>44473345 #>>44473746 #
33. viraptor ◴[] No.44470668{4}[source]
> Bcachefs to fruition and got it upstreamed

That may be reversed, so... wouldn't count it as a success yet. The project may not get popular adoption if people don't trust its future.

34. dottedmag ◴[] No.44470715[source]
It is, but directly, not as a subversion.

I have had a similar experience with a team member who was quietly unhappy about a rule. Instead of raising a discussion about the rule (like the rest of the team members did) he tried to quietly ignore it in his work, usually via requesting reviews from less stringent reviewers.

As a result, after a while I started documenting every single instance of his sneaky rule-breakage, sending every instance straight to his manager, and the person was out pretty soon.

replies(2): >>44470798 #>>44472211 #
35. Guvante ◴[] No.44470736[source]
"Pull requests aren't the time to talk about this" is only ever correct if the next part of the sentence is "because we already agreed" or some such.

Otherwise that is a red flag. Like pull requests are when discussions are had...

replies(2): >>44471044 #>>44471422 #
36. eviks ◴[] No.44470798{3}[source]
> It is, but directly, not as a subversion.

It is directly challenged in the very thread linked in the article (and likely before, the drama is ancient).

Also, there is no "less stringent reviewer", it's always been the same you!

So your example fails at both core points, yet your outcome is still the same happy firing!

At least for paid work you can just sprinkle $ to cover up such mistakes and find someone else, but wait, this is also not paid work!

37. cwillu ◴[] No.44471044[source]
And the linux kernel project has a long-established process, which includes not routinely landing major features post-merge-window without having a discussion first.
38. cwillu ◴[] No.44471052[source]
See the replies made by Josef Bacik and Theodore Ts'o.

https://lwn.net/ml/all/20250627144441.GA349175@fedora/#t

https://lwn.net/ml/all/20250628015934.GB4253@mit.edu/

39. cwillu ◴[] No.44471083{6}[source]
Anyone remotely suitable would be active on the lkml.
40. dataflow ◴[] No.44471308{7}[source]
My impression as a total outsider here is that most (all?) other filesystems I'm aware of are either more mature - and generally not in active feature development - or they are not as popular, limiting the damage. Is this inaccurate?

I will also say that bcachefs's selling point - and probably a major reason people are so excited for it - is the amount of effort it puts into avoiding data corruption. Which tells you something about the perceived quality of other filesystems on Linux. Which means that saying "other filesystems seem fine with the rules" misses the very fact that people have seen too much data corruption with other filesystems and want something that prioritizes it higher.

replies(1): >>44472119 #
41. yxhuvud ◴[] No.44471323[source]
You don't challenge them by pretending they don't exist. That only makes you look like an asshole.

The proper way here would have been two pull requests, one with all the bugfixes, and one with the new feature with a cover letter motivating why an exception should happen. And if this happens often enough with sufficiently good backing motivations, then he may be able to convince people.

42. rob_c ◴[] No.44471352[source]
And a honking great bus factor of Kent deciding enough is enough and having a tantrum. You couldn't and shouldn't trust critical data to such a scenario
replies(1): >>44472312 #
43. rob_c ◴[] No.44471369{4}[source]
> a feature complete alternative to APFS

Yet Linux has better tested NFS, Ceph, a stable ZFS target... I think the opposite is still true, Apple's golden goose of an fs is still basically their NTFS implementation

44. rob_c ◴[] No.44471394[source]
> Kent writes in the LWN comments

Unfortunately Kent spends a lot of time and effort defending Kent. I wish he would learn to take a step back and admit he's fallible and learn to play nice in the sandbox rather than wasting all of this time and effort. A simple "mea culpa" could smooth over a lot of the feathers he constantly ruffles.

45. dataflow ◴[] No.44471422[source]
I'm not sure how I feel on the larger picture, but I think I understand his view of why certain PRs aren't the place to talk about certain things.

It's because he views user data integrity as a more critical concern than the PR process or team dynamics - which, as a user, I don't fault him for. I think that in his mind, every hour/day/week spent debating things on a PR equals more people losing or corrupting data. This is not commonly the case with most PRs - it's specific to popular filesystems in active development.

What I don't necessarily buy is how to weigh this responsibility against the responsibility users take on when they use such an experimental FS in the first place. It's a tough question in my mind, and both sides have good points. And I also don't know anything about the relative safety vs. severity of each patch. But what I do understand is the motivation for not viewing these as generic PRs against generic codebases. So the idea that this is a red flag in this case just doesn't seem right to me, based on my current understanding.

replies(1): >>44472407 #
46. rob_c ◴[] No.44471432{6}[source]
> It's a big loss if a user has to wait 3 months for a bugfix

Either the bugfix is not serious and they can wait because the system is mature, or the fs is so unstable you shouldn't be pandering to the crowd that struggles with build deps and make install.

There is no in between, this is the situation. And the "but not all bugs are equal" argument doesn't stand.

I know if I read of a metadata bug getting fixed in ext4 or ZFS there's a very small chance of this causing my platter to evaporate. By definition of stable, if that was happening it would be hitting that one unfortunate guy (<0.001% of users) running a weird OS or hardware and that's just the luck of the draw.

If the fix is from a fs marked experimental, yes I kinda expect this could fry my data and hurt kittens. That's what experimental and stable mean. That means I expect up to 100% of users to be impacted under certain workflows or scenarios.

Everything outside of this is wasted energy and arguing with the tide.

47. mort96 ◴[] No.44471563[source]
You aren't making your job easier as a leader by keeping assholes who insist on causing problems and not following established process.
48. dismalaf ◴[] No.44472119{8}[source]
Btrfs is still very much being developed, in the kernel and is quite popular.
replies(1): >>44472321 #
49. krageon ◴[] No.44472183{8}[source]
ZFS should be avoided because it has too many dumb complete failure states (having run it in a real production storage environment), not because it's DKMS
replies(1): >>44474783 #
50. eqvinox ◴[] No.44472193[source]
A highly skilled but socially inept engineer is not a top performer. Interacting with others is part of their performance. Ultimately you need to look at the sum total of time, money, and outcome for the entire team; if firing a single "rockstar but asshole" developer allows the rest of your team to achieve the same productivity, you're still better off because you're saving both money and time on that person. Conversely if a single such developer can replace your entire team… sure, go for it.

In the extreme, if bcachefs gets removed from the kernel, the productivity outcome (depending on your measure) is actually zero.

[Ed.: also, honestly, if you need to hire a "babysitter" for such a highly skilled engineer, that is also a viable option & there shouldn't be a social stigma for that either. I wouldn't say it's the manager's job though, not to that degree at least.]

replies(1): >>44476263 #
51. krageon ◴[] No.44472211{3}[source]
You've explained everyone is unhappy with it and that you worked to get the one person who actually acted upon it fired. It's hilarious but in a pretty sad way that you're portraying this as an inevitability. It wasn't, it was just you. You had a choice, and you chose to do this. It wasn't inevitable.
52. hitekker ◴[] No.44472275[source]
When the "top performer" destroys trust and fails to rebuild it, they shouldn't be on the team*

Skimming over the context, Kent seems to be lying by omission in PRs and distorting the history behind the PRs. Plus fighting with what is basically his tech lead who represents the team's norms, culture, and health. I also think he's fighting in this comment section right now.

Speaking as a former "brilliant jerk", I wouldn't trust the memory or intentions of a brilliant jerk. I wouldn't want to be looking over my shoulder for back-stabbing on my own team. I also wouldn't want my manager to get stuck in "I can fix him" mode because they're afraid of doing their flipping job: firing an employee who refuses to learn from their mistakes.

* I'm sympathetic to contexts when the team itself is bad and the performer is actually doing better. In that case: forget their trust, the performer should either remake the team (become the manager) or leave to do better work.

53. brookst ◴[] No.44472301[source]
It’s no different from giving up on someone who writes terrible code or creates git hell.

Sure, you talk to them. And sure, you explain what the problem is and treat them like an adult. But ultimately it is completely acceptable to give up.

People’s potential matters to parents, and to mentors. A high-potential, low-performing person can be a project worth taking on, but they are not an obligation in the workplace, especially for someone as senior and time-constrained as Linus.

54. krzyk ◴[] No.44472307{6}[source]
> It's a big loss if a user has to wait 3 months for a bugfix

Is the wait really 3 months? I don't know the release cycle exactly, but to me kernels seem to be released quite frequently; at least there are RCs sooner than 3 months. I just checked the latest history and major releases are 2 months apart, with minor ones between them.

People using experimental features are quite aware of how to get new experimental kernel sources.

55. bombcar ◴[] No.44472312{3}[source]
There’s no harm doing it - if the thing actually works! Kent getting that lass metro pass wouldn’t cause your file system to immediately corrupt and delete itself.

What you want to avoid is becoming dependent on continued development of it - but unless you’re particularly using some specific feature of the file system that none other provide you’ll have time to migrate off it.

Even reiserfs didn’t cease to operate.

replies(2): >>44472721 #>>44473464 #
56. koverstreet ◴[] No.44472321{9}[source]
Chris Mason moved on a long time ago, Josef seems to be spending most of his time on other things, and if you look at the commit history btrfs development has been moving pretty slowly for a long time.

It's a bad sign when the key founders leave like that, filesystems require a lot of coherence of design and institutional knowledge to be retained.

57. bombcar ◴[] No.44472325[source]
If you as a manager can build trust with your high performance engineer with zero social skills, you can end up with a power combination. You protect the engineer from insane requirements and also protect the rest of the team/company from outbursts.

I’ve seen it time and time again, sometimes so much so that hiring the engineer also means hiring his handler, and everyone knows it and is ok with it - even the engineer.

58. moomin ◴[] No.44472355[source]
Counterpoint: every time I meet someone who is perceived this way, they’re definitely an asshole, but their “productivity” is often mostly corner-cutting. Other devs' irritation with them often conflates the technical unprofessionalism with the team unprofessionalism. Managers are lousy at actually judging the productivity in these situations. You 100% can ditch these people and your productivity will rise. You just won’t have some asshole claiming the credit for other people’s work anymore.

Funnily enough, I just tracked down a problem that significantly affected the calculation of how much money something cost down to an issue one of these geniuses introduced by thinking they were too good for regular, dull, due diligence in their development practices.

59. koverstreet ◴[] No.44472407{3}[source]
No, it's mainly that tensions have been high between myself and Linus so I want that stuff done privately so it doesn't spill out into the community the way it has been :)

It gets to be a real distraction. Fortunately the people I work with have learned how to roll with it, so it's not nearly as bad as it used to be. Now it mainly shows up in forum comments where it doesn't really affect me and I can eat popcorn.

It is true that I don't want critical fixes being held up by angry arguing, but most pull requests, even fixes, aren't nearly so critical.

The main thing I keep hammering on is "the development process _matters_ if we want to get this done right", and user considerations are a big part of that.

Debugging issues that come up in the wild, and getting those fixes to users in a timely manner so they can keep testing and we can get all these crazy failure modes sorted out is a big part of that - if we want a filesystem that's truly bulletproof. I know I want that!

I've been spending the past week and a half mostly working with one user and his filesystem that's been through flaky dying controllers and now lightning strikes; ext4 even got corrupted on the same setup.

But we discovered some 6.16 regressions, got some more people involved staring at code and logs (a new guy spotted a big one), and another small pile of fixes are going out next week. And even with the 6.16 regressions (some nasty ones were found), it's looking like he didn't lose much, thanks in part to journal rewind.

This thing is turning into a tank.

All in a day's work...

replies(1): >>44472505 #
60. dastbe ◴[] No.44472505{4}[source]
As a person who probably has one of the best vantage points on this, how was Apple able to get APFS out so quickly compared to filesystems in Linux like bcachefs?
replies(1): >>44472657 #
61. baobun ◴[] No.44472556[source]
IMO it's very clear from reading a few threads that K is not just disagreeable but manipulative and disingenuous. Bordering on gaslighting at times.

As someone who might have fallen in your grumpy-disagreeable-senior bucket at times, that's a different story and not something I would accept.

> I've seen this sentiment a lot lately. That disagreeable top performers have to be disposed of because they are "toxic" or "problematic".

This is not really a relevant argument to this situation.

62. baobun ◴[] No.44472645{6}[source]
DKMS as an option might be better than you imagine.
63. koverstreet ◴[] No.44472657{5}[source]
I am curious about that myself, I know very little about apfs.

But Apple has historically been strong on organizing and supporting teams (see: their chip design), a filesystem sounds exactly like something they'd do well if they decided to give it the proper investment and support.

Where they seem to be falling down these days is software maintenance - many, many reports of MacOS getting buggier with every release. But a big, complicated, but well defined and self contained engineering project? That's their ballpark.

64. tremon ◴[] No.44472721{4}[source]
The reiserfs code was stable and in maintenance mode. All new development effort was going into reiser4, which absolutely did die off. IIRC a few developers (that were already working on it) tried to continue the development, but it was abandoned due to lack of support and funds.

In terms of maturity, bcachefs is closer to production quality than reiser4 was, but it's still closer to reiser4 than reiserfs in its lifecycle.

replies(1): >>44472805 #
65. koverstreet ◴[] No.44472805{5}[source]
we're further along than btrfs in "will it keep my data"
replies(5): >>44472928 #>>44473415 #>>44473951 #>>44473972 #>>44477696 #
66. tremon ◴[] No.44472928{6}[source]
Fair enough, I have no practical experience with bcachefs myself.
replies(2): >>44472989 #>>44474229 #
67. koverstreet ◴[] No.44472989{7}[source]
Fair :) I've been trying to keep this thing (relatively) quiet and low profile until it's completely done, but it's gotten hyped.

Data integrity, core IO paths, all the failure modes, all the crazy repair corner cases - these are the hard parts if you really want to get a filesystem right. These are what we've been taking our time on.

I can't claim 100% success rate w.r.t. data loss, but it's been phenomenally good, with some crazy stories of filesystems we've gotten back that you'd never expect - and then it just becomes the norm.

I love the crazy bug reports that really push the boundaries of our capabilities.

That's an attitude that reiserfs and btrfs never had, and when I am confident that it is 100% rock solid and bulletproof I'll be lifting the experimental label.

68. saghm ◴[] No.44473093[source]
In other words, the rules don't apply to people who are "top performers"? This mentality will drive out all of the other people working for you, so even ignoring the obvious issues with how it enables all sorts of shitty behavior from certain people, you're going to cost yourself more from losing larger numbers of "lower performers" in the long run (unless you end up replenishing the numbers with people equally shitty or at least willing to tolerate shittiness, which I guess would explain the stories of literal cesspools that have cropped up over the years that otherwise are hard to even comprehend).
69. rovr138 ◴[] No.44473345{6}[source]
The kernel, even its bugs, should be stable (in that they shouldn't change unless it happens the correct way). If not, it starts introducing unexpected issues to users.

If someone's testing against these versions, adding their fixes and patches, stuff like this will break things for users. He can't assume all users will be regular desktop users, even on an experimental area of the code.

Things like 'RC' have meaning. Meaning that has been there for years. He can develop on a separate tree and users that want it can use it. This is used all over.

70. jcalvinowens ◴[] No.44473415{6}[source]
> we're further along than btrfs in "will it keep my data"

Honestly Kent, this continuing baseless fearmongering from you about btrfs is absolutely disgusting.

It costs you. I was initially very interested in bcachefs, but I will never spend my time testing it or contributing to it as long as you continue to behave this way. I'm certain there are many many others who would nominally be very interested, but feel the same way I do.

Your filesystem charitably gets 0.001% the real world testing btrfs does. To claim it is more reliable than btrfs is ignorant and naive.

Maybe it actually is more reliable in the real world (press X to doubt...), but you can't possibly know yet, and you won't know for a long time.

replies(3): >>44473484 #>>44473553 #>>44476415 #
71. rob_c ◴[] No.44473464{4}[source]
> There’s no harm doing it - if the thing actually works

This is the antithesis of good project management and stability.

No, you want to avoid a static target in a dynamic environment that is unmaintained (such as an experimental fs in the kernel tree).

If it's static and unsupported, you'd end up unable to run it to recover disks on, say, Ryzen 9 processors that require a minimum kernel version, where the API/ABI has drifted so far that the old module won't compile or load.

If you can't afford to get your hands dirty and hack around the API changes when the project has such a bus factor, DON'T USE IT.

Frankly, the argument you're making is the flip side of "stick with ext2 since it works". It's probably going to die soon unless there's a community to support it (as there is for ZFS, ext4 in the kernel, or Ceph in HPC/corporate spaces).

72. rob_c ◴[] No.44473484{7}[source]
I'm happy to support that bcache may have a stable on disk format, but the lashing out at the alternatives is another example of behaviour I'd prefer to see dropped.

If your product is so great, it's its own advert. If it has problems, spend the limited person power fixing it, not attacking the opposition. That is what CIQ have done; do better.

73. koverstreet ◴[] No.44473553{7}[source]
We have documented, in this very thread, issues with multi device setups that btrfs has that bcachefs does not - and btrfs developers ignoring these issues.

This isn't baseless fearmongering, this is "did you think through the failure modes when you were designing the basics".

This stuff comes up over, and over, and over.

Engineering things right matters, and part of that absolutely is comparing and documenting approaches and solutions to see what went right and what went wrong.

This isn't a popularity contest, and this isn't high school where we form into cliques and start slinging insults.

Come up with facts, documentation, analysis. That's what we do. I'm tired of these threads degenerating into flamewars.

replies(1): >>44473712 #
74. kzrdude ◴[] No.44473712{8}[source]
(That's impressive, but the real world user pool is much smaller, isn't it? It still sounds more like a proud brag than something proven by workload.)

I am not a filesystems guy, but I was disappointed when I realized that btrfs did not have a good design for ENOSPC handling.

So I'm curious, does bcachefs design for a benign failure mode when out of space?

replies(1): >>44473786 #
75. motorest ◴[] No.44473730{4}[source]
> That is a valid objection, but I still think that for some huge and difficult features the month long pauses imposed by release cycles are absolutely detrimental.

I feel you're not answering the question, nor are you presenting any case in favor of forcing an exceptional release process for an unstable feature.

The "detrimental" claim is also void of any reason or substance. It's not detrimental to its users, as users know better than to roll out experimental file systems for critical data, and those hypothetical users who somehow are really really interested in using bleeding edge will already be building their own kernel for this reason alone. Neither scenario requires this code to be in the kernel, let alone exceptional release processes.

> Ideally they'd be developed outside the kernel until they are perfect, but Kent addresses this in his LWN comment: There is no funding/time to make that ideal scenario possible.

It is clear then that the code should be pulled from the kernel. If it isn't possible to maintain a feature with the regular release process, everyone stands to benefit by not shipping code that is impossible to stabilize.

replies(1): >>44474525 #
76. motorest ◴[] No.44473746{6}[source]
> The inconvenience of this process is also addressed by the dev, as is the different definition of experimental that you're using (...)

The only aspect of "experimental" that matters is what it means to the release process. If you can't meet that bar then debating semantics won't help either.

And by the way, the patch thread clearly stresses a) loss of data, b) the patch trying to sneak new features in under the maintenance radar. That is the definition of unstable in anyone's book.

replies(1): >>44475065 #
77. koverstreet ◴[] No.44473786{9}[source]
We have enough user reports of multi device testing that they put both bcachefs and btrfs through, where bcachefs consistently survives where btrfs does not. We have much better repair and recovery, with real defense in depth.

Now: I am not saying that bcachefs is yet trouble free enough for widespread deployment, we're still seeing cases where repair needs fairly minor work, but the filesystem may be offline while we get that fixed.

OTOH we also recover, regularly, from absolutely crazy scenarios involving hardware failure: flaky controllers, lightning strikes, I've seen cases where it looked like a head crashed - took out a whole bunch of btree nodes in similar LBA ranges.

IOW: the fundamentals are very solid, but keep checking back if you're wondering when it'll be ready for widespread deployment.

Milestones to watch for:

- 3 months with zero data loss or downtime events: I think we may get this soon, knock on wood

- stable backports starting: 6.17, knock on wood (or maybe we'll be out of the kernel and coming up with our own plan, who knows)

- weird application specific bugs squashed: these have been on the back burner, but there's some weird stuff to get to still (e.g. weird unlink behavior that affects docker and go builds, and the Rust people just reported something odd when building certain packages with mold).

And yes, we've always handled -ENOSPC gracefully.

78. motorest ◴[] No.44473839[source]
> You aren't doing your job as a leader if this is your attitude to good engineers.

This is precisely the mistake you are making: conflating egregious types who post code with "good engineers". They are not. They are incompetent.

There is no engineering activity that is not driven by teams. Being able to work effectively in a team environment is therefore a very basic and critical skill. Those who are unable to work in a team environment are lacking a very basic and critical skill. Those who fail at this basic skill to the point they are dubbed as "toxic" end up sabotaging your whole team, needlessly creating problems for everyone around them, and preventing any collaboration from taking place.

If this problem is introduced by a single team member, it is in everyone's best interest to just cut the cancer.

79. bombcar ◴[] No.44473951{6}[source]
From my experience as a [x,z]fs snob, "further along than butterfs" is damning with faint praise.
80. bigyabai ◴[] No.44473972{6}[source]
I have used BTRFS for 6 years on 5 drives without a single journaled corruption.
81. webstrand ◴[] No.44473979{9}[source]
I think my problem is that it's just close enough to being fire-and-forget that I forget how to do the recovery when it misfires. It usually seems to crop up when I'm on vacation or something and I don't have my tools.
82. sroussey ◴[] No.44474229{7}[source]
> I have no practical experience with bcachefs myself.

Who does?

When the MySQL and Postgres projects recommend it, I’ll have a look.

replies(1): >>44475960 #
83. bgwalter ◴[] No.44474525{5}[source]
> The "detrimental" claim is also void of any reason or substance.

Thanks for the compliments! Detrimental for development speed, not for the users.

84. cyberpunk ◴[] No.44474783{9}[source]
I’ve run racks and racks of it in prod also. What are these dumb complete failure states you mean?
85. hinkley ◴[] No.44475007{3}[source]
But if you have one “top performer” who gets in the way of every other person’s productivity and buy-in, they have to go. You can’t base an organization on a bus number of one.
86. koverstreet ◴[] No.44475065{7}[source]
Experimental has no defined meaning with respect to the release process.

It's a signal to users - "watch out if you're going to use this thing"

87. koverstreet ◴[] No.44475960{8}[source]
That's still a ways off, but it is worth noting that bcachefs handles database workloads in cow mode with no issue.
88. lll-o-lll ◴[] No.44476263{3}[source]
Difficult people are not “assholes”. Someone saying “your engineering practices are shoddy and the quality of your code is bad” does not make them an asshole. It makes them French maybe.

My point is that there are people low on the agreeableness scale, and they can often be exceptional engineers. You have to manage them, yes, but taking the easy way out of “I’ll sack anyone who’s prickly” will mean a shit team. You need people (or at least one) who will say “that won’t work because…” “this is bad because…” “this will fail because…”.

replies(1): >>44476757 #
89. Dylan16807 ◴[] No.44476415{7}[source]
I haven't lost data on btrfs but I have broken half the partitions I made with it. The comparison doesn't feel baseless to me.

> 0.001% the real world testing

Statistics can be quite powerful. If you have a million installs, and your billion-install competitor has 100 problems per million installs, you can make some pretty strong statements about how you rate against that competitor. Just for easy example numbers.
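
To make that concrete, here is a rough sketch in Python using the same easy example numbers. Everything in it is an assumption for illustration: the count of 50 observed problems is made up, problems are modelled as a Poisson count, and installs are treated as independent with comparable reporting. It asks how likely it would be to see that few problems on a million installs if the true rate matched the competitor's 100 per million.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Poisson(lam) by summing the terms directly."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        total += term
    return total


# Illustrative numbers only (assumptions, not measurements).
small_installs = 1_000_000
competitor_rate = 100 / 1_000_000  # 100 problems per million installs
observed_problems = 50             # hypothetical count on the smaller install base

# Problems we would expect on the small install base at the competitor's rate.
expected_at_competitor_rate = small_installs * competitor_rate

# Chance of seeing this few problems if the small project were really as bad.
p_as_bad = poisson_cdf(observed_problems, expected_at_competitor_rate)

# With zero observed problems, an approximate 95% upper bound on the
# per-install problem rate (the "rule of three": about 3 / n).
zero_event_upper_bound = -math.log(0.05) / small_installs

print(f"expected problems at competitor rate: {expected_at_competitor_rate:.0f}")
print(f"P(at most {observed_problems} problems | same rate): {p_as_bad:.2e}")
print(f"95% upper bound with zero problems: {zero_event_upper_bound * 1e6:.1f} per million")
```

The catch is in the assumptions rather than the arithmetic: this only holds if the smaller install base sees comparable workloads and reports problems at a comparable rate, which is the part the parent comment is disputing.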

90. jjaksic ◴[] No.44476757{4}[source]
A person who points out flaws is not an asshole. An asshole is a person who breaks rules and who breaks trust. We're talking about the latter, not the former.
91. jjaksic ◴[] No.44476867{6}[source]
> It's a big loss if a user has to wait 3 months for a bugfix

This is incredibly short-sighted. You're talking about 1 user waiting 3 months, and you think that's "big"? I'd say it's a much bigger loss if the project gets kicked out because of one person's impatience. Then everybody will have to wait forever, how is that better?

If the fs is as good as you claim, then you better play by the rules and make sure the project survives and eventually goes GA. If it happens a few months later, then so be it. Think about the long term.

If you're worried about a single user leaving, then a much better strategy would be to explain to this user the Linux release timeline, or how to apply a patch, than to go toe to toe with Linus.

And btw, squeezing a fix/feature in at the last minute in order to help one user is not as good as you think it is. Even if that one user appreciates your responsiveness, to everyone else it sends a message that the key dev is super impatient and unprofessional. So even if you manage to keep that one user, how many potential users are you losing by sending that message?

92. int_19h ◴[] No.44477696{6}[source]
btrfs is used by numerous NAS providers at this point.
replies(1): >>44477929 #
93. koverstreet ◴[] No.44477929{7}[source]
Do you know any that use it in multi device mode?