
144 points ksec | 2 comments
criticalfault ◴[] No.44466573[source]
I've been following this for a while now.

Kent is in the wrong. Having a lead position in development, I would kick Kent off the team.

It's one thing to challenge things. What Kent is doing is something completely different. It is obvious he introduced a feature, not just a bugfix.

If the rules are set so that rc1 and later get only bugfixes, then it is absolutely clear what happens to a feature. Tolerating this once or twice is OK, but Kent does this all the time, testing Linus.

Linus is absolutely in the right to kick this out and it's Kent's fault if he does so.

replies(8): >>44466668 #>>44467387 #>>44467968 #>>44468790 #>>44468966 #>>44469158 #>>44470642 #>>44470736 #
pmarreck ◴[] No.44467387[source]
This can happen with prima donna devs who haven't had to collaborate in a team environment for a long time.

It's a damn shame too, because bcachefs has some unique features/potential.

replies(1): >>44471352 #
rob_c ◴[] No.44471352[source]
And a honking great bus factor: Kent deciding enough is enough and having a tantrum. You couldn't, and shouldn't, trust critical data to such a scenario.
replies(1): >>44472312 #
bombcar ◴[] No.44472312[source]
There’s no harm doing it - if the thing actually works! Kent getting that last metro pass wouldn’t cause your file system to immediately corrupt and delete itself.

What you want to avoid is becoming dependent on continued development of it - but unless you’re using some specific feature of the file system that no other filesystem provides, you’ll have time to migrate off it.

Even reiserfs didn’t cease to operate.

replies(2): >>44472721 #>>44473464 #
tremon ◴[] No.44472721[source]
The reiserfs code was stable and in maintenance mode. All new development effort was going into reiser4, which absolutely did die off. IIRC a few developers (that were already working on it) tried to continue the development, but it was abandoned due to lack of support and funds.

In terms of maturity, bcachefs is closer to production quality than reiser4 was, but it's still closer to reiser4 than reiserfs in its lifecycle.

replies(1): >>44472805 #
koverstreet ◴[] No.44472805[source]
we're further along than btrfs in "will it keep my data"
replies(5): >>44472928 #>>44473415 #>>44473951 #>>44473972 #>>44477696 #
jcalvinowens ◴[] No.44473415[source]
> we're further along than btrfs in "will it keep my data"

Honestly Kent, this continuing baseless fearmongering from you about btrfs is absolutely disgusting.

It costs you. I was initially very interested in bcachefs, but I will never spend my time testing it or contributing to it as long as you continue to behave this way. I'm certain there are many, many others who would nominally be very interested, but feel the same way I do.

Your filesystem charitably gets 0.001% of the real-world testing btrfs does. To claim it is more reliable than btrfs is ignorant and naive.

Maybe it actually is more reliable in the real world (press X to doubt...), but you can't possibly know yet, and you won't know for a long time.

replies(3): >>44473484 #>>44473553 #>>44476415 #
koverstreet ◴[] No.44473553[source]
We have documented, in this very thread, issues with multi device setups that btrfs has that bcachefs does not - and btrfs developers ignoring these issues.

This isn't baseless fearmongering, this is "did you think through the failure modes when you were designing the basics".

This stuff comes up over, and over, and over.

Engineering things right matters, and part of that absolutely is comparing and documenting approaches and solutions to see what went right and what went wrong.

This isn't a popularity contest, and this isn't high school where we form into cliques and start slinging insults.

Come up with facts, documentation, analysis. That's what we do. I'm tired of these threads degenerating into flamewars.

replies(1): >>44473712 #
kzrdude ◴[] No.44473712[source]
(That's impressive, but the real-world user pool is much smaller, isn't it? It still sounds more like a proud brag than something proven by workload.)

I am not a filesystems guy, but I was disappointed when I realized that btrfs did not have a good design for ENOSPC handling.

So I'm curious, does bcachefs design for a benign failure mode when out of space?

replies(1): >>44473786 #
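To make the question above concrete: a "benign failure mode when out of space" means userspace gets a clean -ENOSPC error it can act on, rather than a wedged or corrupted filesystem. Purely as an illustration (this is not bcachefs or btrfs code, and `write_with_enospc_handling` is a hypothetical helper name), a caller might distinguish that case like this:

```python
import errno
import os

def write_with_enospc_handling(fd, data):
    """Write data, treating out-of-space as a distinct, recoverable error.

    A filesystem with a benign ENOSPC failure mode returns -ENOSPC cleanly.
    On CoW filesystems the tricky part is that "full" can mean metadata is
    exhausted even while `df` still shows free data space.
    """
    try:
        return os.write(fd, data)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            # Recoverable: the caller can free space (delete files, drop
            # snapshots, rebalance) and retry, instead of treating this
            # as corruption.
            raise RuntimeError("filesystem full; free space and retry") from e
        raise
```

The point of the design question is exactly this contract: out-of-space should surface as an ordinary, retryable error at the syscall boundary, never as an internal inconsistency.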
koverstreet ◴[] No.44473786[source]
We have enough user reports of multi-device testing, putting both bcachefs and btrfs through the same scenarios, where bcachefs consistently survives and btrfs does not. We have much better repair and recovery, with real defense in depth.

Now: I am not saying that bcachefs is yet trouble free enough for widespread deployment, we're still seeing cases where repair needs fairly minor work, but the filesystem may be offline while we get that fixed.

OTOH we also recover, regularly, from absolutely crazy scenarios involving hardware failure: flaky controllers, lightning strikes - I've seen cases where it looked like a head crashed and took out a whole bunch of btree nodes in similar LBA ranges.

IOW: the fundamentals are very solid, but keep checking back if you're wondering when it'll be ready for widespread deployment.

Milestones to watch for:

- 3 months with zero data loss or downtime events: I think we may get this soon, knock on wood
- stable backports starting: 6.17, knock on wood (or maybe we'll be out of the kernel and coming up with our own plan, who knows)
- weird application-specific bugs squashed: these have been on the back burner, but there's some weird stuff to get to still (e.g. weird unlink behavior that affects docker and go builds, and the Rust people just reported something odd when building certain packages with mold)

And yes, we've always handled -ENOSPC gracefully.