Bcachefs may be headed out of the kernel

(lwn.net)

chasil ◴[04 Jul 25 17:02 UTC] No.44466139[source]▶

So the assertion is that users with (critical) data loss bugs need complete solutions for recovery and damage containment with all possible speed, and without this "last mile" effort, stability will never be achieved.

The objection is that even the tiniest bug-fix windows get everything but the kitchen sink.

These are both uncomfortable positions to occupy, without doubt.

replies(2): >>44467021 #>>44468195 #

koverstreet ◴[04 Jul 25 19:00 UTC] No.44467021[source]▶

>>44466139 #

No, the assertion is that the proper response to a bug often (and if it's high impact - always) involves a lot more than just the bugfix.

And the whole reason for a filesystem's existence is to store and maintain your data, so if that is what the patch is for, yes, it should be under consideration as a hotfix.

There's also the broader context: it's a major problem for stabilization if we can't properly support the people using it so they can keep testing.

More context: the kernel as a whole is based on fixed time tables and code review, which it needs because QA (especially automated testing) is extremely spotty. bcachefs's QA, both automated testing and community testing, is extremely good, and we've had bugfix patchsets either held up or turned into flamewars because of this mismatch entirely too many times.

replies(4): >>44467217 #>>44467479 #>>44468100 #>>44470493 #

rewgs ◴[05 Jul 25 06:20 UTC] No.44470493[source]▶

>>44467021 #

Kent, it’s actually really simple: bcachefs is experimental. The group of people who are currently using bcachefs and who also can’t wait for a data recovery tool that hasn’t existed until now contains precisely zero people.

You’re acting like bcachefs systems are storing Critical Data That Absolutely Cannot Be Lost. And yet at the same time it’s experimental. I’m just one user, but I can tell you that, even as excited as I am about bcachefs, I’m not touching it with a ten foot pole for anything beyond playing around until at least the experimental label is removed.

I imagine my position is not uncommon.

Please stop trying to die on this hill. Your project is really great and really important. I want it to succeed.

Just chill and let bug fixes be bug fixes and features be features.

replies(2): >>44472283 #>>44473217 #

koverstreet ◴[05 Jul 25 12:19 UTC] No.44472283[source]▶

>>44470493 #

I've recovered a _lot_ of data for users that didn't have backups.

It's all part of the job.

replies(2): >>44474097 #>>44475331 #

magicalhippo ◴[05 Jul 25 20:33 UTC] No.44475331[source]▶

>>44472283 #

If you have a lot of users who store data on an experimental filesystem and who don't back up said data, yet are not cool with data losses, I would say you have a serious communication issue at hand.

I have lost years of my work due to not having proper backups. I know the pain.

And I totally get you feel responsible for the data loss and want to help your users, I'm like that too with my code.

But this is an experiment. It's right there in the name.

If this feature really is as needed as you claim here, then getting it into the kernel is a mere side-issue.

In that case, your #1 priority should be to fix whatever is causing such users to install and use bcachefs without having a recovery plan that they have verified, and get existing users on the same level.

Because not doing so would be a huge disservice to those users who don't know better, and at worst borderline exploitative.

Writing recovery software is part of the job. Forcing it into the kernel to save users who are clearly not in any shape or form competent enough to partake in your experiment is definitely not part of the job.

Finding yourself in this position means something has gone very, very wrong.

replies(1): >>44476498 #

1. Dylan16807 ◴[05 Jul 25 23:53 UTC] No.44476498[source]▶

>>44475331 #

> Forcing it into the kernel to save users who are clearly not in any shape or form competent enough to partake in your experiment is definitely not part of the job.

That code needs to be there for non-experimental users later on.

The reason to push it in quickly is so it can get tested and iterated on. Saving a handful of experimental users is not the main benefit.

replies(2): >>44476678 #>>44477451 #

2. magicalhippo ◴[06 Jul 25 00:24 UTC] No.44476678[source]▶

>>44476498 (TP) #

> Saving a handful of experimental users is not the main benefit.

Kent himself argued otherwise here[1].

And even if that were the case, there's no need to take a stand on trying to get it into the kernel. If he gets booted out of the kernel tree, then the end result is the same for his users: they have to compile their own kernel. So it makes no sense to push this so hard.

[1]: https://www.phoronix.com/forums/forum/software/general-linux...

replies(1): >>44476725 #

3. Dylan16807 ◴[06 Jul 25 00:33 UTC] No.44476725[source]▶

>>44476678 #

> Kent himself argued otherwise here[1].

I said "main" for a reason. The current users that need to recover data are a part of the picture, but they're a few trees out of the forest.

"Please tell that to the users who lost data." is not arguing against what I said.

> And even if that were the case, there's no need to take a stand on trying to get it into the kernel. If he gets booted out of the kernel tree, then the end result is the same for his users: they have to compile their own kernel. So it makes no sense to push this so hard.

He's not giving an ultimatum here. The goal is to figure out something that works for everyone.

4. rewgs ◴[06 Jul 25 02:54 UTC] No.44477451[source]▶

>>44476498 (TP) #

> The reason to push it in quickly is so it can get tested and iterated on. Saving a handful of experimental users is not the main benefit.

Testing and iterating on code does not require making exceptions to the kernel development schedule.

replies(1): >>44477474 #

5. Dylan16807 ◴[06 Jul 25 03:00 UTC] No.44477474[source]▶

>>44477451 #

He makes a pretty good argument that the filesystem's never going to get done if there's only one iteration per kernel release.

I don't know what the best solution is, but it looks like it requires either exceptions or something else that gets around the schedule.

replies(1): >>44479373 #

6. NekkoDroid ◴[06 Jul 25 10:10 UTC] No.44479373{3}[source]▶

>>44477474 #

Well... there is the merge window where this should be added and then like 8 release candidates (as always: it depends) where he can iterate on the added code. So the statement of "only one iteration per kernel release" is just categorically wrong.

replies(1): >>44479572 #

7. Dylan16807 ◴[06 Jul 25 10:53 UTC] No.44479572{4}[source]▶

>>44479373 #

"categorically wrong" is a rather uncharitable way to describe you and me using different connotations of the word "iterate". Especially when you pulled "on the added code" out of nowhere.

To stabilize the filesystem he needs to iterate on the code that has been there for a while, to add more debugging and fallbacks for error situations. In this case he wanted to add a new fallback.
