Bcachefs may be headed out of the kernel

1. chasil ◴[04 Jul 25 17:02 UTC] No.44466139[source]▶

So the assertion is that users with (critical) data loss bugs need complete solutions for recovery and damage containment with all possible speed, and without this "last mile" effort, stability will never be achieved.

The objection is the tiniest bug-fix windows get everything but the kitchen sink.

These are both uncomfortable positions to occupy, without doubt.

replies(2): >>44467021 #>>44468195 #

2. koverstreet ◴[04 Jul 25 19:00 UTC] No.44467021[source]▶

>>44466139 (TP) #

No, the assertion is that the proper response to a bug often (and if it's high impact - always) involves a lot more than just the bugfix.

And the whole reason for a filesystem's existence is to store and maintain your data, so if that is what the patch if for, yes, it should be under consideration as a hotfix.

There's also the broader context: it's a major problem for stabilization if we can't properly support the people using it so they can keep testing.

More context: the kernel as a whole is based on fixed time tables and code review, which it needs because QA (especially automated testing) is extremely spotty. bcachefs's QA, both automated testing and community testing, is extremely good, and we've had bugfix patchsets either held up or turn into flamewars because of this mismatch entirely too many times.

replies(4): >>44467217 #>>44467479 #>>44468100 #>>44470493 #

3. koverstreet ◴[04 Jul 25 19:37 UTC] No.44467261{3}[source]▶

>>44467217 #

Oh my. Got any other prescriptions for me? :)

It's a bcachefs thread, and I'm the resident bcachefs expert, so.... :)

I'm not terribly invested in these threads, the actual decisionmaking happens elsewhere. But they are a good opportunity to educate people on the whole process of shipping a filesystem, talk about what we're doing, what our priorities are, all that jazz.

replies(1): >>44468199 #

4. WesolyKubeczek ◴[04 Jul 25 20:07 UTC] No.44467479[source]▶

>>44467021 #

> No, the assertion is that the proper response to a bug often (and if it's high impact - always) involves a lot more than just the bugfix.

Then what you do is you try to split your work in two. You could think of a stopgap measure or a workaround which is small, can be reviewed easily, and will reduce the impact of the bug while not being a "proper" fix, and prepare the "properer" fix when the merge window opens.

I would ask, since the bug probably lived since the last stable release, how come it fell through the crack and had only been noticed recently? Could it be that not all setups are affected? If so, can't they live with it until the next merge window?

By making a "feature that fixes the bug for real", you greatly expand the area in which new, unknown bugs may land, with very little time to give it proper testing. This is inevitable, evident by the simple fact that the bug you were trying to fix exists. You can be good, but not that good. Nobody is that good. If anybody was that good, they wouldn't have the bug in the first place.

If you have commercial clients who use your filesystem and you have contractual obligations to fix their bugs and keep their data intact, you could (I'd even say "should") maintain an out-of-tree version with its own release and bugfix schedule. This is IMO the only reasonable way to have it, because the kernel is a huge administrative machine with lots of people, and by mainlining stuff, you necessarily become co-dependent on the release schedule for the whole kernel. I think a conflict between kernel's release schedule and contractual obligations, if you have any, is only a matter of time.

replies(1): >>44468619 #

5. magicalhippo ◴[04 Jul 25 21:45 UTC] No.44468100[source]▶

>>44467021 #

While I absolutely think you're taking a stand in the wrong fights, like I don't see why you needed to push it so far on this hill in particular, I am sympathetic to your argument that experimental kernel modules like filesystems might need a different release approach at times.

At work we have our main application which also contains a lot of customer integrations. Our policy has been new features in trunk only, except if it's entirely contained inside a customer-specific integration module.

We do try to avoid it, but this does allow us to be flexible with regards to customer needs, while keeping the base application stable.

This new recovery feature was, as far as I could see, entirely contained within the bcachefs kernel code. Given the experimental status, as long as it was clearly communicated to users, I don't see a huge problem allowing such self-contained features during the RC phase.

Obviously a requirement must be that it doesn't break the build.

replies(2): >>44470006 #>>44472364 #

6. jethro_tell ◴[04 Jul 25 22:00 UTC] No.44468195[source]▶

>>44466139 (TP) #

Who’s using an experimental filesystem and risking critical data loss? Rule one of experimental file systems is have a copy on a not experimental file system.

replies(1): >>44472374 #

7. NewJazz ◴[04 Jul 25 22:00 UTC] No.44468199{4}[source]▶

>>44467261 #

Yes, drink lots of water and get lots of sleep lol

I mean if you insist OK, but I think there are a lot better ways to educate folks.

8. koverstreet ◴[04 Jul 25 22:54 UTC] No.44468619{3}[source]▶

>>44467479 #

> Then what you do is you try to split your work in two. You could think of a stopgap measure or a workaround which is small, can be reviewed easily, and will reduce the impact of the bug while not being a "proper" fix, and prepare the "properer" fix when the merge window opens.

That is indeed what I normally do. For example, 6.14 and 6.15 had people discovering btree iterator locking bugs (manifesting as assertion pops) while running evacuates on large filesystems (it's hard to test a sufficiently deep tree depth in virtual machine tests with our large btree nodes); some small hotfixes went out in rc kernels, but the majority of the work (a whole project to add assertions for path->should_be_locked, which should shut these down for good) waited until the 6.16 merge window.

That was for a less critical bug - your machine crashing is somewhat less severe than losing a filesystem.

In this case, we had a bug pop up in 6.15 where the link count in the VFS inode getting screwed up caused an inode to be deleted that shouldn't have been - a subvolume root - and then an untested repair path took out the entire subvolume.

Ouuuuch.

That's why the repair code was rushed; it had already gotten one filesystem back, and I'd just gotten another report of someone else hitting it - and for every bug report there are almost always more people who hit it and don't report it.

And considering that a lot of people running bcachefs now are getting it from distro kernels and don't know how to build kernels - that is why it was important to get this out quickly through the normal channels.

In addition, the patch wasn't risky, contrary to what Ted was saying. It's a code path that's very well covered by automated tests, including KASAN/UBSAN/lockdep variants - those would exploded if this patch was incorrect.

When to ship a patch is always a judgement call, and part of how you make that call is how well your QA process can guarantee the patch is correct. Part of what was going on here is a disconnect between those of us who do make heavy use of modern QA infrastructure and those who do it the old school way, relying heavily on manual review and long testing periods for rc kernels.

replies(1): >>44475096 #

9. ◴[05 Jul 25 04:03 UTC] No.44470006{3}[source]▶

>>44468100 #

10. rewgs ◴[05 Jul 25 06:20 UTC] No.44470493[source]▶

>>44467021 #

Kent, it’s actually really simple: bcachefs is experimental. Those that are currently using bcachefs and those that can’t wait for a data recovery tool that hasn’t existed until now is a group containing precisely zero people.

You’re acting like bcachefs systems are storing Critical Data That Absolutely Cannot Be Lost. And yet at the same time it’s experimental. I’m just one user, but I can tell you that, even as excited as I am about bcachefs, I’m not touching it with a ten foot pole for anything beyond playing around until at least the experimental label is removed.

I imagine my position is not uncommon.

Please stop trying to die on this hill. Your project is really great and really important. I want it to succeed.

Just chill and let bug fixes be bug fixes and features be features.

replies(2): >>44472283 #>>44473217 #

11. koverstreet ◴[05 Jul 25 12:19 UTC] No.44472283{3}[source]▶

>>44470493 #

I've recovered a _lot_ of data for users that didn't have backups.

It's all part of the job.

replies(2): >>44474097 #>>44475331 #

12. bombcar ◴[05 Jul 25 12:34 UTC] No.44472364{3}[source]▶

>>44468100 #

I have seen modules and code scream at me that code needed something else - so a PR for the literal bugfix could include a message that says “RECOVERABLE SITUATION DETECTED - visit bcachefs.org/owmp for details”

Then you have details on how to obtain recovery tools. You’d only need it for one patch revision.

13. bombcar ◴[05 Jul 25 12:35 UTC] No.44472374[source]▶

>>44468195 #

The biggest dirty secret of the IT world is that everyone knows you should have more backups than God, but everyone runs with an average of about zero.

14. gdevenyi ◴[05 Jul 25 14:59 UTC] No.44473217{3}[source]▶

>>44470493 #

> You’re acting like bcachefs systems are storing Critical Data That Absolutely Cannot Be Lost.

It is to the user storing it.

replies(1): >>44474121 #

15. rewgs ◴[05 Jul 25 17:13 UTC] No.44474097{4}[source]▶

>>44472283 #

Frankly, if you store important data on an experimental file system and don’t have backups, you deserve to lose it.

And I have to imagine that those that are technical enough to not only use Linux, but use an experimental non-default file system, and those that don’t have backups of their data, is a vanishingly small group.

So no, I actually disagree —- it’s not part of the job.

So again we arrive at the same place: this data recovery tool is not worth the drama.

It’s a feature, not a bug fix, and an incredibly unimportant one at that _at this stage of development_. If bcachefs weren’t experimental and were widely used, it would be a different story —- I’d probably be in favor of bending the rules to get it in there faster. But that just isn’t where we are right now.

16. rewgs ◴[05 Jul 25 17:18 UTC] No.44474121{4}[source]▶

>>44473217 #

As I said in my reply to Kent: frankly, if you store important data on an experimental file system and don’t have backups, you deserve to lose it.

17. WesolyKubeczek ◴[05 Jul 25 19:56 UTC] No.44475096{4}[source]▶

>>44468619 #

> In this case, we had a bug pop up in 6.15 where the link count in the VFS inode getting screwed up caused an inode to be deleted that shouldn't have been - a subvolume root - and then an untested repair path took out the entire subvolume.

I would rather make sure this path was never hit for rc, to minimize the damage. The fact alone that it didn’t pop up until late in the 6.15 cycle could hint at some specific circumstances the bug manifested on, and those could be described and avoided.

And I think there could be a mediocre way to get by until the next merge window in which a superior solution could be presented.

I don’t want to sound as if I’m an expert in how to do VFS, because I’m not. I’m, however, an expert in how to be “correcter than others” which has cost me getting kicked out of jobs before. I hope I have learned better since, and at the time I have been very, very stubborn (they wouldn’t have kicked me out otherwise).

There is this bit when working with others that you will likely go with solutions you deem more mediocre than the theoretically best solution, or that experienced people will say “no” to your ideas or solutions and you accept it instead of seeking quarrel in spite of them obviously not understanding you (spoiler: this is not so). But if you show that you’re willing to work with other as a single unit, you will be listened to, appreciated, and concessions will be made for you too. This is not necessarily about the kernel, it’s about working in a team in general.

I don’t have a dog in this fight, but I’ve been “that guy” before and I regret it took me that long to realize this and mend my ways. Hope it doesn’t keep happening to you.

18. magicalhippo ◴[05 Jul 25 20:33 UTC] No.44475331{4}[source]▶

>>44472283 #

If you have a lot of users who store data on an experimental filesystems and who don't back up said data, yet are not cool with data losses, I would say you have a serious communication issue at hand.

I have lost years of my work due to not having proper backups. I know the pain.

And I totally get you feel responsible for the data loss and want to help your users, I'm like that too with my code.

But this is an experiment. It's right there in the name.

If this feature really is as needed as you claim here, then getting it into the kernel is a mere side-issue.

In that case, your #1 priority should be to fix whatever is causing such users to install and use bcachefs without having a recovery plan that they have verified, and get existing users on the same level.

Because not doing so would be a huge disservice to those users who don't know better, and at worst borderline exploitative.

Writing recovery software is part of the job. Forcing it into the kernel to save users who are clearly not in any shape or form competent enough to partake in your experiment is definitely not part of the job.

Finding yourself in this position means something has gone very, very wrong.

replies(1): >>44476498 #

19. Dylan16807 ◴[05 Jul 25 23:53 UTC] No.44476498{5}[source]▶

>>44475331 #

> Forcing it into the kernel to save users who are clearly not in any shape or form competent enough to partake in your experiment is definitely not part of the job.

That code needs to be there for non-experimental users later on.

The reason to push it in quickly is so it can get tested and iterated on. Saving a handful of experimental users is not the main benefit.

replies(2): >>44476678 #>>44477451 #

20. magicalhippo ◴[06 Jul 25 00:24 UTC] No.44476678{6}[source]▶

>>44476498 #

> Saving a handful of experimental users is not the main benefit.

Kent himself argued otherwise here[1].

And even if that were the case, there's no need to take a stand on trying to get it into the kernel. If he gets booted out of the kernel tree, then the end result is the same for his users: they have to compile their own kernel. So it makes no sense to push this so hard.

[1]: https://www.phoronix.com/forums/forum/software/general-linux...

replies(1): >>44476725 #

21. Dylan16807 ◴[06 Jul 25 00:33 UTC] No.44476725{7}[source]▶

>>44476678 #

> Kent himself argued otherwise here[1].

I said "main" for a reason. The current users that need to recover data are a part of the picture, but they're a few trees out of the forest.

"Please tell that to the users who lost data." is not arguing against what I said.

> And even if that were the case, there's no need to take a stand on trying to get it into the kernel. If he gets booted out of the kernel tree, then the end result is the same for his users: they have to compile their own kernel. So it makes no sense to push this so hard.

He's not giving an ultimatum here. The goal is to figure out something that works for everyone.

22. rewgs ◴[06 Jul 25 02:54 UTC] No.44477451{6}[source]▶

>>44476498 #

> The reason to push it in quickly is so it can get tested and iterated on. Saving a handful of experimental users is not the main benefit.

Testing and iterating on code does not require making exceptions to the kernel development schedule.

replies(1): >>44477474 #

23. Dylan16807 ◴[06 Jul 25 03:00 UTC] No.44477474{7}[source]▶

>>44477451 #

He makes a pretty good argument that the filesystem's never going to get done if there's only one iteration per kernel release.

I don't know what the best solution is, but it looks like it requires either exceptions or something else that gets around the schedule.