144 points ksec | 9 comments
1. alphazard ◴[] No.44468816[source]
This whole debacle is the perfect advertisement for microkernels. The only reason Kent needs to coordinate with Linus is because filesystems need to live in the kernel. FUSE is second-class. Imagine how much easier this all would be if Linux maintained a slowly evolving filesystem API, and all bcachefs had to do was keep up with it.
replies(4): >>44469515 #>>44470749 #>>44471267 #>>44475551 #
2. skissane ◴[] No.44469515[source]
I don’t think FUSE is deliberately a “second class citizen”; it is simply that doing a filesystem in user space has a performance cost compared to doing it in the kernel, and that is a very tricky problem to solve. Even microkernels have this problem; you just don’t notice it as readily because a pure microkernel doesn’t offer in-kernel filesystems as a comparator. But if you take a microkernel and transform it into a hybrid kernel by moving filesystems (and block device drivers) into kernel space, like NeXT/Apple did in turning Mach into XNU, you are almost certainly going to see tangible performance gains. Maybe this is less true with more modern microkernel designs such as L4, but even there I suspect it still holds, if not to quite the same extent.
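The cost in question is mostly the extra user/kernel boundary crossings a user-space filesystem adds: where an in-kernel filesystem pays one crossing per request, a FUSE request pays several (app → kernel → FUSE daemon → kernel → app). A rough sketch of the per-crossing cost, not a rigorous benchmark:

```python
# Rough sketch: time a single user->kernel->user round trip by issuing a
# cheap syscall in a loop. A FUSE request pays this crossing several times
# per operation; an in-kernel filesystem pays it once.
import os
import time

N = 100_000
fd = os.open("/dev/zero", os.O_RDONLY)

t0 = time.perf_counter()
for _ in range(N):
    os.pread(fd, 1, 0)  # one syscall, i.e. one boundary crossing, per iteration
t1 = time.perf_counter()
os.close(fd)

per_call_ns = (t1 - t0) / N * 1e9
print(f"~{per_call_ns:.0f} ns per syscall round trip")
```

The absolute number is machine-dependent; the point is that it gets multiplied by the number of crossings per filesystem request.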

I think the performance cost of FUSE compared to in-kernel filesystems is improving with time - FUSE with io_uring is a big step forward, but the immaturity of io_uring is an obstacle to its adoption, at least in the short-to-medium-term. I’m sure in the future we’ll see even further improvements in this area. But will we ever reach the Nirvana where FUSE equals the performance of in-kernel filesystems, or (maybe more realistically) the performance overhead has become so marginal nobody is bothered by it in practice? I’d like to think we eventually will, but it is far from certain.

replies(1): >>44470034 #
3. koverstreet ◴[] No.44470034[source]
There's no inherent reason why FUSE has to be noticeably slower for buffered IO; it just hasn't gotten nearly enough well-thought-out attention. But that's starting to change, there's a lot more interest these days in a faster FUSE.

Direct IO would be slower via FUSE, but L4 style IPC could solve that.

It would be an interesting proposition, although not my first choice for the direction I want to go in :)

replies(1): >>44470190 #
4. skissane ◴[] No.44470190{3}[source]
I think the issue with any new physical filesystem is that even if it becomes mature, fully upstream as part of the mainline Linux kernel, and supported out of the box by all the major distributions, a lot of people are still never going to use it: there is so much competition in that space (ext4, XFS, btrfs, etc), people are understandably quite conservative (fear of data loss due to bugs), and there is the fear that a less popular filesystem may end up abandoned if something unexpected happens to its primary developer (see e.g. ReiserFS)

By contrast, improvements in the performance of FUSE and L4-style IPC could be much more widely beneficial - both for developers of new physical filesystems (by making possible user-space implementations where they can iterate faster, get better API/ABI stability, and see easier adoption by end users) and for developers of numerous other pieces of software too

Of course, you personally are going to scratch the itch you want to scratch. But in terms of what’s most beneficial for the Linux ecosystem as a whole, I think FUSE improvements and L4-style IPC would deliver the most benefit per unit of effort

replies(1): >>44472245 #
5. toast0 ◴[] No.44470749[source]
Kernel modules exist. The Linux VFS is a slowly evolving filesystem API. Most Linux distributions boot with an initramfs, so it's not hard to use a stable filesystem for the bootloader to read the kernel and an initramfs that includes the driver for the experimental filesystem.

Sometimes a new filesystem needs changes to things in the kernel and the VFS API isn't enough, but often VFS is enough.

6. snvzz ◴[] No.44471267[source]
It is indeed a mistake to target Linux, as it guarantees the majority of effort will be spent tracking Linux, rather than working on the filesystem itself.

There are far better options, such as FUSE or the filesystem APIs in other operating systems like NetBSD, Haiku, Genode or even ReactOS (and Windows NT).

Some of the best filesystems such as OpenZFS, HAMMER2 or Lustre are developed outside of Linux.

7. koverstreet ◴[] No.44472245{4}[source]
I agree about the benefit they'd offer, but the thing is - I already have a todo list that extends out to 2030, and FUSE is going to take a lot of work before it gets there: probably years, because it's going to be done incrementally on top of a big hodgepodge instead of being done right by someone willing to invest the time to get it right.

We've had someone show up claiming "I'm going to do FUSE right!" and it never happened, so the incremental approach is probably best here. But it's going to take a while.

8. holowoodman ◴[] No.44475551[source]
FUSE is what a microkernel filesystem would look like. There are some optimisations that FUSE doesn't do, that microkernels usually have. In the most extreme form, L4 trims down communication primitives to the most efficient platform-specific ways of exchanging memory buffers. In all cases, microkernels and FUSE still need context switches for everything, and those are expensive. If you leave out the context switches, you don't have a microkernel anymore. This is what Windows did by pulling graphics drivers into the kernel, because context switches are slow.
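That per-request context-switch cost is easy to observe directly. A minimal sketch, assuming a Unix-like system (numbers are machine-dependent): two processes ping-pong a byte over pipes, paying at least two context switches per round trip, which is roughly the extra toll every FUSE or microkernel filesystem request pays on top of the actual work.

```python
# Minimal sketch: measure a two-process ping-pong over pipes. Each round trip
# costs at least two context switches - roughly the overhead a single
# FUSE/microkernel filesystem request adds. (Unix-only: uses os.fork.)
import os
import time

ROUNDS = 5_000
p2c_r, p2c_w = os.pipe()   # parent -> child
c2p_r, c2p_w = os.pipe()   # child -> parent

pid = os.fork()
if pid == 0:
    # Child: echo every byte straight back.
    for _ in range(ROUNDS):
        os.read(p2c_r, 1)
        os.write(c2p_w, b"x")
    os._exit(0)

t0 = time.perf_counter()
for _ in range(ROUNDS):
    os.write(p2c_w, b"x")  # wake the child...
    os.read(c2p_r, 1)      # ...and switch back when it answers
t1 = time.perf_counter()
os.waitpid(pid, 0)

round_trip_us = (t1 - t0) / ROUNDS * 1e6
print(f"~{round_trip_us:.1f} us per round trip (>= 2 context switches)")
```

Compare the per-round-trip number against the per-syscall cost of an ordinary in-kernel path and the overhead argument makes itself.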

So no. Microkernels have been tried. Microkernel-workalike filesystems are here with FUSE. They suck because microkernels suck when you need performance. Research has gone in different directions, like microkernels as hypervisors and for security, because it has become clear that the performance problems of microkernels are inherent and unfixable.

replies(1): >>44476466 #
9. Dylan16807 ◴[] No.44476466[source]
> because it has become clear that the performance problems of microkernels are inherent and unfixable.

I don't get the impression that CPU designers have been putting a particularly large focus on making context switches fast. They try, but they're busy doing everything else too. If fast context switches were a constant design priority, I think silicon would make them go a lot faster.