I think the performance gap between FUSE and in-kernel filesystems is shrinking with time - FUSE with io_uring is a big step forward, though the immaturity of io_uring is an obstacle to its adoption, at least in the short-to-medium term. I’m sure we’ll see further improvements in this area. But will we ever reach the Nirvana where FUSE matches the performance of in-kernel filesystems, or (maybe more realistically) where the overhead is so marginal nobody is bothered by it in practice? I’d like to think we eventually will, but it is far from certain.
Direct I/O would be slower via FUSE, but L4-style IPC could solve that.
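For anyone unfamiliar with why: with libfuse a filesystem can opt a file into this path by setting the direct_io flag in its open handler, after which every read and write bypasses the page cache and pays a full round trip to the userspace daemon. A minimal sketch (the myfs_open name is mine):

```c
#define FUSE_USE_VERSION 31
#include <fuse3/fuse.h>

/* Sketch of a libfuse 3 open handler that opts the file out of the
 * page cache. With direct_io set, the kernel forwards every read()
 * and write() to the userspace daemon instead of serving cached
 * pages, so each I/O pays the kernel<->userspace round trip. */
static int myfs_open(const char *path, struct fuse_file_info *fi)
{
    (void) path;
    fi->direct_io = 1;
    return 0;
}
```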
It would be an interesting proposition, although not my first choice for the direction I want to go in :)
By contrast, improvements to the performance of FUSE and L4-style IPC could be much more widely beneficial: both for developers of new physical filesystems (userspace implementations let them iterate faster, get better API/ABI stability, and reach end-users more easily), and for developers of numerous other pieces of software too.
Of course, you personally are going to scratch the itch you want to scratch. But in terms of what’s most beneficial for the Linux ecosystem as a whole, I think FUSE improvements and L4-style IPC would deliver the most benefit per unit of effort.
Sometimes a new filesystem needs changes elsewhere in the kernel and the VFS API isn't sufficient, but often it is.
There are far better options, such as FUSE or the filesystem APIs in other operating systems like NetBSD, Haiku, Genode or even ReactOS (and Windows NT).
Some of the best filesystems, such as OpenZFS, HAMMER2 or Lustre, are developed outside of Linux.
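As a sense of scale for the FUSE route: a complete (read-only, toy) userspace filesystem is just a handful of VFS-shaped callbacks against libfuse 3. A minimal sketch, not production code; the hello_* names are mine:

```c
#define FUSE_USE_VERSION 31
#include <fuse3/fuse.h>
#include <sys/stat.h>
#include <string.h>
#include <errno.h>

static const char *msg = "hello from userspace\n";

/* Stat the root directory and the single file we expose. */
static int hello_getattr(const char *path, struct stat *st,
                         struct fuse_file_info *fi)
{
    (void) fi;
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
    } else if (strcmp(path, "/hello") == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = strlen(msg);
    } else {
        return -ENOENT;
    }
    return 0;
}

/* List the root directory: just ".", ".." and "hello". */
static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                         off_t off, struct fuse_file_info *fi,
                         enum fuse_readdir_flags flags)
{
    (void) off; (void) fi; (void) flags;
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    fill(buf, ".", NULL, 0, 0);
    fill(buf, "..", NULL, 0, 0);
    fill(buf, "hello", NULL, 0, 0);
    return 0;
}

/* Serve reads of /hello out of the static buffer. */
static int hello_read(const char *path, char *buf, size_t size, off_t off,
                      struct fuse_file_info *fi)
{
    (void) fi;
    if (strcmp(path, "/hello") != 0)
        return -ENOENT;
    size_t len = strlen(msg);
    if ((size_t) off >= len)
        return 0;
    if (off + size > len)
        size = len - off;
    memcpy(buf, msg + off, size);
    return (int) size;
}

static const struct fuse_operations hello_ops = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &hello_ops, NULL);
}
```

Build with `gcc hello.c $(pkg-config fuse3 --cflags --libs)` and mount with `./a.out /some/mountpoint`; crash it all you like and the kernel stays up, which is exactly the iteration-speed argument being made above.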
We've had someone show up claiming "I'm going to do FUSE right!" and it never happened, so the incremental approach is probably best here. But it's going to take a while.
So no. Microkernels have been tried. Microkernel-workalike filesystems are here with FUSE. They suck because microkernels suck when you need performance. Research has gone in different directions, like microkernels as hypervisors and for security, because it has become clear that the performance problems of microkernels are inherent and unfixable.
I don't get the impression that CPU designers have been putting a particularly large focus on making context switches fast. They try, but they're busy doing everything else too. If context switches were the constant bottleneck, I think silicon would make them go a lot faster.
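For a ballpark of the cost being argued about, the classic pipe ping-pong microbenchmark: two processes bounce one byte back and forth, so each round trip forces at least two context switches. This is a rough sketch, and the numbers vary wildly by CPU and kernel:

```c
/* Ping-pong microbenchmark: parent and child echo one byte through a
 * pair of pipes. Pin both processes to a single core (e.g. run under
 * taskset -c 0) or the scheduler may run them on separate cores and
 * hide the context switch you are trying to measure. */
#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>

#define ITERS 100000

int main(void)
{
    int ping[2], pong[2];
    char b = 0;
    if (pipe(ping) != 0 || pipe(pong) != 0) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }
    if (pid == 0) {                      /* child: echo each byte back */
        for (int i = 0; i < ITERS; i++) {
            if (read(ping[0], &b, 1) != 1) _exit(1);
            if (write(pong[1], &b, 1) != 1) _exit(1);
        }
        _exit(0);
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {    /* parent: send, wait for echo */
        if (write(ping[1], &b, 1) != 1 || read(pong[0], &b, 1) != 1)
            return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    wait(NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per round trip (>= 2 context switches)\n", ns / ITERS);
    return 0;
}
```

Note this measures the whole syscall-plus-scheduler path, not just the hardware register save/restore, which is part of why "make the silicon faster" only gets you so far.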