
lsr: ls with io_uring

(rockorager.dev)
335 points mpweiher | 8 comments
ninkendo ◴[] No.44604369[source]
I wonder how it performs against an NFS server with lots of files, especially one over a kinda-crappy connection. Putting an unreliable network service behind blocking POSIX syscalls is one of the main reasons NFS is a terrible design choice (as can be seen by anyone who's tried to ctrl+c any app that's reading from a broken NFS folder), but I wonder if io_uring mitigates the bad parts somewhat.
replies(4): >>44604885 #>>44605310 #>>44605896 #>>44610978 #
mprovost ◴[] No.44605896[source]
The designers of NFS chose to make a distributed system emulate a highly consistent and available system (a hard drive), which was (and is) a reasonable tradeoff. It didn't require every existing tool, such as ls, to deal with things like the server rebooting while listing a directory. (The original NFS protocol is stateless, so clients can survive server reboots.) What does vi do when the server hosting the file you're editing stops responding? None of these tools have that kind of error handling.

I don't know how io_uring solves this - does it return an error if the underlying NFS call times out? How long do you wait for a response before giving up and returning an error?

replies(2): >>44606764 #>>44609333 #
1. ninkendo ◴[] No.44606764[source]
> The designers of NFS chose to make a distributed system emulate a highly consistent and available system (a hard drive), which was (and is) a reasonable tradeoff

I don't agree that it was a reasonable tradeoff. Making an unreliable system emulate a reliable one is the very thing I find to be a bad idea. I don't think this is unique to NFS, it applies to any network filesystem you try to present as if it's a local one.

> What does vi do when the server hosting the file you're editing stops responding? None of these tools have that kind of error handling.

That's exactly why I don't think it's a good idea to just pretend a network connection is actually a local disk. Because tools aren't set up to handle issues with it being down.

Contrast it with approaches where the client is aware of the network connection (like HTTP/gRPC/etc.)... the client can decide for itself how long it should retry failed requests, whether it should bubble up failures to the caller, or work "offline" until it gets an opportunity to resync, etc. With NFS the syscall just hangs forever by default.
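
A minimal sketch of what "the client decides for itself" can look like, in Python. The helper name `call_with_retries` and its parameters are hypothetical, chosen for illustration only; `op` stands in for any network-aware call that raises `OSError` (which covers `TimeoutError` and `ConnectionError`) when the remote end misbehaves.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    op: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(), retrying on OSError with exponential backoff.

    Hypothetical helper: the caller, not the kernel, picks how many
    attempts to make and how long to wait between them.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except OSError:
            # Out of attempts: bubble the failure up to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The point is only that the policy (retry count, delay, when to give up) lives in the caller, which is the control a blocking syscall on a hung mount does not offer.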

Distributed systems are hard, and NFS (and other similar network filesystems) just pretends they aren't hard at all, which is great until something goes wrong, and then the abstraction leaks.

(Also I didn't say io_uring solves this, but I'm curious as to whether its performance would be any better than blocking calls.)

replies(4): >>44607948 #>>44608276 #>>44608718 #>>44615332 #
2. JonChesterfield ◴[] No.44607948[source]
> Making an unreliable system emulate a reliable one is the very thing I find to be a bad idea.

It's the only idea though. We don't know how to make reliable systems, other than by cobbling together a lot of unreliable ones and hoping the emergent behaviour is more reliable than that of the parts.

replies(3): >>44608130 #>>44608869 #>>44611250 #
3. mrlongroots ◴[] No.44608130[source]
I think "making an unreliable system emulate a reliable one = bad" is too simplistic a heuristic.

We do this all the time with things like ECC and retransmissions and packet recovery. This is not intrinsically bad at all; the question is what abstraction it exposes to the higher layer.

With TCP the abstraction we expect is "pretty robust but has tail latencies, do not use for automotive networks or avionics" and that works out well. The right question IMO is always "what kind of tail behaviors does this expose, and are the consumers of the abstraction prepared for them".

4. pvtmert ◴[] No.44608276[source]
I think it highly depends on your architecture and the scale you are pushing.

The other extreme is S3, where appending has only become possible within the last few years, as far as I can tell. Meanwhile, editing a file requires a full download/upload, which is not great either.

For the NFS case, I can't say it's my favorite, but it is certainly easy to set up and run on your own. Obviously a rebooting server may cause certain issues while it is unavailable, but the NFS server should be highly available. With NFSv4.1, you may use UDP as the primary transport, which allows you to swap/switch servers pretty quickly (given that you connect to a DNS name/FQDN rather than an IP address).

Another case is plug and play: with NFS, UNIX permissions, ownership/group details, the execute bit, etc. are all preserved nicely...

Besides, you could always have a "cache" server locally, similar to the GDrive or OneDrive clients: constantly syncing back and forth, caching the data locally, using file handles to determine locks. Works pretty well _at scale_ (i.e., many concurrent users in the case of GDrive or OneDrive).

5. cwillu ◴[] No.44608718[source]
Do you have similar thoughts about iscsi?
6. ninkendo ◴[] No.44608869[source]
I think a difference in magnitude turns into a difference in kind. There are lots of systems where the unreliability of the underlying parts is low enough that it can be a simple matter of retrying quickly once or twice (bit flips in ECC RAM), and others where at least the unreliability is well-known enough that software has all learned to work around the leaky abstraction (like TCP, although QUIC and other protocols show that maybe it's better to move the unreliability up a layer for more intelligent handling of the edge cases).

But the unreliability of "the network" compared to "my SATA port" is a whole different ballgame. Filesystems are designed for the latter, and when software uses filesystems it generally expects a reliability guarantee that "the network" can't really provide. Especially on mobile internet, wifi, etc... And that's not even getting into places where NFS just can't do things that local filesystems can do (has anyone figured out how to make inotify/fsevents work?) and all the software that subtly breaks because of it.

7. tbrownaw ◴[] No.44611250[source]
> other than by cobbling together a lot of unreliable ones and hoping the emergent behaviour

No, there's math to calculate that without having to rely on hope.
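
As a rough illustration of the kind of calculation meant here (a textbook sketch, assuming component failures are independent, which real systems only approximate): if each of $n$ redundant replicas is unavailable with probability $p$, and the system works as long as at least one replica is up, then

```latex
\[
  P(\text{system down}) = p^{n},
  \qquad
  P(\text{system up}) = 1 - p^{n}.
\]
% Worked example: p = 0.01 (99% available parts), n = 3 replicas
\[
  P(\text{system down}) = (0.01)^{3} = 10^{-6}.
\]
% By contrast, a chain that needs all n parts up at once:
\[
  P(\text{chain up}) = (1 - p)^{n} = (0.99)^{3} \approx 0.9703.
\]
```

Redundancy multiplies failure probabilities together, while dependencies in series multiply availabilities together; correlated failures (shared power, shared network) make the real numbers worse than the independent case.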

8. mprovost ◴[] No.44615332[source]
Sure, at some point you have to let the abstraction leak. At the time Sun designed NFS, you could basically count on the fact that the server was some Solaris machine capable of multiple years of uptime on your LAN. Filesystems never made the transition to running over the internet because that was too unreliable and POSIX didn't really provide the right interfaces to expose that.

We're coming to the end of the road with that generation of OS design - macOS is still Unix and thinks that it's running on a VAX. There's a reason why MacBooks don't come with a 5G modem: because programs would have to be aware of the underlying network. That's why it's inevitable that we'll move to something like iOS (or Android), because every program that uses the network has to handle not only failures but situations like being in flight mode or running on low-bandwidth mobile networks.