35x less system calls = others wait less for the kernel to handle their system calls
That isn't how it works. There isn't a fixed syscall budget distributed among running programs. Internally, the kernel is taking many of the same locks and resources to satisfy io_uring requests as ordinary syscall requests.
Also, more fs-related system calls mean less available kernel threads to process these system calls. eg. XFS can paralellize mutations only up to its number of allocation groups (agcount)
Again, this just isn't true. The same "stat" operations are being performed one way or another.
> Also, more fs-related system calls mean less available kernel threads to process these system calls.
Generally speaking sync system calls are processed in the context of the calling (user) thread. They don't consume kernel threads generally. In fact the opposite is true here -- io_uring requests are serviced by an internal kernel thread pool, so to the extent this matters, io_uring requests consume more kernel threads.