Slightly off-topic, but do you know how many microseconds of socket latency overhead io_uring can save on enterprise-grade servers with 10G NICs, versus LD_PRELOAD when the hardware supports that? Let's say it's Mellanox 4 or 5. My understanding is that each saves around 10 us, maybe less. That is based on some benchmarking that did not focus on either of them explicitly, just some imprecise experiments. It also looks like the two savings do not add up. Do you have a number based on real experience?