I keep following io_uring and trying it, and I still can't get it to run as fast, with code as simple, or as consistently for those use cases. AF_XDP gets me partway there, but then you're writing eBPF... might as well go full DPDK.
Maybe it's a skill issue on my part, though. Or just a well-fitting niche.
I also want to get into socket I/O using io_uring in Zig. I'll try to apply everything I found in the liburing wiki [0] and see how much I can get (the fastest hardware I have is 10 Gbit/s).
Seems like there are:
- multi-shot requests
- register_napi on the uring instance
- zero-copy receive/send (probably won't be able to get into it)
Did you already try these or are there other configurations I can add to improve it?
[0]: https://github.com/axboe/liburing/wiki/io_uring-and-networki...
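To make the multi-shot item in the list above a bit more concrete, here's a rough C sketch of the completion side only. With a multishot recv you submit one SQE and get many CQEs back; each CQE's flags tell you which provided buffer was filled and whether the request is still armed. The flag values mirror the kernel's `<linux/io_uring.h>` (`IORING_CQE_F_BUFFER`, `IORING_CQE_F_MORE`, `IORING_CQE_BUFFER_SHIFT`), redefined under local names so it builds without liburing. `struct recv_event` and `decode_multishot_cqe` are hypothetical names of mine, not liburing API, and submission, buffer-ring setup, and NAPI registration are left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the kernel UAPI header <linux/io_uring.h>; redefined under
 * local names (assumption: you would use the real macros in actual code). */
#define CQE_F_BUFFER     (1U << 0) /* IORING_CQE_F_BUFFER: upper 16 bits carry a buffer ID */
#define CQE_F_MORE       (1U << 1) /* IORING_CQE_F_MORE: more CQEs will follow for this SQE */
#define CQE_BUFFER_SHIFT 16        /* IORING_CQE_BUFFER_SHIFT */

/* Hypothetical result type for illustration; not part of any library. */
struct recv_event {
    int      bytes;      /* >0: payload length, 0: peer closed, <0: -errno */
    bool     has_buffer; /* a provided buffer was consumed by this completion */
    uint16_t buffer_id;  /* meaningful only when has_buffer is true */
    bool     terminated; /* multishot request ended; caller may need to re-arm */
};

/* Decode one completion of a multishot recv.
 * `res` and `flags` stand in for cqe->res and cqe->flags. */
struct recv_event decode_multishot_cqe(int32_t res, uint32_t flags)
{
    struct recv_event ev;

    ev.bytes = res;
    ev.has_buffer = (flags & CQE_F_BUFFER) != 0;
    ev.buffer_id = ev.has_buffer ? (uint16_t)(flags >> CQE_BUFFER_SHIFT) : 0;
    /* Without F_MORE the kernel will post no further CQEs for this SQE. */
    ev.terminated = (flags & CQE_F_MORE) == 0;
    return ev;
}
```

In a real loop you'd hand the buffer back to the buffer ring once you're done with it, and submit a fresh multishot recv when `terminated` is set and the connection is still worth reading from (for example after the buffer pool ran dry).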
One other big plus of DPDK for me is the low-level access to hardware offload. That means GPUDirect (when you can get it to work), StorageDirect, or most of the available DMA engines in some (not so) high-end hardware. The flow API on Mellanox hardware is the basis of many of my multi-accelerator applications (I wish they supported P4 for packet format instead, or just open-sourced whatever low-level ISA the controller is running, but I don't buy enough gear to have a voice). Perusing the DPDK documentation can give you ideas.
So, yes, very low-level with some batteries included. Good and stable for niche uses. But a far smaller hiring pool (is the io_uring-100Gb pool bigger? I don't know).