
67 points anon6362 | 1 comments
alexdns ◴[] No.45074520[source]
It was considered innovative when it was first shared here eight years ago.
replies(1): >>45074700 #
nurumaik ◴[] No.45074700[source]
Anything more innovative happened since (honestly curious)?
replies(4): >>45075146 #>>45075479 #>>45075495 #>>45077234 #
ozgrakkurt ◴[] No.45075495[source]
You can apparently do 100 Gbit/s on a single thread over Ethernet with io_uring.
replies(2): >>45075999 #>>45076203 #
touisteur ◴[] No.45075999{3}[source]
Recently did 400 Gb/s on a single core / 4x100G NICs (or just the one 400G NIC, too) with DPDK. Mind you, it's with jumbo frames and constant packet size for hundreds of mostly synchronized streams... You won't process each packet individually, mostly put them in queues for later batch processing by other cores. Amazing for data acquisition applications using UDP streams.

I keep watching and trying io_uring and still can't make it run as fast, or as consistently, with simple code for those use cases. AF_XDP gets me partly there but then you're writing eBPF... might as well go full DPDK.

Maybe it's a skill issue on my part, though. Or just a well-fitting niche.
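
To put the jumbo-frame caveat in numbers, here is a rough back-of-the-envelope sketch. The helper name `packets_per_second` and the 9018-byte frame size are assumptions for illustration (the comment above doesn't give the actual packet size); the 20 bytes of per-frame wire overhead are the standard Ethernet preamble, start-of-frame delimiter and inter-frame gap.

```c
#include <stdio.h>

/* Hypothetical helper: frames per second a link can carry at line rate.
 * frame_bytes counts the Ethernet frame including header and FCS
 * (64 for a minimum-size frame). */
static double packets_per_second(double line_rate_bps, unsigned frame_bytes)
{
    /* Per-frame wire overhead: 7 B preamble + 1 B SFD + 12 B inter-frame gap. */
    const unsigned wire_overhead = 20;
    return line_rate_bps / (8.0 * (double)(frame_bytes + wire_overhead));
}

/* Hypothetical helper: time available per packet, in nanoseconds. */
static double ns_per_packet(double line_rate_bps, unsigned frame_bytes)
{
    return 1e9 / packets_per_second(line_rate_bps, frame_bytes);
}

/* Prints the rate and per-packet budget for one link speed and frame size. */
static void print_budget(double line_rate_bps, unsigned frame_bytes)
{
    printf("%u-byte frames at %.0f Gb/s: %.0f pps, %.1f ns per packet\n",
           frame_bytes, line_rate_bps / 1e9,
           packets_per_second(line_rate_bps, frame_bytes),
           ns_per_packet(line_rate_bps, frame_bytes));
}
```

Calling `print_budget(400e9, 9018)` versus `print_budget(400e9, 64)` shows the gap: roughly 5.5 million packets per second with jumbo frames of that assumed size, against several hundred million with minimum-size frames. That is why batching large, fixed-size packets into queues is plausible on one core while per-packet processing is not.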

replies(2): >>45076075 #>>45077079 #
ozgrakkurt ◴[] No.45077079{4}[source]
Sounds super cool, but DPDK sounds like it won't be worth the difficulty from what I've read so far.

I also want to get into socket IO using io_uring in Zig. I'll try to apply everything I found in the liburing wiki [0] and see how much I can get (max hardware I have is 10 Gbit/s).

Seems like there is:
- multi-shot requests
- register_napi on the uring instance
- zero-copy receive/send (probably won't be able to get into it)

Did you already try these or are there other configurations I can add to improve it?

[0]: https://github.com/axboe/liburing/wiki/io_uring-and-networki...

replies(2): >>45077749 #>>45078109 #
1. touisteur ◴[] No.45078109{5}[source]
I ... kind of agree with the difficulty. I don't get it - DPDK is at its core really not a complex API! Allocate a pool of buffers, and in an infinite loop, ask your NIC to fill these buffers. There. After that, yes, you have to decap every packet (Ethernet, then IP - don't forget reassembly - then whatever you have on top - UDP is absolutely no effort, TCP... not so). It's wholly manageable for anyone who knows a bit of light C++ (more C-like) and the lower layers (and can parse the sometimes very dry and cryptic docs for all the utility functions). Interaction with the actual consumer of the data can be done with DPDK-provided primitives or simple shared memory... it's really not hard for a mid-level systems programmer. But I still find myself unable to hire people who can work at that level of the stack, which is a bit baffling. I can't see how they'd be better with io_uring or AF_XDP and all their inherent complexity. Anything harder than a socket and epoll and you're a wizard now...
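
To give an idea of what the "decap" step amounts to for UDP, here is a minimal sketch in plain C. It deliberately uses no DPDK calls: `decap_udp`, `struct udp_view` and `rd16` are hypothetical names made up for this illustration, and the input is assumed to be one raw Ethernet frame sitting in a buffer. It assumes untagged Ethernet II carrying IPv4, and it simply rejects IP fragments instead of reassembling them, so it is not a complete receive path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view onto the UDP payload inside a received frame (no copy). */
struct udp_view {
    uint16_t src_port;
    uint16_t dst_port;
    const uint8_t *payload;
    size_t payload_len;
};

/* Read a big-endian (network order) 16-bit field. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 and fills *out on success, -1 if the frame is not an
 * unfragmented IPv4/UDP packet or is malformed. */
static int decap_udp(const uint8_t *frame, size_t len, struct udp_view *out)
{
    const size_t eth_len = 14;                    /* dst MAC, src MAC, EtherType */
    if (len < eth_len + 20) return -1;            /* need a minimal IPv4 header */
    if (rd16(frame + 12) != 0x0800) return -1;    /* IPv4 only; VLAN tags not handled */

    const uint8_t *ip = frame + eth_len;
    if ((ip[0] >> 4) != 4) return -1;             /* IP version */
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;      /* header length in bytes */
    if (ihl < 20) return -1;
    size_t ip_total = rd16(ip + 2);               /* header + payload */
    if (ip_total < ihl + 8 || eth_len + ip_total > len) return -1;
    if (rd16(ip + 6) & 0x3FFF) return -1;         /* MF set or offset != 0: fragment */
    if (ip[9] != 17) return -1;                   /* protocol 17 = UDP */

    const uint8_t *udp = ip + ihl;
    size_t udp_len = rd16(udp + 4);               /* UDP header + payload */
    if (udp_len < 8 || ihl + udp_len > ip_total) return -1;

    out->src_port = rd16(udp);
    out->dst_port = rd16(udp + 2);
    out->payload = udp + 8;
    out->payload_len = udp_len - 8;
    return 0;
}
```

In a poll-mode setup this kind of function would be called on each buffer the NIC has filled, and the resulting view handed to whichever queue feeds the consumer. Checksums are left out here on the assumption that they are either offloaded to hardware or verified elsewhere.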

One other big plus of DPDK for me is the low-level access to hardware offload: GPUDirect (when you can get it to work), StorageDirect, or most of the available DMA engines in some (not so) high-end hardware. The flow API on Mellanox hardware is the basis of many of my multi-accelerator applications (I wish they supported P4 for packet formats instead, or just open-sourced whatever low-level ISA the controller is running, but I don't buy enough gear to have a voice). Perusing the DPDK documentation can give you ideas.

So, yes, very low-level with some batteries included. Good and stable for niche uses. But a far smaller hiring pool (is the io_uring-100Gb pool bigger? I don't know).