
306 points carlos-menezes | 1 comments
jrpelkonen ◴[] No.41891238[source]
Curl creator/maintainer Daniel Stenberg blogged about HTTP/3 in curl a few months ago: https://daniel.haxx.se/blog/2024/06/10/http-3-in-curl-mid-20...

One of the things he highlighted was the higher CPU utilization of HTTP/3, to the point where CPU can limit throughput.

I wonder how much of this is due to the immaturity of the implementations, and how much is inherent in the way QUIC was designed?

1. dan-robertson ◴[] No.41891813[source]
Two of the recommendations are about improving receiver-side implementations – optimising them and making them multithreaded – which suggests some immaturity in the implementations. A third recommendation is UDP GRO, which means modifying kernels and ideally NIC hardware to group received UDP packets together so that most of the work is done once per group rather than once per packet. This already exists for TCP, and there are similar things on the send side (e.g. TSO, and GSO in Linux). It feels a bit like immaturity too, but may be harder to remedy given the potential lack of hardware capabilities. The abstract talks about the cost of how acks work in QUIC, but I didn’t look into that claim.
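To make the per-group idea concrete, here is a rough Python sketch of what a receiver sees with UDP GRO on Linux: one receive call can return several same-sized datagrams glued into a single buffer, and the application splits them apart again using the segment size the kernel reports. The helper names `try_enable_udp_gro` and `split_coalesced` are made up for illustration, and the constant `104` is the value of `UDP_GRO` in Linux's `<linux/udp.h>` as far as I know (Python's `socket` module may not export it). Reading the segment size from the control message is left out, so treat this as a sketch under those assumptions rather than how any QUIC stack actually does it.

```python
import socket

# Assumed value of UDP_GRO from Linux's <linux/udp.h>; not a Python constant.
UDP_GRO = 104


def try_enable_udp_gro(sock):
    """Ask the kernel to coalesce received datagrams.

    Returns False if the option is rejected (non-Linux or older kernel).
    """
    try:
        sock.setsockopt(socket.IPPROTO_UDP, UDP_GRO, 1)
        return True
    except OSError:
        return False


def split_coalesced(buf, segment_size):
    """Split one coalesced receive buffer back into individual datagrams.

    Every datagram is segment_size bytes long except possibly the last,
    which may be shorter.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [buf[i:i + segment_size] for i in range(0, len(buf), segment_size)]
```

The saving comes from the fact that the syscall and the trip through the network stack happen once per coalesced buffer, while only the cheap slicing above is still done per packet.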

Another feature you see in modern TCP-based servers is offloading TLS to the hardware. I think this matters more for servers that may have many concurrent TCP streams to send. On Linux you can get this either with userspace networking or with ‘kernel TLS’, which will offload to hardware if possible. There is also some funny stuff in Linux for breaking a TCP stream down into ‘messages’ which can be sent to different threads, and kernel TLS works with that too, though I don’t know if it allows eagerly passing on some later messages when earlier packets were lost.