cletus:
At Google, I worked on a pure JS Speedtest. At the time, Ookla was still Flash-based, so it wouldn't work on Chromebooks; that was a problem for installers trying to verify an installation. I learned a lot about how TCP (I realize QUIC is UDP) responds to various factors.

I look at this article and consider the result pretty much as expected. Why? Because it pushes the flow control out of the kernel (and possibly network adapters) into userspace. TCP has flow control and sequencing. QUIC makes you manage that yourself (sort of).

Now there can be good reasons to do that. TCP congestion control is famously out-of-date with modern connection speeds, leading to newer algorithms like BBR [1], but it comes at a cost.

But here's my biggest takeaway from all that and it's something so rarely accounted for in network testing, testing Web applications and so on: latency.

Anyone who lives in Asia or Australia should relate to this. 100ms RTT latency can be devastating. It can take something that is completely responsive and make it utterly unusable. It lowers the bandwidth a connection can support (because of the windows) and makes it less responsive to errors and congestion control efforts (both up and down).
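
As a rough back-of-the-envelope (a sketch in Python; the 64 KiB figure is just the classic unscaled TCP window, used for illustration), the window caps single-connection throughput at window/RTT no matter how fast the link is:

    def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
        """Throughput ceiling in Mbit/s for a given window and round-trip time."""
        return (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000

    print(max_throughput_mbps(64 * 1024, 10))   # ~52 Mbit/s at 10 ms RTT
    print(max_throughput_mbps(64 * 1024, 100))  # ~5.2 Mbit/s at 100 ms RTT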

I would strongly urge anyone testing a network or Web application to run tests where they randomly add 100ms to the latency [2].
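
On Linux, one way to do that is tc's netem qdisc. A minimal sketch (assumes root and iproute2; "eth0" is a placeholder interface name):

    import subprocess

    def add_latency(dev: str = "eth0", delay_ms: int = 100, jitter_ms: int = 20) -> None:
        # Append delay_ms +/- jitter_ms of delay to every packet leaving `dev`.
        subprocess.run(
            ["tc", "qdisc", "add", "dev", dev, "root", "netem",
             "delay", f"{delay_ms}ms", f"{jitter_ms}ms"],
            check=True,
        )

    def remove_latency(dev: str = "eth0") -> None:
        subprocess.run(["tc", "qdisc", "del", "dev", dev, "root"], check=True)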

My point in bringing this up is that the overhead of QUIC may not practically matter, because your effective bandwidth over a single TCP connection (or QUIC stream) may be MUCH lower than your actual raw bandwidth. Put another way, 45% extra data may still be a win, because managing your own congestion control might give you higher effective speed between the two parties.
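
To make that trade-off concrete with made-up but plausible numbers (purely illustrative, not measurements from the article):

    tcp_effective = 30.0             # assumed window-limited TCP rate at high RTT, Mbit/s
    quic_goodput  = 60.0 / 1.45      # assumed higher raw rate, minus the 45% overhead

    print(tcp_effective, round(quic_goodput, 1))  # 30.0 vs ~41.4 Mbit/s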

[1]: https://atoonk.medium.com/tcp-bbr-exploring-tcp-congestion-c...

[2]: https://bencane.com/simulating-network-latency-for-testing-i...

ec109685:
For reasonably long downloads (so it has a chance to calibrate), why don't congestion algorithms increase the number of in-flight packets to a number high enough that bandwidth is fully utilized, even over high-latency connections?

It seems like it should never be the case that two parallel downloads perform better than a single one to the same host.
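
The catch is how much data has to be in flight to fill the pipe: the bandwidth-delay product, which gets large quickly. A sketch:

    def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> int:
        # Bytes that must be unacknowledged at once to keep the link busy.
        return int(bandwidth_mbps * 1_000_000 / 8 * (rtt_ms / 1000))

    print(bdp_bytes(1000, 100))  # 1 Gbit/s at 100 ms RTT -> 12,500,000 bytes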

dan-robertson:
There are two places a packet can be ‘in-flight’. One is light travelling down cables (or the electrical equivalent), or in memory being processed by some hardware like a switch; the other is sitting in a buffer in some networking appliance because the downstream link is busy (e.g. sending packets that are further up the queue, at a slower rate than they arrive). If you just increase the sending rate, it is easy to get lots of in-flight packets in the second state, which increases latency (admittedly that doesn’t matter so much for long downloads) and the chance of packet loss from overly full buffers.

CUBIC tries to increase bandwidth until it hits packet loss, then cuts bandwidth (to drain buffers a bit), ramps back up, and hovers close to the rate that led to loss before trying to send at a higher rate and fill up buffers again. CUBIC is very sensitive to packet loss, which makes things particularly difficult on very high-bandwidth links with moderate latency, as you need very low rates of (non-congestion-related) loss to reach that bandwidth.
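
The growth curve it uses is a cubic function anchored at the window where loss last occurred (a sketch using RFC 8312's constants, window measured in MSS):

    C, BETA = 0.4, 0.7  # RFC 8312 constants

    def cubic_window(t: float, w_max: float) -> float:
        # t: seconds since the last window reduction; w_max: window at last loss.
        k = (w_max * (1 - BETA) / C) ** (1 / 3)  # time to climb back to w_max
        return C * (t - k) ** 3 + w_max

    for t in (0, 2, 4, 6, 8):
        print(t, round(cubic_window(t, w_max=100.0), 1))
    # Concave approach to w_max, plateau near it, then convex probing beyond it.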

BBR tries to do the thing you describe while also modelling buffers and trying to keep them empty. It goes through a cycle of sending at the estimated bandwidth, sending at a lower rate to see if buffers got full, and sending at a higher rate to see if that’s possible, and the second step can be somewhat harmful if you don’t need the advantages of BBR.
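
For reference, BBR v1's ProbeBW phase cycles its pacing rate around the bandwidth estimate roughly like this (a sketch; each phase lasts about one min-RTT):

    PROBE_BW_GAINS = [1.25, 0.75, 1, 1, 1, 1, 1, 1]

    def pacing_rate_mbps(btl_bw_estimate_mbps: float, phase: int) -> float:
        # Probe above the estimate, drain below it, then cruise at it.
        return PROBE_BW_GAINS[phase % len(PROBE_BW_GAINS)] * btl_bw_estimate_mbps

    print([pacing_rate_mbps(100.0, p) for p in range(8)])

The 0.75 drain step is the "sending at a lower rate" phase described above.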

I think the main thing that tends to prevent the thing you talk about is flow control rather than congestion control. In particular, the sender needs a sufficiently large send buffer to store all unacked data (which can be a lot due to various kinds of ack-delaying) in case it needs to resend packets, and if you need to resend some then your send buffer would need to be twice as large to keep going. On the receive side, you need buffers big enough to keep accepting data from the network while waiting for an earlier packet to be retransmitted.

On a high-latency fast connection, those buffers need to be big to get full bandwidth, and that requires (a) growing a lot, which can take a lot of round-trips, and (b) being allowed by the operating system to grow big enough.
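
On Linux you can ask for bigger buffers per socket, though the kernel clamps the request at net.core.rmem_max/wmem_max, and setting them by hand disables the autotuning that would otherwise grow them. A sketch:

    import socket

    BDP = 12_500_000  # bytes; e.g. 1 Gbit/s at 100 ms RTT

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BDP)
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))  # may be clamped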