
188 points ilove_banh_mi | 9 comments
1. akira2501 No.42169811
> If Homa becomes widely deployed, I hypothesize that core congestion will cease to exist as a significant networking problem, as long as the core is not systemically overloaded.

Yep, sure; but what happens when it does become overloaded?

> Homa manages congestion from the receiver, not the sender. [...] but the remaining scheduled packets may only be sent in response to grants from the receiver

I hypothesize it will not be a great day when you do become "systemically" overloaded.
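
For intuition, here's a toy sketch of the grant idea as I read it (not Homa's actual wire protocol; all names and numbers are invented):

    # Toy model of receiver-driven grants; constants are invented.
    UNSCHEDULED_BYTES = 10_000   # prefix a sender may push without grants
    GRANT_SIZE = 5_000           # capacity the receiver releases per grant

    class Receiver:
        def __init__(self, budget):
            self.budget = budget              # bytes it is willing to accept

        def grant(self):
            g = min(GRANT_SIZE, self.budget)  # next chunk of capacity
            self.budget -= g
            return g

    def send_message(size, rx):
        sent = min(size, UNSCHEDULED_BYTES)   # unscheduled prefix
        while sent < size:
            g = rx.grant()
            if g == 0:                        # no grant: the sender stalls
                break
            sent += min(g, size - sent)       # scheduled bytes need grants
        return sent

    print(send_message(100_000, Receiver(budget=90_000)))  # 100000
    print(send_message(100_000, Receiver(budget=20_000)))  # 30000, stalled

Once the receivers' budgets run out, senders simply stop, which is exactly why the "systemically overloaded" case is the interesting one.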

replies(2): >>42170099 >>42182735
2. andrewflnr No.42170099
Will it be a worse day than it would be with TCP? Either way, the only solution is to add more hardware, unless I'm misunderstanding the term "systemically overloaded".
replies(1): >>42171605
3. bayindirh No.42171605
I think so. If your core saturates, you add more capacity to your core switch. With Homa, you need to add more receivers instead, but what if you can't add them because the core can't handle more ports?

Ehrm. Looks like core saturation all over again.

Edit: This just popped into my mind: what prevents maliciously reducing "receive quotas" on compromised receivers to saturate an otherwise capable core? It looks like a very low bar for a very high-impact DoS attack. Ouch.

replies(2): >>42172889 >>42173204
4. klysm No.42172889
This is designed for in-datacenter use, so the security tradeoff is probably worth it.
replies(1): >>42174112
5. andrewflnr No.42173204
The "receivers" are just other machines in your data center. Their number is determined by the workload, same as always. Adding more will tend to increase traffic.

I'm not a datacenter expert, but is "not enough ports" really a showstopper? It just sounds like bad planning to me. And still not a protocol issue.

replies(1): >>42174080
6. bayindirh No.42174080
Depends on where you run out of ports, actually.

A datacenter has layers of networking, and one of those layers is what we call "the core" or "the spine" of the network. Sometimes you need to shuffle things around and move them closer to the core, but you can't because there are no free ports. Or you procure new machines, and adding them requires ports at or near the core, where you are already running at full capacity.

I mean, it can be dismissed as a "planning/skill issue", but these switches are fat machines in terms of bandwidth: they do not come cheap, and you can't grab one from the IT shop around the corner during your lunch break when you need it.

Being able to carry 1.2 Tbit/sec of aggregate traffic over a dozen thin fibers is exciting, but scaling expensive equipment on a whim is not.

At the end of the day, the "network core" is more monolithic than your outer network layers and server layout. It's more viscous and evolves slowly. This inevitably corners you in some cases; careful planning can postpone that, but not prevent it.

replies(1): >>42175758
7. bayindirh No.42174112
Nope. Having tended a datacenter for close to two decades, I can say that putting people behind NATs and in jails/containers/VMs doesn't prevent security incidents all the time.

With all the bandwidth, processing power and free cooling, a server is always a good target, and sometimes people will come equipped with 0-days or exploits which are very, very fresh.

Been there, seen it, and cleaned up that mess. I mean, reinstallation takes 10 minutes, but the event is always ugly.

8. andrewflnr No.42175758
Ok, that's good to know. But I still don't see how having congestion control driven by receivers instead of senders makes it harder to fix than it is currently.

I mean, I don't actually see why you would need more ports at all. You still just have a certain number of machines that want to exchange a certain amount of traffic. That amount is either above or below what your core can handle (guessing, again, at what the author means by "systemically overloaded").
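
A back-of-the-envelope version of "above or below" (every number here is hypothetical):

    # Hypothetical oversubscription check: offered load vs. core capacity.
    servers = 400
    nic_gbps = 25                   # per-machine NIC speed
    core_gbps = 1_200               # e.g. a dozen 100G uplinks

    offered = servers * nic_gbps    # worst case: everyone at line rate
    print(f"offered {offered} Gbit/s vs core {core_gbps} Gbit/s, "
          f"ratio {offered / core_gbps:.1f}:1")
    # offered 10000 Gbit/s vs core 1200 Gbit/s, ratio 8.3:1

No protocol changes that ratio; it only changes how gracefully things degrade as the actual load approaches it.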

9. bewo001 No.42182735
I don't understand the difference from TCP here. If the path is not congested but the receiving endpoint is, the receiver can control the bandwidth by reducing the window size. Ultimately, it is always the sender that has to react to congestion by reducing the amount of traffic it sends.
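
For example, a receiver can already stall a TCP sender just by shrinking its receive buffer and not reading. A rough loopback sketch (buffer sizes are illustrative, and the kernel rounds them):

    # TCP flow control demo: a slow receiver throttles the sender.
    import socket, threading, time

    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)  # tiny window
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    def slow_receiver():
        conn, _ = srv.accept()
        time.sleep(3)              # never reads: advertised window closes
        conn.close()

    threading.Thread(target=slow_receiver, daemon=True).start()

    cli = socket.create_connection(srv.getsockname())
    cli.settimeout(1)              # give up once the send blocks
    sent = 0
    try:
        while True:
            sent += cli.send(b"x" * 4096)  # fills window + buffers, then blocks
    except socket.timeout:
        print(f"sender stalled after {sent} bytes")  # the receiver set the limit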

RPC is something of a red flag as well. RPCs will never behave like local procedure calls, so the abstraction will always leak (though the pendulum of popularity keeps swinging back and forth between RPC and special-purpose protocols every few years).