188 points ilove_banh_mi | 1 comments
slt2021 ◴[] No.42170020[source]
The problem with trying to replace TCP only inside the DC is that TCP will still be used outside the DC.

Network engineering is already convoluted and troublesome as it is right now, using only the TCP stack.

When you start using Homa inside but TCP outside, things will break, because a lot of requests inside the DC are created in response to an inbound request from outside the DC (like a client trying to send an RPC request).

I cannot imagine trying to troubleshoot hybrid problems at the intersection of TCP and Homa; it's gonna be a nightmare.

Plus, I don't understand why you would create a new L4 transport protocol for a specific L7 application (RPC). This seems like a suboptimal choice, because the RPC of today could be replaced with something completely different, like RDMA over Ethernet for AI workloads, or the transfer of large streams like training data or AI model state.

I think tuning the TCP stack in the kernel, adding more configuration knobs for TCP, and switching from stream (TCP) to packet (UDP) protocols where it is warranted will give more incremental benefits.
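
To make the "knobs" idea concrete, here is a minimal sketch of what per-socket tuning and the stream-vs-packet choice look like from user space. The helper names are hypothetical, and the two options shown (TCP_NODELAY, SO_KEEPALIVE) are just common examples, not a recommendation for any particular workload:

```python
import socket


def make_tuned_tcp_socket() -> socket.socket:
    """Hypothetical helper: a TCP (stream) socket with two common knobs set."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm so small messages are sent without coalescing delay.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Ask the kernel to probe idle connections so dead peers are detected.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return s


def make_udp_socket() -> socket.socket:
    """Hypothetical helper: a UDP (packet) socket for message-oriented traffic."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
```

Kernel-wide settings (sysctls, congestion control selection) are a separate layer of tuning and are not shown here.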

One major thing the author missed is security. These are considered table stakes:

1. encryption in transit: handshake/negotiation

2. ability to intercept and inspect traffic for enterprise security purposes

3. resistance to attacks like floods

4. security of sockets in containerized Linux environments

replies(2): >>42170162 #>>42170268 #
nicman23 ◴[] No.42170162[source]
The only place Homa makes sense is where there is no external TCP to the peers, or at least not in the same context, i.e. for RoCE.
replies(1): >>42171014 #
1. slt2021 ◴[] No.42171014[source]
1. Add software-defined networking, where transport and signaling are done by a vendor-specific underlay, possibly across multiple redundant uplinks.

2. The term "external" is really vague, as modern networks have blended boundaries. Things like availability zones and regions make the DC-to-DC connection irrelevant, because at any point in time you may be required to fail over to another AZ/DC/region.

3. When I think of inter-datacenter, I can only think of Ethernet. That's really it. Even in Ethernet, what you think of as a peer existing in your same subnet could be in a different DC, again due to software-defined networking.