A typical software L7 load balancer (like nginx) parses the entire TCP stream and the HTTP headers, then applies a bunch of routing logic based on the URL and various HTTP headers.
There is a lot of work going on in userland: filling up TCP buffers, parsing the HTTP stream, applying a bunch of business logic, opening a new connection to the backend, sending the data, getting the response back, and so on.
Because all of that happens in userland, the stock nginx config ships with worker_connections set to 1024 per worker process (typically one worker per core), so not a lot.
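To make that concrete, here is roughly what an L7 setup looks like in nginx terms (the upstream pool, addresses, and the /api/ route are made up for illustration; only worker_connections 1024 reflects the stock config):

```nginx
events {
    # the stock nginx.conf ships with 1024 connections per worker process
    worker_connections 1024;
}

http {
    upstream app_backends {          # hypothetical backend pool
        server 10.0.0.11:8080;
        server 10.0.0.12:8080;
    }

    server {
        listen 80;

        # L7 routing: nginx has to buffer and parse the request line and
        # headers before it can even evaluate this URL-based rule
        location /api/ {
            proxy_set_header Host $host;
            proxy_pass http://app_backends;
        }
    }
}
```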
An L4 load balancer, on the other hand, works purely in packet-switching or NAT mode. The work consists of just rewriting the packet header fields (src.ip, src.port, dst.ip, dst.port, proto), and it can use frameworks like FD.io VPP (Vector Packet Processing) or Intel DPDK for accelerated packet switching.
Because of that, an L4 load balancer can perform very close to line rate, meaning it can load balance connections as fast as packets arrive at the network interface card (line rate being the theoretical maximum packet throughput of the link).
In the stateless L4 case there is no per-connection state to keep at all, so there is no real upper bound on the number of concurrent sessions to balance; it will run almost as fast as the core router that feeds it the traffic.
As you can see, L4 is clearly superior in performance, but the reason an L4 LB is possible at all is that it has TCP inbound and TCP outbound: the only work required is rewriting the headers and recalculating the IP/TCP checksums.
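A rough C sketch of that "rewrite and patch the checksum" step, to show how little work there is per packet. The backend table, hash, and function names are made up for illustration; real datapaths (IPVS, DPDK/VPP-based LBs) do the same kind of thing per burst of packets, often bypassing the kernel entirely:

```c
#include <stdint.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>

/* Hypothetical backend table; fields are kept in network byte order. */
struct backend { uint32_t ip_be; uint16_t port_be; };
static const struct backend backends[] = { { 0, 0 } };  /* filled at config time */

/* Fold a 32-bit running sum back into 16 bits (ones' complement). */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* RFC 1624 incremental update: patch an existing checksum when one 16-bit
 * word of the covered data changes, instead of recomputing the whole thing. */
static void csum_replace2(uint16_t *check, uint16_t old_be, uint16_t new_be)
{
    uint32_t sum = (uint16_t)~*check;
    sum += (uint16_t)~old_be;
    sum += new_be;
    *check = ~csum_fold(sum);
}

static void csum_replace4(uint16_t *check, uint32_t old_be, uint32_t new_be)
{
    csum_replace2(check, (uint16_t)(old_be >> 16), (uint16_t)(new_be >> 16));
    csum_replace2(check, (uint16_t)old_be, (uint16_t)new_be);
}

/* Stateless NAT-mode forward: pick a backend by hashing the 5-tuple, then
 * rewrite the destination address/port and patch the checksums.
 * `pkt` points at the start of the IPv4 header. */
static void l4_forward(uint8_t *pkt)
{
    struct iphdr  *ip  = (struct iphdr *)pkt;
    struct tcphdr *tcp = (struct tcphdr *)(pkt + ip->ihl * 4);

    /* Toy 5-tuple hash; a real LB would use consistent hashing (e.g. Maglev)
     * so backends can come and go without reshuffling everything. */
    uint32_t h = ip->saddr ^ ip->daddr ^ ip->protocol
               ^ ((uint32_t)tcp->source << 16) ^ tcp->dest;
    const struct backend *b =
        &backends[h % (sizeof(backends) / sizeof(backends[0]))];

    uint32_t old_daddr = ip->daddr;
    uint16_t old_dport = tcp->dest;

    ip->daddr = b->ip_be;
    tcp->dest = b->port_be;

    /* The IP header checksum covers the address change; the TCP checksum
     * covers both the pseudo-header (addresses) and the rewritten port. */
    csum_replace4(&ip->check,  old_daddr, ip->daddr);
    csum_replace4(&tcp->check, old_daddr, ip->daddr);
    csum_replace2(&tcp->check, old_dport, tcp->dest);
}
```

No buffering, no parsing, no connection objects: a handful of header loads and stores per packet, which is why this can keep up with the NIC.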
With Homa, you would need to fully terminate and process the TCP stream before you could even initiate the Homa side, meaning you would waste a lot of RAM keeping TCP buffers and reassembling the stream according to the TCP sequence numbers. Homa loses all of its benefits in the load balancing scenario.
The author pitches only one use case for Homa: East-West traffic. But again, these days software is really agnostic to the East-West direction. What your software thinks is running on a server in the next rack could just as well be a server in a different Availability Zone, or a read replica in a different geographic region.
And that's the beauty of modern infra: everything is software, everything is ephemeral, and we don't really care whether we are running in a single DC or across multiple DCs.
Because of that, I think we will stick with TCP as a proven protocol that interoperates seamlessly when crossing different WAN/LAN/VPN networks.
And I am not even talking about software-defined networks like SD-WAN, where transport and signaling are handled by the vendor-specific underlay network, and the overlay network is really just an abstraction for users that hides a lot of network discovery and network management underneath.