And as usual, hardware gets faster, better, and cheaper over the years, and suddenly the problem isn't a problem anymore - if it ever was one for the vast majority of applications. We only recently got a new fleet of compute nodes with 100 Gbit NICs. The previous fleet only had 10 Gbit, plus Omni-Path. We're going Ethernet-only this time.
I remember when saturating 10 Gbit/s was a challenge. This time around, the server reached line speed with plain TCP without even breaking a sweat: no jumbo frames, no fiddling with tunables. And that was while testing with four-year-old Xeon boxes, not even the final hardware.
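Just to make the kind of measurement concrete, a throwaway check looks roughly like the sketch below (the port and buffer size are arbitrary placeholders, and a single Python stream won't itself push 100 Gbit - iperf3 or similar is the usual tool for real numbers). The point is simply that plain TCP with default settings is all there is to it.

```python
# Minimal TCP throughput check (sketch only; port and buffer values are placeholders).
# Run "python3 tput.py server" on one node, "python3 tput.py client <host>" on the other.
import socket
import sys
import time

PORT = 5201          # placeholder port (same default as iperf3)
BUF = 1 << 20        # 1 MiB per send/recv call
DURATION = 10        # seconds the client sends for

def server():
    # Accept one connection and count the bytes received until the peer closes.
    with socket.create_server(("", PORT)) as srv:
        conn, addr = srv.accept()
        total, start = 0, time.monotonic()
        with conn:
            while True:
                data = conn.recv(BUF)
                if not data:
                    break
                total += len(data)
        elapsed = time.monotonic() - start
        print(f"received {total / 1e9:.1f} GB in {elapsed:.1f}s "
              f"= {total * 8 / elapsed / 1e9:.1f} Gbit/s from {addr}")

def client(host):
    # Blast a fixed buffer over a single TCP connection for DURATION seconds.
    payload = bytes(BUF)
    with socket.create_connection((host, PORT)) as sock:
        end = time.monotonic() + DURATION
        while time.monotonic() < end:
            sock.sendall(payload)

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])
```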
Again, I can see that there are use cases that benefit from even lower latency, but that's a niche compared to the bulk of data-center traffic, and I'd assume you'd just want RDMA in that case instead of optimizing on top of Ethernet or IP.