For TCP streams syscall overhead isn't a big issue really, you can easily transfer large chunks of data in each write(). If you have TCP segmentation offload available you'll have no serious issues pushing 100gbit/s. Also if you are sending static content don't forget sendfile().
UDP is a whole another kettle of fish, get's very complicated to go above 10gbit/s or so. This is a big part of why QUIC really struggles to scale well for fat pipes [1]. sendmmsg/recvmmsg + UDP GRO/GSO will probably get you to ~30gbit/s but beyond that is a real headache. The issue is that UDP is not stream focused so you're making a ton of little writes and the kernel networking stack as of today does a pretty bad job with these workloads.
FWIW even the fastest QUIC implementations cap out at <10gbit/s today [2].
Had a good fight writing a ~20gbit userspace UDP VPN recently. Ended up having to bypass the kernels networking stack using AF_XDP [3].
I'm available for hire btw, if you've got an interesting networking project feel free to reach out.
1. https://arxiv.org/abs/2310.09423
2. https://microsoft.github.io/msquic/
3. https://github.com/apoxy-dev/icx/blob/main/tunnel/tunnel.go