
148 points meysamazad | 1 comments
notepad0x90 ◴[] No.45957801[source]
I'm slightly surprised Cloudflare isn't using a userspace TCP/IP stack already (faster: fewer context switches and copies). It's the type of company I'd expect to actually need one.
replies(2): >>45958128 #>>45959181 #
Droobfest ◴[] No.45958128[source]
From 2016: https://blog.cloudflare.com/why-we-use-the-linux-kernels-tcp...
replies(1): >>45958213 #
notepad0x90 ◴[] No.45958213[source]
Nice, they know better. But it also makes me wonder, because they're asking "but what if you need to run another app?" For things like load balancers, I'd expect you'd only run one app per server on the data plane, with the userspace stack handling that traffic, while the OS and its services use a separate control-plane NIC with the kernel stack, so that boxes stay reachable even during link saturation, DDoS, etc.

It also makes me wonder: why is TCP/IP special? The kernel should expose a raw network device. I get physical or layer 2 configuration happening in the kernel, but if it is supposed to do IP, then why stop there? Why not TLS as well? Why run a complex network protocol stack in the kernel when you can just expose a configured layer 2 device to a userspace process? It sounds like a "that's just the way it's always been done" type of scenario.
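
To make "expose a configured layer 2 device to a userspace process" concrete, here is a minimal sketch of the first thing such a process would have to do itself: take a raw Ethernet frame and parse the IPv4 header. The function names and the hand-built demo frame are hypothetical and only for illustration. A real userspace stack would get frames from something like an AF_PACKET socket or a kernel-bypass framework (both need elevated privileges), and would also have to validate checksums and handle fragmentation, ARP, TCP state and so on, none of which is shown here.

```python
import socket
import struct

ETHERTYPE_IPV4 = 0x0800  # EtherType value for IPv4


def parse_ethernet(frame: bytes):
    """Split an Ethernet II frame into (dst_mac, src_mac, ethertype, payload)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]


def parse_ipv4(packet: bytes):
    """Parse the fixed fields of an IPv4 header. The checksum is NOT validated."""
    if len(packet) < 20:
        raise ValueError("packet shorter than a minimal IPv4 header")
    version = packet[0] >> 4
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if version != 4 or header_len < 20 or len(packet) < header_len:
        raise ValueError("not a well-formed IPv4 header")
    (total_length,) = struct.unpack("!H", packet[2:4])
    return {
        "src": socket.inet_ntoa(packet[12:16]),
        "dst": socket.inet_ntoa(packet[16:20]),
        "ttl": packet[8],
        "protocol": packet[9],  # 6 means TCP
        "payload": packet[header_len:total_length],
    }


def build_demo_frame() -> bytes:
    """Hand-built frame standing in for what a NIC would deliver (hypothetical data)."""
    payload = b"data"
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,               # version 4, IHL 5 (20-byte header)
        0,                  # DSCP/ECN
        20 + len(payload),  # total length
        0,                  # identification
        0,                  # flags and fragment offset
        64,                 # TTL
        6,                  # protocol: TCP
        0,                  # checksum left as zero in this sketch
        socket.inet_aton("192.0.2.1"),
        socket.inet_aton("192.0.2.2"),
    )
    eth_header = (
        bytes.fromhex("ffffffffffff")
        + bytes.fromhex("020000000001")
        + struct.pack("!H", ETHERTYPE_IPV4)
    )
    return eth_header + ip_header + payload


if __name__ == "__main__":
    _, _, ethertype, l3 = parse_ethernet(build_demo_frame())
    if ethertype == ETHERTYPE_IPV4:
        print(parse_ipv4(l3))
```

Everything past this point (reassembly, TCP state machines, congestion control, retransmission timers) is what the kernel stack normally provides and what the userspace process would have to bring along.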

replies(3): >>45958565 #>>45959377 #>>45960224 #
hansvm ◴[] No.45960224[source]
TCP/IP is, in theory (AFAIK all experiments related to this fizzled out a decade or two ago), a global resource when you start factoring in congestion control. TLS is less obviously something you would want kernel involvement in, give or take the idea of outsourcing crypto to the kernel or some small efficiency gains for some workloads by skipping userspace handoffs, with more gains possible with NIC support.
replies(2): >>45960346 #>>45961233 #
1. Veserv ◴[] No.45960346[source]
You do want to offload crypto to dedicated hardware; otherwise your transport will get stuck at a paltry 40-50 Gb/s per core. However, you do not need more than block decryption; you can leave all of the crypto protocol management in userspace with no material performance impact.
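
For a rough sense of scale, the 40-50 Gb/s per core figure can be turned into a cycle budget per byte. This is back-of-the-envelope arithmetic only: the 3 GHz clock is an assumption that does not come from the comment, and the throughput numbers are taken from the comment as stated, not measured.

```python
def cycles_per_byte(gbit_per_s: float, clock_ghz: float) -> float:
    """CPU cycles available per byte at a given line rate on one core."""
    bytes_per_s = gbit_per_s * 1e9 / 8
    return clock_ghz * 1e9 / bytes_per_s


if __name__ == "__main__":
    ASSUMED_CLOCK_GHZ = 3.0  # assumption, not from the thread
    for rate in (40, 50):
        budget = cycles_per_byte(rate, ASSUMED_CLOCK_GHZ)
        print(f"{rate} Gb/s at {ASSUMED_CLOCK_GHZ} GHz: {budget:.2f} cycles per byte")
```

Under that assumed clock, the budget works out to roughly 0.5 to 0.6 cycles per byte for all processing on the core, which is one way to read why the bulk cipher work is the part worth pushing to hardware.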