How Netflix Accurately Attributes eBPF Flow Logs

(netflixtechblog.com)

160 points simplesort | 2 comments | 08 Apr 25 18:21 UTC

thewisenerd ◴[08 Apr 25 20:28 UTC] No.43626119[source]▶

so they didn't want to pay for AWS CloudWatch [1]; decided to roll their own in-house network flow log collection; and had to re-implement attribution?

i wonder how many hundreds of thousands of dollars network flow logs cost them; obviously at some point it is going to be cheaper to re-implement monitoring in-house.

[1]: https://youtu.be/8C9xNVYbCVk?feature=shared&t=1685

replies(2): >>43626400 #>>43626840 #

1. Hikikomori ◴[08 Apr 25 20:59 UTC] No.43626400[source]▶

>>43626119 #

Because the vanilla flow logs that you get from VPC/TGW are nearly useless outside the most basic use cases. All you get is how many bytes and which TCP flags were seen per connection per 10 minutes. Then you need to attribute IP addresses to actual resources yourself, separately, which isn't simple when you have containers or k8s service networking.

Doing it with eBPF on the end hosts, you get the same data, but you can attribute it directly because you know which container it originates from. You can also snoop DNS, and you get extremely useful metrics like per-TCP-connection ACK delay, retransmissions, etc.

AWS recently released CloudWatch Network Monitoring, which also uses an agent with eBPF, but it's almost like a children's toy compared to something like Datadog NPM. I was working on a solution similar to Netflix's when NPM was released; there was no point after that.
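To make the "attribute IP addresses to actual resources yourself" step concrete, here is a minimal sketch of one way to do it offline: keep a per-IP list of ownership time ranges and look up who held the IP at the flow's timestamp. Everything here is hypothetical and for illustration only (the `IpOwnershipIndex` class, the `attribute` helper, the flow record fields `src`, `dst`, `ts`); it is not Netflix's implementation or any AWS API, and it assumes ownership ranges for a given IP never overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Ownership:
    start: int      # epoch seconds when the workload acquired the IP
    end: int        # epoch seconds when it released the IP (exclusive)
    workload: str   # hypothetical workload name, e.g. a pod or container ID


class IpOwnershipIndex:
    """Maps an IP to the workload that held it at a given time."""

    def __init__(self):
        self._by_ip = {}

    def add(self, ip, start, end, workload):
        ranges = self._by_ip.setdefault(ip, [])
        ranges.append(Ownership(start, end, workload))
        # Keep ranges sorted by start time so lookups can binary-search.
        ranges.sort(key=lambda o: o.start)

    def lookup(self, ip, ts):
        ranges = self._by_ip.get(ip, [])
        starts = [o.start for o in ranges]
        # Find the last range that started at or before ts.
        i = bisect_right(starts, ts) - 1
        if i >= 0 and ranges[i].start <= ts < ranges[i].end:
            return ranges[i].workload
        return None  # nobody is known to have owned the IP at that time


def attribute(flow, index):
    """Return a copy of the flow record with workload names attached."""
    return {
        **flow,
        "src_workload": index.lookup(flow["src"], flow["ts"]),
        "dst_workload": index.lookup(flow["dst"], flow["ts"]),
    }


index = IpOwnershipIndex()
index.add("10.0.0.5", 0, 100, "pod-a")
index.add("10.0.0.5", 100, 200, "pod-b")   # same IP reused by another pod
index.add("10.0.0.9", 0, 200, "db-1")

flow = {"src": "10.0.0.5", "dst": "10.0.0.9", "ts": 150, "bytes": 4096}
attributed = attribute(flow, index)
```

The time range matters because container IPs get recycled quickly: the same address can belong to different workloads a few seconds apart, so a plain IP-to-name map misattributes flows around the handover.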

replies(1): >>43627832 #

2. nptr ◴[09 Apr 25 00:38 UTC] No.43627832[source]▶

>>43626400 (TP) #

This is spot on. The AWS logs can also be orders of magnitude more expensive.
