(netflixtechblog.com)

160 points simplesort | 3 comments | 08 Apr 25 18:21 UTC


slt2021 ◴[08 Apr 25 22:11 UTC] No.43626982[source]▶

Question to the Netflix folks: I saw a lot of in-house tools mentioned. Do you guys have a service mesh like Linkerd?

Have you guys evaluated vendors like Kentik?

I would love to get more insight into what you guys actually do with flow logs. For example, if I store 1 TB of flow logs, what value can I actually derive from them that justifies the cost of collection, processing, and storage?

replies(2): >>43627581 #>>43628881 #

1. retiredpapaya ◴[08 Apr 25 23:49 UTC] No.43627581[source]▶

>>43626982 #

I think Netflix does use an Envoy-based Service Mesh [1], and they roll their own control plane.

[1] https://netflixtechblog.com/zero-configuration-service-mesh-...

replies(1): >>43627703 #

2. slt2021 ◴[09 Apr 25 00:09 UTC] No.43627703[source]▶

>>43627581 (TP) #

If the goal of gathering and attributing VPC flows is to get workload-granularity flow logs, then imho gathering mesh-level logs is a more direct and straightforward approach, because the mesh (and the workload orchestrator) are uniquely qualified to know when workload A is running on host X and is trying to connect to workload B.

Looking at Envoy access logs, for example, is a simpler and more straightforward approach than running distributed eBPF plus a large, memory-intensive Spark streaming job.
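
To make the attribution problem concrete, here is a minimal Python sketch of the lookup involved: given a record of which workload held which IP from what time, resolve both ends of a flow at the flow's timestamp. This is only an illustration under stated assumptions, not Netflix's pipeline or any mesh's API. `IpOwnership`, `record`, `owner_at`, and `attribute` are hypothetical names, and the sketch assumes ownership events arrive in time order per IP.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


class IpOwnership:
    """Hypothetical time-indexed map from IP address to workload name."""

    def __init__(self) -> None:
        # ip -> list of (start_time, workload), appended in time order
        self._history: Dict[str, List[Tuple[float, str]]] = {}

    def record(self, ip: str, start: float, workload: str) -> None:
        # Assumption: events for a given IP arrive sorted by start time.
        self._history.setdefault(ip, []).append((start, workload))

    def owner_at(self, ip: str, ts: float) -> Optional[str]:
        history = self._history.get(ip, [])
        starts = [start for start, _ in history]
        # Find the latest assignment that began at or before ts.
        i = bisect_right(starts, ts)
        return history[i - 1][1] if i else None


def attribute(
    flow: Tuple[str, str, float], ownership: IpOwnership
) -> Tuple[Optional[str], Optional[str]]:
    """Resolve (src_ip, dst_ip, timestamp) to (src_workload, dst_workload)."""
    src_ip, dst_ip, ts = flow
    return ownership.owner_at(src_ip, ts), ownership.owner_at(dst_ip, ts)
```

The point of the time index is that an IP can be reassigned between workloads, so the same address may map to different owners depending on when the flow happened. A mesh or orchestrator already holds this mapping, while a network-level pipeline has to reconstruct it.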

replies(1): >>43627867 #

3. nptr ◴[09 Apr 25 00:44 UTC] No.43627867[source]▶

>>43627703 #

The blog post mentioned that "The eBPF flow logs provide a comprehensive view of service topology and network health across Netflix’s extensive microservices fleet, regardless of the programming language, RPC mechanism, or application-layer protocol used by individual workloads."

A service mesh may restrict which network protocols it supports and may not cover all network traffic (like connections to Kafka and databases).


How Netflix Accurately Attributes eBPF Flow Logs