159 points by simplesort | 59 comments
1. butlike ◴[] No.43625701[source]
All that logging and they can't figure out why people are going to other streaming services
replies(4): >>43625729 #>>43625773 #>>43625797 #>>43625972 #
2. neogodless ◴[] No.43625729[source]
I mean, I haven't been a subscriber for a couple of years, but are they losing a lot of subscribers?

https://backlinko.com/netflix-users

I see some slowing growth, particularly across 2021/2022, but as of this report (April 2024) they were still growing through 2023.

260M subscribers. They aren't exactly hurting.

replies(1): >>43625877 #
3. ASalazarMX ◴[] No.43625773[source]
Despite their awful UX, I'm always impressed with how reliable their service is, technically speaking. Video is always good and responsive even on less-than-stellar connections, you can leave a show paused for hours, and resume it almost instantly. Their fast.com speed test is always much faster than your regular internet access, I guess thanks to their Open Connect Appliances.

It must be great to work for them in infrastructure and backends.

replies(5): >>43626160 #>>43626388 #>>43626597 #>>43627654 #>>43627779 #
4. seneca ◴[] No.43625797[source]
No one I know uses Netflix anymore, and I haven't for a while, but from what I've seen their subscriber numbers are actually doing quite well.
replies(3): >>43625854 #>>43626229 #>>43626311 #
5. temp0826 ◴[] No.43625854{3}[source]
I'm not sure I know anyone that _doesn't_ have it
replies(2): >>43626085 #>>43627565 #
6. EwanToo ◴[] No.43625877{3}[source]
It's now over 300 million subscribers; as you say, they're doing OK.

https://ir.netflix.net/investor-news-and-events/financial-re...

7. blinded ◴[] No.43625972[source]
lol wat?

They make $10 billion+ a _quarter_.

8. steve_adams_86 ◴[] No.43626085{4}[source]
Like seneca, I also don’t know anyone who uses it. That’s interesting. I haven’t used it for around 2 years.

I wonder who uses it when I see their reports, because I don’t know them. I probably have a weird group of friends. I’m sure if I asked, I'd find some of my coworkers use it.

My kids would love it if we used it. Perhaps it’s big among younger people.

replies(2): >>43626123 #>>43626417 #
9. thewisenerd ◴[] No.43626119[source]
so they didn't want to pay for AWS CloudWatch [1]; decided to roll their own in-house network flow log collection; and had to re-implement attribution?

i wonder how many hundreds of thousands of dollars network flow logs were costing them; obviously at some point it becomes cheaper to re-implement monitoring in-house.

[1]: https://youtu.be/8C9xNVYbCVk?feature=shared&t=1685

replies(2): >>43626400 #>>43626840 #
10. temp0826 ◴[] No.43626123{5}[source]
Maybe that's part of it?...my sibling has a toddler and they use it for a lot of children's shows. Conversely, my mother watches it often too (not for kids shows!)
11. yuters ◴[] No.43626160{3}[source]
I have an old Fire TV and never tried to stop automatic updates on it; it has become so slow and unresponsive that I'm barely able to switch inputs to use something else. Netflix is the only app that still works on that TV.
replies(1): >>43627557 #
12. nikolay_sivko ◴[] No.43626202[source]
At Coroot, we solve the same problem, but in a slightly different way. The traffic source is always a container (Kubernetes pod, systemd slice, etc.). The destination is initially identified as an IP:PORT pair, which, in the case of Kubernetes services, is often not the final destination. To address this, our agent also determines the actual destination by accessing the conntrack table at the eBPF level. Then, at the UI level, we match the actual destination with metadata about TCP listening sockets, effectively converting raw connections into container-to-container communications.

The agent repo: https://github.com/coroot/coroot-node-agent
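
A simplified userspace analog of that conntrack lookup (illustrative only; the actual agent does this at the eBPF level, per the repo above): the reply direction of each conntrack entry exposes the DNAT'd real endpoint behind the dialed service IP.

    #!/usr/bin/env python3
    # Sketch: resolve dialed (e.g. Kubernetes service) IPs to the real
    # DNAT'd endpoints by reading the kernel's conntrack table.
    # Requires root and the nf_conntrack module.
    import re

    FIELD = re.compile(r"(\w+)=(\S+)")

    def resolved_flows(path="/proc/net/nf_conntrack"):
        with open(path) as f:
            for line in f:
                if " tcp " not in line:
                    continue
                kvs = FIELD.findall(line)
                # Each entry prints two tuples: original direction, then reply.
                src = [v for k, v in kvs if k == "src"]
                dst = [v for k, v in kvs if k == "dst"]
                sport = [v for k, v in kvs if k == "sport"]
                dport = [v for k, v in kvs if k == "dport"]
                if len(src) < 2:
                    continue
                dialed = f"{dst[0]}:{dport[0]}"
                actual = f"{src[1]}:{sport[1]}"  # reply source = real server
                if dialed != actual:  # DNAT happened (service VIP, etc.)
                    yield src[0], dialed, actual

    for client, dialed, actual in resolved_flows():
        print(f"{client} -> {dialed} (actually {actual})")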

13. zX41ZdbW ◴[] No.43626335[source]
If you are interested in network monitoring in Kubernetes, it's worth looking at Kubenetmon: https://github.com/ClickHouse/kubenetmon - an open-source eBPF-based implementation from ClickHouse.
replies(1): >>43628908 #
14. nimbius ◴[] No.43626383[source]
i refuse to believe a company that wasted $320 million on "the electric state" could ever manage to do anything correctly. stripe the parking lot? stock the breakroom? clean the toilets? simply not possible.
replies(3): >>43627144 #>>43627314 #>>43628089 #
15. faitswulff ◴[] No.43626388{3}[source]
I can only remember one major outage from them in the past ~10 years (in the 2020s; not the 2012 outage), and if I recall correctly it was fixed in short order... and they never released a postmortem.
16. Hikikomori ◴[] No.43626400[source]
Because the vanilla flow logs that you get from VPC/TGW are nearly useless outside the most basic use cases. All you get is how many bytes and which TCP flags were seen per connection per 10 minutes. Then you need to attribute IP addresses to actual resources yourself separately, which isn't simple when you have containers or k8s service networking.

Doing it with eBPF on end hosts, you can get the same data, but you can attribute it directly since you know which container it originates from, snoop DNS, and collect extremely useful metrics like per-TCP-connection ACK delay, retransmissions, etc.

AWS recently released CloudWatch Network Monitoring, which also uses an agent with eBPF, but it's almost like a children's toy compared to something like Datadog NPM. I was working on a solution similar to Netflix's when NPM was released; there was no point after that.
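
As a taste of what that per-connection eBPF signal looks like, here's a minimal bcc sketch (nothing to do with Netflix's FlowExporter or Datadog NPM; the tracepoint exists on kernels >= 4.16) that counts TCP retransmits per destination port:

    #!/usr/bin/env python3
    # Count TCP retransmits per destination port via the
    # tcp:tcp_retransmit_skb tracepoint. Run as root.
    from time import sleep
    from bcc import BPF

    prog = r"""
    BPF_HASH(retx, u16, u64);

    TRACEPOINT_PROBE(tcp, tcp_retransmit_skb) {
        u16 dport = args->dport;
        retx.increment(dport);
        return 0;
    }
    """

    b = BPF(text=prog)
    print("Tracing TCP retransmits... Ctrl-C to print counts.")
    try:
        sleep(99999999)
    except KeyboardInterrupt:
        pass
    for k, v in sorted(b["retx"].items(), key=lambda kv: -kv[1].value):
        print(f"dport {k.value}: {v.value} retransmits")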

replies(1): >>43627832 #
17. itishappy ◴[] No.43626417{5}[source]
Do you use something different? Anecdotally, I find Disney+ to be a major divider. Friends of mine have kids and Disney+ and/or they're still using a decade-old Netflix subscription.
replies(1): >>43627317 #
18. silisili ◴[] No.43626597{3}[source]
Generally I'd agree, but you must not have seen their attempts at live events :(.
19. DadBase ◴[] No.43626840[source]
I recall a time when we managed network flows by manually parsing /proc/net/tcp and correlating PIDs with netstat outputs. eBPF? Sounds like a fancy way to avoid good old-fashioned elbow grease.
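
For the nostalgic, a rough sketch of that elbow-grease approach (illustrative and deliberately not robust): parse /proc/net/tcp and match socket inodes to owning PIDs via /proc/<pid>/fd.

    #!/usr/bin/env python3
    # Old-fashioned flow attribution: /proc/net/tcp plus socket inodes.
    import os
    import socket
    import struct

    def hex_to_addr(h):
        ip_hex, port_hex = h.split(":")
        # /proc/net/tcp stores IPv4 addresses as little-endian hex.
        ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
        return f"{ip}:{int(port_hex, 16)}"

    def socket_owners():
        owners = {}
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                for fd in os.listdir(f"/proc/{pid}/fd"):
                    link = os.readlink(f"/proc/{pid}/fd/{fd}")
                    if link.startswith("socket:["):
                        owners[link[8:-1]] = pid
            except OSError:
                continue  # process exited or permission denied
        return owners

    owners = socket_owners()
    with open("/proc/net/tcp") as f:
        next(f)  # skip header row
        for line in f:
            cols = line.split()
            local, remote, inode = cols[1], cols[2], cols[9]
            print(hex_to_addr(local), "->", hex_to_addr(remote),
                  "pid", owners.get(inode, "?"))
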
20. slt2021 ◴[] No.43626982[source]
Question to the Netflix folks: I saw a lot of in-house developed tools being quoted; do you guys have a service mesh like Linkerd?

Have you guys evaluated vendors like Kentik?

I would love to get more insight into what you guys actually do with flow logs. For example, if I store 1 TB of flow logs, what value can I actually derive from them that justifies the cost of collection, processing, and storage?

replies(2): >>43627581 #>>43628881 #
21. ZeWaka ◴[] No.43627144[source]
Beautiful art and book, but what an unfaithful travesty of a production that absolutely trod on the original work.
22. r3tr0 ◴[] No.43627227[source]
we are working on a similar product that is eBPF powered and can extract flow logs:

https://yeet.cx

23. autoexec ◴[] No.43627314[source]
Badly managed companies often hire some very good talent which can allow them to sometimes do very impressive things in spite of themselves.
24. steve_adams_86 ◴[] No.43627317{6}[source]
We use Apple TV a small amount (for Silo and Severance most recently), and Disney+ slightly more for the kids. We occasionally use YouTube.

We got a promotion for Disney+ and probably won't renew it. By the time it's over, it seems like there won't be much worth watching left on there. The kids already have a hard time finding anything they're interested in watching.

25. ciupicri ◴[] No.43627557{4}[source]
I also have an old TV and guess what? Netflix stopped working last year. The application is not supported anymore. Beats me why.
replies(1): >>43627583 #
26. SoftTalker ◴[] No.43627565{4}[source]
Canceled mine a few months ago as I never used it.
27. retiredpapaya ◴[] No.43627581[source]
I think Netflix does use an Envoy-based Service Mesh [1], and they roll their own control plane.

[1]: https://netflixtechblog.com/zero-configuration-service-mesh-...

replies(1): >>43627703 #
28. acdha ◴[] No.43627583{5}[source]
Often it’s CA certificates expiring. My old Toshiba had app rot set in like that: after about 5 years none of the built-in apps worked anymore, and the errors appeared to be TLS-related. I suspect that was due to pinned certs to prevent MITM pirating.
29. toomuchtodo ◴[] No.43627654{3}[source]
They also did work to ensure it performed well for Starlink customers.

A Global Perspective on the Past, Present, and Future of Video Streaming over Starlink - https://dl.acm.org/doi/10.1145/3700412 | https://doi.org/10.1145/3700412

30. slt2021 ◴[] No.43627703{3}[source]
If the goal of gathering and attributing VPC flows is to have workload-granularity flow logs, then imho gathering mesh-level logs is a more direct and straightforward approach, because the mesh (and workload orchestrator) is uniquely qualified to know when workload A is running on host X and trying to connect to workload B.

Looking at Envoy access logs, for example, is a more straightforward and simple approach than running distributed eBPF and a memory-intensive large Spark streaming job.

replies(1): >>43627867 #
31. __turbobrew__ ◴[] No.43627708[source]
Maybe I'm missing something, but can’t you run workloads in separate network namespaces and then attach a bpf probe to the veth interface in the namespace? At that point you know all flows on that veth are from a specific workload, as long as you keep track of what is running in which network namespace.

I wonder if it is possible with IPv6 to never re-use addresses (or to roll through the addresses so reuse is temporally distant), which removes the problems with staleness and false attribution.
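
The bookkeeping half of the first idea is cheap; here's a hypothetical sketch of the netns -> workload map (process names stand in for workload IDs):

    #!/usr/bin/env python3
    # Map network-namespace inodes to the processes running in them by
    # scanning /proc. A real system would key on container/workload IDs.
    import os

    def netns_to_workloads():
        mapping = {}
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                inode = os.stat(f"/proc/{pid}/ns/net").st_ino
                with open(f"/proc/{pid}/comm") as f:
                    comm = f.read().strip()
            except OSError:
                continue  # process exited or permission denied
            mapping.setdefault(inode, set()).add(comm)
        return mapping

    for inode, procs in sorted(netns_to_workloads().items()):
        print(inode, sorted(procs))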

replies(1): >>43627904 #
32. ndriscoll ◴[] No.43627779{3}[source]
Not that this detracts from the wider point, but I'd expect unpause to just work unless you go out of your way to make it not work. Even if you drop the connection at some point, afaik they use ~15 Mb/s as their "premium" bitrate, so e.g. a 30 s buffer takes less than 64 MB (15 Mb/s × 30 s = 450 Mb ≈ 56 MB). That gives plenty of time to re-establish streaming after an unpause. It's not like the computer forgets what it was doing if you leave it alone.
replies(1): >>43635988 #
33. nptr ◴[] No.43627832{3}[source]
This is spot on. The AWS logs can also be orders of magnitude more expensive.
34. nptr ◴[] No.43627867{4}[source]
The blog post mentioned that "The eBPF flow logs provide a comprehensive view of service topology and network health across Netflix’s extensive microservices fleet, regardless of the programming language, RPC mechanism, or application-layer protocol used by individual workloads."

Service mesh may have restrictions on the network protocols and may not cover all network traffic (like connections to Kafka and databases).

35. VaiTheJice ◴[] No.43627904[source]
I think that's pretty reasonable tbf, and probably works at a 'simpler' scale (I use simple loosely), because Netflix’s container runtime is Titus, which is more bare-metal oriented than, say, Kubernetes. It doesn’t always isolate workloads as cleanly in a separate netns per container, especially for network optimisation purposes like IPv6-to-IPv4 sharing.

"I wonder if it is possible with ipv6 to never... re use addresses which removes the problems with staleness and false attribution."

Most VPCs (AWS included) don’t currently support "true" IPv6 scaleout behavior. Buttt!! If IPs were truly immutable and unique per workload, attribution becomes trivial. It’s just not yet realistic... maybe something to explore with the lads?

replies(1): >>43628269 #
36. mmckeen ◴[] No.43627957[source]
https://retina.sh/ is a similar open source tool for Kubernetes.

It's early and has some bugs but seems promising.

replies(1): >>43629264 #
37. meltyness ◴[] No.43627980[source]
I wonder how much of Netflix's infra is on AWS. Feels like building a castle in someone else's kingdom at that scale, in light of the Prime Video investment (and I guess Twitch too).
replies(2): >>43628048 #>>43628179 #
38. r3trohack3r ◴[] No.43628048[source]
Netflix serves nearly all of its video from a server down the street from you via its Open Connect infrastructure. AWS only hosts its microservice graph, which does stuff like determining which videos and qualities you should be offered.

That being said, its core product has been nearly commoditized. When Netflix entered the market, delivering long-form, high-quality video over the public internet was nascent. Now everyone and their grandma can spin up a video streaming service from a vendor.

replies(3): >>43628072 #>>43629429 #>>43631322 #
39. meltyness ◴[] No.43628072{3}[source]
I was aware of this; I think a talk about a performance regression on the BSD variant these appliances run was up here recently.

I mean I guess a similar argument holds for colocating with telecoms that would have recently been cable or IPTV providers.

It'd be pretty tough to design around, though, if any of their caching infrastructure reveals viewership or engagement data to intermediaries.

40. jandrese ◴[] No.43628089[source]
The scriptwriters aren't managing the network flows across the backend of their infrastructure. Netflix's trouble with scripts doesn't affect their ability to move bits around.
41. ilrwbwrkhv ◴[] No.43628179[source]
And more importantly, none of this is required. Pornhub pushes more video data over far more unreliable connections without any of this madness. This is purely a play to get the tech-company multiple, nothing else. This is a WeWork-style coverup.
replies(2): >>43628450 #>>43628782 #
42. __turbobrew__ ◴[] No.43628269{3}[source]
Makes sense. I have worked in and around CNI stuff for k8s, and generally netns+veth is how most of them work. That being said, we run k8s on bare metal; there isn’t any reason why running things on bare metal excludes netns usage.

> Most VPCs (also AWS) don’t currently support "true" IPv6 scaleout behavior.

Thats a shame.

> if IPs were truly immutable and unique per workload, attribution becomes trivial

I would like to see that. IPAM for multi-tenant workloads always felt like a kludge. You need the network to understand how to route to a workload, but a network running on IPv4 has many more workloads than addresses. If you assign immutable addresses per workload (or say it takes you a month to chew through your IPv6 address space), the network natively knows how to route to workloads without the need to kludge with IP reassignments.

I have had to deal with IP address pools being exhausted due to high pod churn in EC2 a number of times and it is always a pain.

replies(1): >>43628429 #
43. VaiTheJice ◴[] No.43628429{4}[source]
Ahh! Nothing like watching pods fail to schedule because you ran out of assignable IPs in a subnet you thought was generous.

Immutable addressing per workload with IPv6 feels like such a clean mental model, especially for attribution, routing, and observability.

Curious if you have seen anyone pull that off cleanly in production, like truly immutable addressing at scale? Wondering if it’s been battle-tested somewhere or is still mostly an ideal.

replies(1): >>43628742 #
44. rpmisms ◴[] No.43628450{3}[source]
Back to basics.
45. __turbobrew__ ◴[] No.43628742{5}[source]
Unfortunately my place is still stuck on IPv4.

Hypothetically it is not hard: you split your IPv6 prefix per datacenter, then use etcd to coordinate access to the IPv6 pool and hand out immutable addresses. You just start from the lowest address and go to the highest. If you get to the highest address you go back to the lowest; as long as your churn is not too high and your pool is big enough, you should only wrap addresses far enough apart in time that address reuse doesn't cause any problems with false attribution.

In the etcd store you can just store KV pairs of ipv6 -> workload ID. If you really want to be fancy, you can watch those KV pairs using etcd clients and get live updates of new addresses being assigned to workloads. You can plug these updates into your system of choice that needs to map IPv6 addresses to workloads, such as network flow tools.

Unless you are doing something insane, you should easily be able to keep up with immutable address requests with a DC-local etcd quorum.
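
A toy version of that allocator, assuming the python-etcd3 client (the prefix and key names are made up): compare-and-swap on a counter key, wrapping at the top of the pool.

    #!/usr/bin/env python3
    import ipaddress
    import etcd3

    POOL = ipaddress.IPv6Network("2001:db8:0:1::/64")  # example per-DC prefix
    COUNTER = "/ipam/dc1/next"

    etcd = etcd3.client()

    def allocate(workload_id: str) -> str:
        while True:
            cur, _ = etcd.get(COUNTER)
            n = int(cur.decode()) if cur else 0
            ip = str(POOL[n % POOL.num_addresses])  # wrap when exhausted
            ok, _ = etcd.transaction(
                # CAS: commit only if nobody else bumped the counter.
                compare=([etcd.transactions.value(COUNTER) == cur]
                         if cur else
                         [etcd.transactions.version(COUNTER) == 0]),
                success=[
                    etcd.transactions.put(COUNTER, str(n + 1)),
                    etcd.transactions.put(f"/ipam/dc1/ip/{ip}", workload_id),
                ],
                failure=[],
            )
            if ok:
                return ip

    print(allocate("web-5f7d9"))  # hypothetical workload ID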

46. adrianN ◴[] No.43628782{3}[source]
People might be more tolerant of quality issues for Pornhub's content than Netflix's.
replies(1): >>43629026 #
47. madduci ◴[] No.43628881[source]
Exactly my thought. Maybe it's the "not invented here" syndrome?

We use Istio as a service mesh and get the same result, using the same architecture as shown in the blog post (especially the part where each workload has a sidecar container running Flow).

replies(1): >>43629391 #
48. thewisenerd ◴[] No.43628908[source]
i mean... from your blog post linked in the repo, this isn't eBPF-based?

https://clickhouse.com/blog/kubenetmon-open-sourced

the data collection method says: "conntrack with nf_conntrack_acct"

49. philsnow ◴[] No.43629014[source]
Is it necessary to rely on IP address attribution? If FlowExporter uses eBPF and TCP tracepoints, could each workload be placed in its own cgroup, and could FlowExporter directly introspect which cgroup (and thus, workload) a given TCP socket event should be attributed to?
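
A hypothetical bcc sketch of that idea (not FlowExporter; needs kernel >= 4.18 and cgroup v2): tag outbound connects with the caller's cgroup id, which maps back to a workload via the cgroup directory's inode under /sys/fs/cgroup.

    #!/usr/bin/env python3
    from time import sleep
    from bcc import BPF

    prog = r"""
    BPF_HASH(connects, u64, u64);

    int kprobe__tcp_connect(struct pt_regs *ctx) {
        // Runs in the connecting task's context, so the current
        // cgroup id identifies the workload opening the socket.
        u64 cg = bpf_get_current_cgroup_id();
        connects.increment(cg);
        return 0;
    }
    """

    b = BPF(text=prog)
    sleep(30)  # sample for a while
    for k, v in b["connects"].items():
        print(f"cgroup id {k.value}: {v.value} connects")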
replies(1): >>43634617 #
50. usui ◴[] No.43629026{4}[source]
Why might that be?
51. roboben ◴[] No.43629264[source]
Tried it, had some issue, opened a bug report, no response. I think it is dead.
52. moqizhengz ◴[] No.43629391{3}[source]
From my experience in big tech, another reason is that ops guys just can't resist the concept of eBPF, go all the way down trying to figure out what this beautiful technology can do, and forget what they really wanted at the beginning.
53. scyzoryk_xyz ◴[] No.43629429{3}[source]
Any chance you would be able to point me to a good source or article describing/explaining the first half of your comment? I.e. someone getting into the nuts and bolts?

Netflix is a platform - their strategic advantage is in their content sourcing and development pipeline, which is fed unique insights on audience preferences. This is distributed with recommendation algorithms and UX. It could be argued, like someone else already pointed out, that the infra aspect is a commodity at this point.

replies(1): >>43630246 #
54. miyuru ◴[] No.43630246{4}[source]
https://openconnect.netflix.com/en/

There are ongoing attempts to serve high-bandwidth throughput; I think the last update was the one below.

https://news.ycombinator.com/item?id=40329303

55. myvoiceismypass ◴[] No.43631322{3}[source]
AWS also hosts all of Netflix’s internal apps (which probably dwarf the amount of actual public-facing stuff).
56. nptr ◴[] No.43634617[source]
That may help identify the local IPs but not the remote IPs.
57. ASalazarMX ◴[] No.43635988{4}[source]
Counterpoint: Plex and Jellyfin free resources if you leave your video paused too long, and it will take a noticeable amount of time to resume streaming, much more if it needs transcoding.

They're not going out of their way to annoy us; they try to be efficient with the finite resources a home server has. Netflix is going out of its way to make it smooth no matter what you do, even if they have to pool a bit of their own resources for it.

replies(1): >>43659807 #
58. ndriscoll ◴[] No.43659807{5}[source]
Buffering would be on the client, though. Assuming it has a couple dozen MB of memory, it should be able to buffer like 30 seconds. I realize their resume has more to do, but e.g. my Jellyfin server can initiate playback or seek within maybe 100-250 ms (it's just barely a noticeable pause after a random seek). So a 30 s buffer should be more than sufficient for unpausing without any stutter.