https://backlinko.com/netflix-users
I see some slowing growth, particularly across 2021/2022, but as of this report (April 2024) they were still growing through 2023.
260M subscribers. They aren't exactly hurting.
It must be great to work for them in infrastructure and backends.
https://ir.netflix.net/investor-news-and-events/financial-re...
I wonder who uses it when I see their reports, because I don't know anyone who does. I probably have a weird group of friends. I'm sure if I asked, some of my coworkers would turn out to use it.
My kids would love it if we used it. Perhaps it's big among younger people.
I wonder how many hundreds of thousands of dollars network flow logs cost them; obviously, at some point it becomes cheaper to re-implement monitoring in-house.
The agent repo: https://github.com/coroot/coroot-node-agent
Doing it with eBPF on end hosts gets you the same data, but you can attribute it directly because you know which container it originates from, you can snoop DNS, and you can get extremely useful metrics like per-TCP-connection ACK delay, retransmissions, etc.
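Not the Netflix agent's actual code, just a minimal BCC/eBPF sketch of the kind of per-connection signal an end-host agent can collect (here, TCP retransmits per destination); attributing it to a container would need a cgroup/netns lookup on top.

```python
#!/usr/bin/env python3
# Minimal BCC sketch (not Netflix's agent): count TCP retransmits per
# destination IPv4 address on the end host. Requires root and the bcc
# Python bindings; assumes the kernel exposes the tcp_retransmit_skb kprobe.
import socket
import struct
import time

from bcc import BPF

prog = r"""
#include <net/sock.h>

BPF_HASH(retrans, u32, u64);   // daddr (network byte order) -> retransmit count

int trace_retransmit(struct pt_regs *ctx, struct sock *sk) {
    u32 daddr = sk->__sk_common.skc_daddr;
    retrans.increment(daddr);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="tcp_retransmit_skb", fn_name="trace_retransmit")

time.sleep(10)  # sample window
for k, v in b["retrans"].items():
    print(socket.inet_ntoa(struct.pack("I", k.value)), v.value)
```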
AWS recently released CloudWatch Network Monitoring, which also uses an agent with eBPF, but it's almost like a children's toy compared to something like Datadog NPM. I was working on a solution similar to Netflix's when NPM was released; there was no point after that.
Have you guys evaluated vendors like Kentik?
I would love to get more insight into what you guys actually do with flow logs. For example, if I store 1 TB of flow logs, what value can I actually derive from them that justifies the cost of collection, processing, and storage?
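To make the question concrete, here is a hedged sketch of one typical answer, assuming the flows have already been enriched with workload and AZ metadata (the column names and file are made up): rank workload pairs by cross-AZ bytes to see where data-transfer costs come from.

```python
# Hypothetical example (column names and file are assumptions, not a real
# schema): find which workload pairs generate the most cross-AZ traffic,
# a common driver of data-transfer cost.
import pandas as pd

flows = pd.read_parquet("flows.parquet")  # enriched flow logs

cross_az = flows[flows["src_az"] != flows["dst_az"]]
top_pairs = (
    cross_az.groupby(["src_workload", "dst_workload"])["bytes"]
    .sum()
    .sort_values(ascending=False)
    .head(20)
)
print(top_pairs)
```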
We got a promotion for Disney+ and probably won't renew it. By the time it's over, it seems like there won't be much worth watching left on there. The kids already have a hard time finding anything they're interested in watching.
https://netflixtechblog.com/zero-configuration-service-mesh-...
A Global Perspective on the Past, Present, and Future of Video Streaming over Starlink - https://dl.acm.org/doi/10.1145/3700412 | https://doi.org/10.1145/3700412
Looking at Envoy access logs, for example, is a more straightforward and simple approach than running distributed eBPF and a memory-intensive, large Spark streaming job.
I wonder if it is possible with IPv6 to never reuse addresses (or to roll through the addresses so reuse is temporally distant), which would remove the problems with staleness and false attribution.
A service mesh may have restrictions on the network protocols it supports and may not cover all network traffic (like connections to Kafka and databases).
"I wonder if it is possible with ipv6 to never... re use addresses which removes the problems with staleness and false attribution."
Most VPCs (AWS included) don't currently support "true" IPv6 scaleout behavior. But if IPs were truly immutable and unique per workload, attribution becomes trivial. It's just not yet realistic... maybe something to explore with the lads?
It's early and has some bugs but seems promising.
That being said, its core product has been nearly commoditized. When Netflix entered the market, delivering long-form, high-quality video over the public internet was nascent. Now everyone and their grandma can spin up a video streaming service from a vendor.
I mean I guess a similar argument holds for colocating with telecoms that would have recently been cable or IPTV providers.
It'd be pretty tough to design around though if any of their caching infrastructure reveals viewership, or engagement data to intermediaries.
> Most VPCs (also AWS) don’t currently support "true" IPv6 scaleout behavior.
That's a shame.
> if IPs were truly immutable and unique per workload, attribution becomes trivial
I would like to see that. IPAM for multi-tenant workloads always felt like a kludge. You need the network to understand how to route to a workload, but on IPv4 the network has many more workloads than addresses. If you assign immutable addresses per workload (or, say, it takes you a month to chew through your IPv6 address space), the network natively knows how to route to workloads without the need to kludge around with IP reassignments.
I have had to deal with IP address pools being exhausted due to high pod churn in EC2 a number of times and it is always a pain.
Immutable addressing per workload with IPv6 feels like such a clean mental model, especially for attribution, routing, and observability.
Curious if you have seen anyone pull that off cleanly in production, truly immutable addressing at scale? Has it been battle-tested somewhere, or is it still mostly an ideal?
Hypothetically it is not hard: you split your IPv6 prefix per datacenter, then use etcd to coordinate access to the IPv6 pool and hand out immutable addresses. You just start from the lowest address and go to the highest; if you reach the highest address you wrap back to the lowest. As long as your churn is not too high and your pool is big enough, you should only wrap far enough apart in time that address reuse doesn't cause any problems with false attribution.
In the etcd store you can just keep KV pairs of ipv6 -> workload ID. If you really want to be fancy, you can watch those KV pairs using etcd clients and get live updates of new addresses being assigned to workloads, then plug those updates into whatever system needs to map IPv6 to workload, such as network flow tools.
Unless you are doing something insane, you should easily be able to keep up with immutable address requests with a DC-local etcd quorum.
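A rough sketch of that allocator, assuming a per-datacenter /64 and the python-etcd3 client; the prefix, endpoint, and key names ("ipam/cursor", "ipam/addr/...") are made up for illustration.

```python
# Hedged sketch of the scheme above: a wrapping cursor over a per-datacenter
# IPv6 prefix, coordinated through etcd. The prefix, endpoint, and key names
# are illustrative, not a real deployment.
import ipaddress

import etcd3

PREFIX = ipaddress.ip_network("2001:db8:0:1::/64")        # hypothetical DC prefix
etcd = etcd3.client(host="etcd.dc1.internal", port=2379)  # hypothetical endpoint

# Bootstrap the cursor key once (version == 0 means the key does not exist yet).
etcd.transaction(
    compare=[etcd.transactions.version("ipam/cursor") == 0],
    success=[etcd.transactions.put("ipam/cursor", "0")],
    failure=[],
)

def allocate(workload_id: str) -> ipaddress.IPv6Address:
    """Hand out the next address in the prefix, wrapping around at the end."""
    while True:
        cursor_bytes, _ = etcd.get("ipam/cursor")
        current = int(cursor_bytes)
        nxt = (current + 1) % PREFIX.num_addresses
        addr = PREFIX.network_address + nxt
        # Compare-and-swap the cursor so concurrent allocators don't collide,
        # and record the ipv6 -> workload mapping in the same transaction.
        ok, _ = etcd.transaction(
            compare=[etcd.transactions.value("ipam/cursor") == cursor_bytes],
            success=[
                etcd.transactions.put("ipam/cursor", str(nxt)),
                etcd.transactions.put(f"ipam/addr/{addr}", workload_id),
            ],
            failure=[],
        )
        if ok:
            return addr

# Flow tooling can watch the mapping prefix and keep a live ipv6 -> workload table.
events, cancel = etcd.watch_prefix("ipam/addr/")
```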
We use Istio as a service mesh and get the same result, using the same architecture as shown in the blog post (especially the part where each workload has a sidecar container running Flow).
https://clickhouse.com/blog/kubenetmon-open-sourced
The data collection method says: "conntrack with nf_conntrack_acct".
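Not kubenetmon's actual code, but a small sketch of what that data source looks like: with accounting enabled (net.netfilter.nf_conntrack_acct=1), per-flow packet and byte counters show up in /proc/net/nf_conntrack, and you can scrape them roughly like this (the regex assumes the common IPv4 TCP/UDP line format; newer kernels may only expose this via ctnetlink).

```python
# Hedged sketch: read per-flow packet/byte counters from conntrack.
# Requires accounting to be on (sysctl net.netfilter.nf_conntrack_acct=1)
# so the packets=/bytes= fields are present.
import re

FLOW_RE = re.compile(
    r"src=(?P<src>\S+) dst=(?P<dst>\S+) sport=(?P<sport>\d+) dport=(?P<dport>\d+) "
    r"packets=(?P<packets>\d+) bytes=(?P<bytes>\d+)"
)

def read_flows(path="/proc/net/nf_conntrack"):
    """Yield (src, dst, sport, dport, packets, bytes) for the originating direction."""
    with open(path) as fh:
        for line in fh:
            m = FLOW_RE.search(line)  # first match = original-direction counters
            if m:
                yield (
                    m["src"], m["dst"],
                    int(m["sport"]), int(m["dport"]),
                    int(m["packets"]), int(m["bytes"]),
                )

if __name__ == "__main__":
    for flow in read_flows():
        print(flow)
```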
Netflix is a platform: their strategic advantage is in their content sourcing and development pipeline, which is fed by unique insights into audience preferences. This is distributed via recommendation algorithms and UX. It could be argued, as someone already pointed out, that the infra aspect is a commodity at this point.
There are attempts to serve high-bandwidth throughput; I think the last update was the one below.
They're not going out of their way to annoy us; they try to be efficient with the finite resources a home server has. Netflix goes out of its way to make it smooth no matter what you do, even if they have to pool a bit of their own resources for it.