
492 points storf45 | 2 comments
grogenaut
This topic is really just fun for me to read based on where I work and my role.

Live is a lot harder than on-demand, especially when you can't estimate demand (which I'm sure was hard to do here). People are definitely not understanding that. Then there's the fact that Netflix is well regarded for its engineering, if not quite to the point of snobbery.

What is actually interesting to me is that they went for an event like this, which is very hard to predict, as one of their first major forays into live, instead of something a lot easier to predict like a baseball or NFL game.

I have to wonder if part of the NFL allowing Netflix to do the Christmas games was Netflix proving out that it could handle live streams at least a month beforehand. The NFL seems quite particular (in a good way) about the quality of the delivery of its content, so I wouldn't put it past them.

devit
Why is live a lot harder?

Aside from latency (which isn't much of a problem unless you are competing with TV or some other distribution system), it seems easier than on-demand, since you send the same data to everyone and don't need to handle having a potentially huge library in all datacenters (you have to distribute the data, but that's just like having an extra few users per server).

My guess is that the problem was simply that the number of people viewing Netflix at once in the US was much larger than usual and higher than what they could scale to, or alternatively that a software bug was triggered.

nemothekid
On-demand is easier precisely because having a huge library in all data centers is relatively cheap. In actuality you just have caches co-located with ISPs that pull from your origin servers. Likely you have users all watching different things, so you can easily avoid hot spots by sharding on the content. Once the in-demand content is in the cache, it's relatively easy to serve.
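
A minimal sketch of that sharding idea (node names and the hash scheme are invented for illustration, not Netflix's actual Open Connect logic):

    import hashlib

    CACHE_NODES = ["isp-cache-1", "isp-cache-2", "isp-cache-3"]

    def node_for(title_id: str) -> str:
        # Hash the content ID so each title maps to a stable cache node;
        # distinct titles spread roughly evenly across the fleet.
        h = int(hashlib.sha256(title_id.encode()).hexdigest(), 16)
        return CACHE_NODES[h % len(CACHE_NODES)]

    # Viewers watching different titles land on different boxes,
    # so no single cache becomes a hot spot.
    print(node_for("title-123"), node_for("title-456"))

In practice you'd want consistent hashing so adding a node doesn't reshuffle everything, but the point stands: on-demand load spreads naturally across the catalog.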

Live content is harder because it can't really be cached, nor, due to TLS, can you really serve everyone the same stream. I think the hardest problem to solve is provisioning. If you are expecting 1 million users, and 700,000 of them get routed to a single server, that server will begin to struggle. This can happen in a couple of different ways - for example, an ISP that isn't normally a large consumer suddenly overloads its edge server. Even though your DC can handle the traffic just fine, the links between your DC and the ISP begin to suffer, and since the event is live, it's not like you can just wait until the cache fills downstream.
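
A toy model of that failure mode (capacities and names invented): viewers are pinned to the edge peered with their ISP, so one ISP's burst can overload a single box while the rest of the fleet idles.

    edge_load = {"edge-west": 0, "edge-east": 0, "edge-south": 0}
    isp_to_edge = {"isp-a": "edge-west", "isp-b": "edge-east", "isp-c": "edge-south"}
    CAPACITY = 400_000  # sessions per edge, an invented number

    def route(isp: str) -> str:
        edge = isp_to_edge[isp]  # no choice: traffic follows the peering link
        edge_load[edge] += 1
        return "ok" if edge_load[edge] <= CAPACITY else "overloaded"

    # 700k viewers arrive via one ISP; the fleet could hold 1.2M total,
    # but edge-west alone cannot absorb them.
    results = [route("isp-a") for _ in range(700_000)]
    print(results.count("overloaded"), "sessions hit an overloaded edge")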

pas
... what do you mean it cannot be cached?

Isn't it a tree of cache servers? As the origin sends the frames, they're cached.

And as load grows, the tree has to grow too, and when it can't, you resort to degrading bitrate, and ultimately to load shedding, to keep the viewers happy?
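
A sketch of that degrade-then-shed idea (bitrate ladder and thresholds invented for illustration):

    LADDER_KBPS = [8000, 4000, 1500]  # bitrate rungs, highest first

    def pick_bitrate(load_ratio: float):
        # Step viewers down the ladder as the tier saturates;
        # only shed once even the lowest rung can't be sustained.
        if load_ratio < 0.7:
            return LADDER_KBPS[0]
        if load_ratio < 0.9:
            return LADDER_KBPS[1]
        if load_ratio < 1.0:
            return LADDER_KBPS[2]
        return None  # load shedding: reject the session

    for load in (0.5, 0.8, 0.95, 1.1):
        print(load, "->", pick_bitrate(load))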

And it seems Netflix opted to forgo the last one to avoid the bad PR of a "we are over capacity" error message, and instead went with actually letting it burn, no?

nemothekid
>... what do you mean it cannot be cached?

By "cached", I mean that the PoP server can serve content without contacting the origin server. (The PoP can't serve content it does not have.)
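
A minimal read-through sketch of that definition (names hypothetical): for live, the newest segment is always a first-time miss somewhere, so the origin path never goes cold the way a popular on-demand title does.

    pop_cache = {}  # segment name -> bytes held at the PoP

    def fetch_from_origin(segment: str) -> bytes:
        print(f"MISS {segment}: contacting origin")
        return b"..."  # stand-in for the video bytes

    def serve(segment: str) -> bytes:
        # The PoP can only serve what it already holds;
        # anything else forces a trip upstream.
        if segment not in pop_cache:
            pop_cache[segment] = fetch_from_origin(segment)
        return pop_cache[segment]

    serve("live/seg-001.ts")  # first viewer: origin fetch
    serve("live/seg-001.ts")  # later viewers: served from the PoP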

>And it seems Netflix opted to forgo the last one to avoid the bad PR of a "we are over capacity" error message, and instead went with actually letting it burn, no?

Anything other than 100% uptime is bad PR for Netflix.