
208 points henrijn | 10 comments
ksenzee No.42160131
The infrastructure has been rock-solid. I’ve never seen a service grow this fast without any noticeable outages. The architecture and execution are obviously informed by serious experience at Twitter, but it’s also clear that management is giving them everything they need to do the job right.
replies(4): >>42160170 >>42160324 >>42160542 >>42163517
1. nicce No.42160324
> The infrastructure has been rock-solid. I’ve never seen a service grow this fast without any noticeable outages. The architecture and execution are obviously informed by serious experience at Twitter, but it’s also clear that management is giving them everything they need to do the job right.

The world has changed quite a bit. If you have deep pockets and can use AWS etc., it isn't a major problem anymore. However, if they do indeed run it on their own data centers, that is impressive.

replies(4): >>42160426 >>42160457 >>42160493 >>42161085
2. dangus No.42160426
Deciding to use a cloud provider or not has very little to do with the quality of the application and infrastructure architecture.

You can make an app that scales beautifully on AWS or you can make one that chokes.

3. xboxnolifes No.42160457
I'm not so sure about this, when less than 24 hours ago Netflix had streaming issues running on AWS infrastructure, and existing social media sites still have outages.
replies(2): >>42160934 >>42181090
4. crazygringo No.42160493
> it isn't a major problem anymore

This is not true at all. The hard part isn't cloud vs. on-premise, it's the architecture.

Most sites can either put all their data in a single massive database, or else have an obvious dimension to shard by (e.g. by user ID if users mostly interact with their own data).

But sites where the data is many-to-many and there's a firehose of writes, of which Twitter is a prime example, are a nightmare to scale while remaining performant and reliable. Every single user gets an updated live feed of tweets drawn from every other user -- handling millions of users simultaneously is not easy.
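
For contrast, a toy sketch of the easy case -- shard by user ID -- in Python, with an in-memory dict standing in for the real databases (all names hypothetical):

    import hashlib
    from collections import defaultdict

    NUM_SHARDS = 64             # fixed count for the sketch; real systems use
    shards = defaultdict(list)  # consistent hashing or a directory service
                                # so shards can be rebalanced later

    def shard_for(user_id: str) -> int:
        # A stable hash of the user id picks the shard. This works because
        # users mostly touch their own rows, so cross-shard queries stay
        # rare -- exactly the property a many-to-many feed doesn't have.
        digest = hashlib.sha256(user_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % NUM_SHARDS

    def save_post(user_id: str, text: str) -> None:
        shards[shard_for(user_id)].append({"author": user_id, "text": text})

The moment every write has to cross shard boundaries -- one tweet landing in millions of followers' feeds -- this clean partitioning disappears, which is the point above.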

replies(2): >>42160884 >>42181045
5. nicce No.42160884
> But sites where the data is many-to-many and there's a firehose of writes, of which Twitter is a prime example, are a nightmare to scale while remaining performant and reliable. Every single user gets an updated live feed of tweets drawn from every other user -- handling millions of users simultaneously is not easy.

It is definitely not easy. But this core problem has been discussed since Facebook launched. There are well-known architectures you can follow, and then you fix the bottlenecks with money. Cost is still the most relevant problem, which is what I wanted to say. The modern cloud gives you a much higher optimization ceiling and a much wider margin for error.

replies(1): >>42161231
6. nicce No.42160934
> Netflix has streaming issues running on AWS infrastructure

I think the performance of Netflix is highly dependent on appliances in ISPs' data centers [1].

But yeah, there are still limits where the cloud won't help you.

If your whole infrastructure is designed to serve "historical" content instead of live streams, some bottlenecks cannot be avoided when you want to serve a low-latency sports stream. This came as a surprise to me, but apparently betting still plays a significant role for viewers.

[1]: https://openconnect.netflix.com/en/

7. ksenzee No.42161085
They moved from AWS to on-prem sometime in the last year or two: https://newsletter.pragmaticengineer.com/p/bluesky

And I don't care how many resources you have available to throw at it, plenty of sites would still fall over with the kind of growth they're having.

8. lazystar No.42161231
bingo. this problem is known and will soon be offered as a fully managed service - 1 click to have your own private social network.
9. vidarh No.42181045
> Every single user gets an updated live feed of tweets drawn from every other user -- handling millions of users simultaneously is not easy.

This is a trivial approach, which works but is suboptimal (you can cut down on the IO with various optimisations):

Shard by id. Treat it as message queues. Think e-mail, with a lookup from public id -> internal id @ shard.

Then, additionally, every account that gets more than n followers is sharded into "sub-accounts", where posts to the main account are transparently "reposted" to the sub-accounts, just like a simple mailing-list reflector.

(The first obvious optimization is to drop propagation of posts from accounts that will hit a large proportion of users, and weave those into timelines at read/generation time instead of writing them to each user; the second is to drop propagation of posts to accounts that have not been accessed for a while, and instead take the expensive generation step of pulling posts to build the timeline the next time they log in. There are many more, but even with the naive approach outlined above this is a solved problem.)
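
As a toy sketch of that naive scheme plus the first optimization, in Python with in-memory stand-ins for the real shards and queues (all names and the threshold are hypothetical):

    from collections import defaultdict

    FANOUT_CAP = 10_000            # above this, skip write-time fan-out and
                                   # weave the posts in at read time instead

    followers = defaultdict(set)   # author -> follower ids
    inboxes = defaultdict(list)    # per-user timeline "mailbox"
    hot_posts = defaultdict(list)  # big-account author -> recent posts

    def publish(author: str, post: str) -> None:
        if len(followers[author]) > FANOUT_CAP:
            # Too many followers: record the post once, merge it on read.
            hot_posts[author].append(post)
        else:
            # Mailing-list reflector: copy into every follower's inbox.
            for follower in followers[author]:
                inboxes[follower].append((author, post))

    def timeline(user: str, following: set) -> list:
        merged = list(inboxes[user])
        for author in following:
            merged.extend((author, p) for p in hot_posts.get(author, []))
        return merged[-50:]        # newest N; a real system merges by timestamp

The second optimization falls out of the same structure: stop appending to inboxes that haven't been read in a while, and rebuild them from the authors' posts on the next login.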

10. vidarh No.42181090
Netflix runs core services in AWS, but streaming goes via Open Connect, which involves appliances in the networks of major ISPs.

The Open Connect Wikipedia page [1] currently claims 8,000+ Open Connect Appliances at more than 1,000 ISPs as of 2021, and OCAs at over 52 interchange points.

Netflix is shuffling data at a scale nobody outside maybe a dozen other companies globally needs to deal with, and I doubt any of the social media sites come close.

[1] https://en.wikipedia.org/wiki/Open_Connect