The world has changed quite a bit. If you have deep pockets and can use AWS etc., it isn't a major problem anymore. However, if they do run it in their own data centers, that is impressive.
This is not true at all. The hard part isn't cloud vs. on-premise, it's the architecture.
Most sites can either put all their data in a single massive database, or else have an obvious dimension to shard by (e.g. by user ID if users mostly interact with their own data).
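To make the "obvious dimension" case concrete, here is a minimal sketch of sharding by user ID. The shard count, the in-memory dicts standing in for databases, and the function names are all assumptions for illustration, not anything a particular site is known to use.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count, chosen only for the example


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Plain dicts stand in for separate database servers.
shards = [dict() for _ in range(NUM_SHARDS)]


def save_profile(user_id: int, profile: dict) -> None:
    # A write for one user touches exactly one shard.
    shards[shard_for_user(user_id)][user_id] = profile


def load_profile(user_id: int):
    # A read for one user also touches exactly one shard.
    return shards[shard_for_user(user_id)].get(user_id)
```

As long as users mostly touch their own rows, every request lands on a single shard and capacity grows by adding shards.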
But sites where the data is many-to-many and there's a firehose of writes, of which Twitter is a prime example, are a nightmare to scale while remaining performant and reliable. Every single user gets an updated live feed of tweets drawn from every other user -- handling millions of users simultaneously is not easy.
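A rough sketch of why this is harder: with one common approach, usually called fan-out on write, a single post turns into one write per follower. This is a toy in-memory version with made-up names and a made-up timeline cap, and it is not a claim about how Twitter actually implements its feeds.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # hypothetical cap on entries kept per timeline

followers = defaultdict(set)  # author -> set of follower IDs
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))  # user -> recent tweets


def follow(follower: str, author: str) -> None:
    followers[author].add(follower)


def post_tweet(author: str, text: str) -> int:
    """Push the tweet into every follower's timeline; return the number of writes."""
    writes = 0
    for follower in followers[author]:
        timelines[follower].appendleft((author, text))
        writes += 1
    return writes


def read_timeline(user: str, count: int = 20) -> list:
    # Reads are cheap: the timeline is already assembled, newest first.
    return list(timelines[user])[:count]
```

In a real system those per-follower writes land on many different machines, so one post from an account with millions of followers becomes millions of cross-shard writes, which is exactly what a single user-ID shard key does not help with.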
It is definitely not easy. But the core problem has been discussed ever since Facebook launched. There are well-known architectures you can follow, and then you fix the bottlenecks with money. Cost is still the most relevant problem, which is the point I wanted to make. Today's cloud gives you far more headroom before you have to optimize, and a wider margin for error.