"In the case of timelines, each “page” of followers is 10,000 users large and each “page” must be fanned out before we fetch the next page. This means that our slowest writes will hold up the fetching and Fanout of the next page."
This basically means they block on each page: process every item on the page, then move on to the next one. Why wouldn't you rather decouple the page fetcher from the processing of the pages?
The page-fetching activity should be able to keep fetching successive sets of followers one after another, without waiting for every item in the current page to be updated before it continues.
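Roughly what I mean, as a toy sketch (every name here is made up for illustration, not Twitter's actual API): the fetcher keeps paging on its own and hands pages to a bounded queue, and the slow timeline writes happen on the other side of it.

```python
"""Toy sketch: decouple page fetching from timeline writes via a bounded queue."""
import queue
import threading
from typing import List, Optional, Tuple

FOLLOWERS = list(range(25_000))   # pretend follower graph
PAGE_SIZE = 10_000
pages: "queue.Queue[Optional[List[int]]]" = queue.Queue(maxsize=4)  # backpressure

def fetch_follower_page(cursor: int) -> Tuple[List[int], Optional[int]]:
    """Stand-in for the paginated follower lookup."""
    page = FOLLOWERS[cursor:cursor + PAGE_SIZE]
    nxt = cursor + PAGE_SIZE if cursor + PAGE_SIZE < len(FOLLOWERS) else None
    return page, nxt

def fetcher() -> None:
    cursor: Optional[int] = 0
    while cursor is not None:      # keeps paging regardless of write speed
        page, cursor = fetch_follower_page(cursor)
        pages.put(page)            # blocks only if writers fall 4 pages behind
    pages.put(None)                # sentinel: no more pages

def writer(tweet_id: int) -> None:
    while (page := pages.get()) is not None:
        for follower_id in page:
            pass                   # stand-in for the per-follower timeline write

t = threading.Thread(target=fetcher)
t.start()
writer(tweet_id=42)
t.join()
```

The bounded queue is the only coupling left: a slow page of writes delays the fetcher only once the buffer fills, instead of on every page.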
Something that comes to mind: a fetcher component that fetches pages, stores each page in S3, and publishes the metadata (content) plus the S3 location to a queue (SQS), consumed by timeline publishers that scale independently based on load. You can control the concurrency in this system much better. You could also partition with a system like Kafka, using the shards as keys in the queue, to "slow down" the work on hot shards without effectively dropping tweets from timelines (timelines are eventually consistent regardless).
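A hedged sketch of that fetcher → S3 → SQS → publisher pipeline; the bucket name, queue URL, message shape, and `write_timeline` are all assumptions for illustration, not an existing system's API:

```python
"""Sketch: fetcher persists follower pages to S3 and enqueues pointers to SQS;
timeline publishers consume and scale independently. Names are hypothetical."""
import json
import uuid
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
BUCKET = "fanout-pages"                                                # hypothetical
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/fanout"  # hypothetical

def publish_page(tweet: dict, follower_ids: list) -> None:
    """Fetcher side: persist one page of followers, enqueue a pointer to it."""
    key = f"pages/{tweet['id']}/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(follower_ids))
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"tweet": tweet, "s3_key": key}),
    )

def write_timeline(follower_id: int, tweet: dict) -> None:
    pass  # stand-in for the actual timeline-store write

def consume_pages() -> None:
    """Publisher side: each worker pulls pages and writes timelines independently."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            obj = s3.get_object(Bucket=BUCKET, Key=body["s3_key"])
            for follower_id in json.loads(obj["Body"].read()):
                write_timeline(follower_id, body["tweet"])
            # delete only after the page is fully written, so a crashed worker's
            # page gets redelivered to another publisher
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

Stashing the page in S3 and enqueueing only a pointer also sidesteps SQS's 256 KB message limit, and the visibility-timeout/redelivery semantics give you retries per page for free.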
I feel like I'm missing something and there's a valid reason to do it this way.