> this is also a very specific and usually ephemeral situation - a small team running a disproportionately fast growing product in a hyper-competitive and fast-evolving space.
This is basically how we ran things for the reliability team at Netflix. One person was on call for a week at a time. They had to deal with tickets and issues. Everyone else was on backup and only called for a big issue.
The week after you were on call was spent following up on incidents and remediation. But the remaining weeks were for deep work, building new reliability tools.
The tools that allowed us to be resilient enough that being on call for one week straight didn't kill you. :)
replies(1):