Splitting engineering teams into defense and offense

(www.greptile.com)

212 points dakshgupta | 2 comments | 14 Oct 24 20:07 UTC

jedberg ◴[14 Oct 24 20:51 UTC] No.41841847[source]▶

> this is also a very specific and usually ephemeral situation - a small team running a disproportionately fast growing product in a hyper-competitive and fast-evolving space.

This is basically how we ran things for the reliability team at Netflix. One person was on call for a week at a time. They had to deal with tickets and issues. Everyone else was on backup and only called for a big issue.

The week after you were on call was spent following up on incidents and remediation. But the remaining weeks were for deep work, building new reliability tools.

The tools that allowed us to be resilient enough that being on call for one week straight didn't kill you. :)

replies(1): >>41842151 #

dakshgupta ◴[14 Oct 24 21:20 UTC] No.41842151[source]▶

>>41841847 #

I am surprised and impressed that a company at that scale functions like this. We often discuss internally whether we can still do this when we’re 7-8 engineers.

replies(1): >>41842458 #

1. jedberg ◴[14 Oct 24 21:57 UTC] No.41842458[source]▶

>>41842151 #

I think you're looking at it backwards. We were only able to do it because we had so many engineers that we had time to write tools to make the system reliable enough.

On call for a week at a time only really works if you only get paged at night once a week max. If you get paged every night, you will die from sleep deprivation.

replies(1): >>41845266 #

2. dmoy ◴[15 Oct 24 05:38 UTC] No.41845266[source]▶

>>41842458 (TP) #

Moving from 24/7 on-call to 12-hour shifts, trading off with another continent, is really nice.
