More general questions I would consider asking:
1. It appears there was a safe path with more scrutiny and a fast path with less. In this case, over time, the fast path became routine. Are there other places where this pattern could develop or has already developed? Is this tradeoff between speed and scrutiny actually necessary? (ie could urgent updates reach production faster and still receive more scrutiny and testing, even if that happens after the fact?)
2. In a similar vein, if the system has a failsafe configuration (eg only changes that have passed the full barnyard, or configurations that have been running safely for more than a certain amount of time), would it be plausible to automatically roll servers back to that configuration if they remain unresponsive for a certain amount of time?
3. It seems as though there are multiple points (big WAF refactor, credential expiry, internal services dependent on working prod) where a sufficiently cynical engineer would say "I bet there's something here that could, if not bring down the site, at least ruin someone's day". Is there a suitable voice for this kind of cynicism? Eg, a red team or similar? If you were Murphy's Law incarnate, messing with Cloudflare's systems to achieve maximum mischief, where would you start?
4. I get the sense that there are many reliable and well-tested layers of safety, but is it common to test what happens if they fail anyway? Eg: let's pretend Cloudflare just got knocked out globally by a wizard spell, what do we do? Or let's say our staged rollout system gets completely bypassed because of solar flares, how bad is it? Beyond developing a procedure or training for these kinds of situations, are they actively simulated or practiced?
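To make question 2 a bit more concrete, here is a rough sketch of the kind of watchdog I have in mind. Everything in it is made up for illustration (the `RollbackWatchdog` class, the config IDs, the 60-second threshold); I have no idea how Cloudflare's deployment tooling actually works, and a real version would need to worry about things like the health check itself being broken.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RollbackWatchdog:
    """Hypothetical per-server watchdog: revert to a known-good config
    if the server stays unresponsive for too long."""
    known_good: str            # assumed: ID of a config that passed the full rollout
    current: str               # ID of the config currently deployed
    max_unresponsive_s: float  # how long to tolerate failed health checks
    _unresponsive_since: Optional[float] = field(default=None, init=False)

    def observe(self, now: float, healthy: bool) -> str:
        """Record one health-check result and return the config to run."""
        if healthy:
            # Any successful check resets the timer.
            self._unresponsive_since = None
            return self.current
        if self._unresponsive_since is None:
            # First failed check in this streak: start the timer.
            self._unresponsive_since = now
        elif (now - self._unresponsive_since >= self.max_unresponsive_s
              and self.current != self.known_good):
            # Unresponsive for too long on a newer config: roll back.
            self.current = self.known_good
            self._unresponsive_since = None
        return self.current


# Usage: three failed checks, 30 seconds apart, with a 60-second threshold.
watchdog = RollbackWatchdog(known_good="v1", current="v2", max_unresponsive_s=60)
watchdog.observe(now=0, healthy=False)    # starts the timer, stays on "v2"
watchdog.observe(now=30, healthy=False)   # still under the threshold
watchdog.observe(now=60, healthy=False)   # threshold reached, rolls back to "v1"
```

The timestamp is passed in rather than read from the clock so the behaviour can be simulated or tested without waiting, which ties in with question 4.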
If anything, I'd guess the real root cause here is a failure of success: the system has been so reliable for so long that the main reactions to it failing are disbelief and unpreparedness. I'm sure it wasn't funny at the time, but it gives me a chuckle to imagine the SREs speculating about Mossad quantum-tunnelling 0days or something because the idea of everything falling over on its own is so unthinkable. Meanwhile, those of us without so many 9s would jump straight to "I probably broke it again."