
698 points jgrahamc | 5 comments
buildzr ◴[] No.20422952[source]
> Then we moved on to restoring the WAF functionality. Because of the sensitivity of the situation we performed both negative tests (asking ourselves “was it really that particular change that caused the problem?”) and positive tests (verifying the rollback worked) in a single city using a subset of traffic after removing our paying customers’ traffic from that location.

Haha, so the free customers are crash test dummies for providing test traffic. Nice.

I actually don't mind that much, considering it's basically bulletproof DDoS protection for free. I'd much rather "be the product" in this way than in the way ad companies do it, at least.

replies(5): >>20423170 #>>20423194 #>>20423767 #>>20424021 #>>20424880 #
pvg ◴[] No.20423194[source]
Or you can say all customers were affected but some localized free-tier customers got the fix first.
replies(2): >>20423357 #>>20423732 #
regnerba ◴[] No.20423357[source]
In this case, yes. However, they also indicate that this is how they do their staged rollouts in general. So if they release any other software update that goes through the staged rollout, free customers are tested first. If that change breaks something, free customers get hit first, which seems fair to me.
replies(1): >>20425450 #
1. charrondev ◴[] No.20425450[source]
In my experience it’s generally best to roll out changes on testing, staging, and then clients in order of how much they pay, especially if you have SLAs with the highest paying customers.

Impact is generally lower, both to the client, and to your bank account.
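
A minimal sketch of that ordering, reading it as lowest-paying first (which is how the later replies interpret it). Everything here is hypothetical: the `Stage` type, its `rank` field, and the `deploy` and `healthy` callbacks are illustration names, not anything from the postmortem.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    rank: int  # hypothetical: 0 = internal testing, higher = pays more


def staged_rollout(
    stages: Iterable[Stage],
    deploy: Callable[[Stage], None],
    healthy: Callable[[Stage], bool],
) -> Tuple[List[str], Optional[Stage]]:
    """Deploy stage by stage, cheapest first, stopping at the first failure.

    Returns the names of the stages that passed and the stage that failed
    (None if every stage passed).
    """
    passed: List[str] = []
    for stage in sorted(stages, key=lambda s: s.rank):
        deploy(stage)
        if not healthy(stage):
            # Stop here: higher-paying stages never see the bad change.
            return passed, stage
        passed.append(stage.name)
    return passed, None
```

With stages ranked testing, staging, free, pro, enterprise, a change that breaks at "pro" stops there and never reaches "enterprise".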

replies(1): >>20426500 #
2. Thorrez ◴[] No.20426500[source]
That sounds strange to me. If you introduce a bug then roll back very quickly, it will only affect high paying customers. If you introduce a bug then roll back a while later, it will impact high paying and low paying customers equally. Why would you want this scenario? If you flip it it seems strictly better to me.
replies(1): >>20426591 #
3. pvg ◴[] No.20426591[source]
The idea is that the fix itself is being tested. If you knew your 'rollback' would work for certain, then you'd just deploy it to everyone asap. But since you don't, you test it, and since a potential outcome of your test is no fix or making things worse, you don't test it on your highest-value customers. Imagine what your postmortem would read like if your fix made an even bigger mess.
replies(1): >>20426655 #
4. Thorrez ◴[] No.20426655{3}[source]
Oh, I misunderstood you originally. I thought you said rollout from highest to lowest. You're actually saying lowest to highest.
replies(1): >>20426709 #
5. pvg ◴[] No.20426709{4}[source]
I'm just describing what I think the sequence in the postmortem is. They were already in the poop and wanted to test their fix in a real but low-impact way.