
124 points edent | 2 comments
ForHackernews No.42725944
Unpopular opinion, but I think many systems would benefit from a regular "downtime window". Not everything needs to be 24/7 high availability.

Maybe not every night, but if you get users accustomed to the idea that you're offline for 12 hours every Sunday morning, they will not be angry when you need to be offline for 12 hours on a Sunday morning to do maintenance.

The stock market closes; more things should close. We are paying too high a price for 99.999% uptime when 99.9% is plenty for most applications.
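To put the two targets side by side, availability can be converted into a downtime budget. The following is a minimal Python sketch, assuming a 365-day year; the constant and function names are illustrative and not from any library:

```python
# Illustrative helpers (hypothetical names); assumes a 365-day year, no leap days.
HOURS_PER_YEAR = 365 * 24  # 8760
HOURS_PER_WEEK = 7 * 24    # 168


def allowed_downtime_hours(availability: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at a given availability (0..1)."""
    return (1.0 - availability) * period_hours


def availability_of_window(window_hours: float, period_hours: float) -> float:
    """Availability implied by a fixed downtime window repeated every period."""
    return 1.0 - window_hours / period_hours


for target in (0.999, 0.99999):
    per_year = allowed_downtime_hours(target, HOURS_PER_YEAR)
    print(f"{target:.3%}: {per_year:.2f} h/year ({per_year * 60:.1f} min/year)")

# A 12-hour maintenance window every week, expressed as availability.
print(f"12 h every week: {availability_of_window(12, HOURS_PER_WEEK):.2%}")
```

At 99.9% that is roughly 8.76 hours of downtime a year; at 99.999% it is about 5.3 minutes. By the same arithmetic, a 12-hour window every week corresponds to about 92.9% availability.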

kragen No.42726175
Basically this happens because the DVLA and the stock market don't have any competition. Customers in a competitive market won't be angry when you need to be offline for 12 hours every Sunday morning; they'll just switch to your competitor some Sunday, because the competitor is providing them something they value that you don't provide.
ForHackernews No.42726266
Maybe they should regulate Sunday trading hours, or unionized sysadmins should negotiate the end of on-call hours.

The Red Queen's race that you describe for ever-greater scale and ever-greater availability is an example of the tragedy of the commons. Think how much money and how many human minds have been wasted trying to squeeze out that last .0001% of "zero downtime" when they could have been creating something new.

"Keep doing the same thing, but more of it, harder" is a recipe for a barren world of monoculture.

abigail95 No.42726990
Who is trying to achieve zero downtime? Facebook has degraded service regularly; it's just close enough to 99.9% that nobody cares.

If loading my messages times out, I just move on to something else and go back a few minutes later.

Surely they have metrics measuring that and don't think it's worth the engineering effort to improve it.

kragen No.42727402
One of the interesting things that came out of Google's "SRE" system is that they deliberately add outages if they don't have enough. They learned years ago that if you build a service that promises 99% uptime and deliver 99.99% uptime, other people in the company will come to depend on that 99.99% uptime unintentionally. So they chaos-monkey it to ensure that the inevitable failures aren't catastrophic.
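The deliberate-outage practice can be framed as an error-budget check. Google's SRE book describes it for the Chubby lock service: if real failures have not pulled availability down to the target in a given period, a controlled outage is synthesized. The sketch below is a hypothetical Python illustration, not Google's tooling; `ServiceWindow`, `measured_availability`, and `max_planned_outage_minutes` are made-up names, and it assumes downtime is tracked as whole-service minutes over a fixed window:

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """One SLO measurement window (hypothetical structure for illustration)."""
    slo: float               # promised availability, e.g. 0.99 for 99%
    period_minutes: float    # length of the measurement window
    downtime_minutes: float  # unplanned downtime observed so far


def measured_availability(w: ServiceWindow) -> float:
    """Fraction of the window during which the service was up."""
    return 1.0 - w.downtime_minutes / w.period_minutes


def max_planned_outage_minutes(w: ServiceWindow) -> float:
    """Upper bound on deliberate downtime that still keeps the window at or above the SLO.

    A positive result means the service is beating its promise, so callers may be
    relying on availability that was never guaranteed. Zero means the error
    budget is already spent and no outage should be injected.
    """
    error_budget = (1.0 - w.slo) * w.period_minutes
    return max(0.0, error_budget - w.downtime_minutes)


if __name__ == "__main__":
    window = ServiceWindow(slo=0.99, period_minutes=30 * 24 * 60, downtime_minutes=4.32)
    print(f"measured availability: {measured_availability(window):.4%}")
    print(f"room for planned outage: {max_planned_outage_minutes(window):.1f} min")
```

With a 99% SLO over a 30-day window (43,200 minutes) and 4.32 minutes of real downtime, measured availability is 99.99%, and up to about 427.7 minutes of planned downtime would still stay within the SLO. How much of that room to actually use is a policy choice the sketch leaves open.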