242 points panrobo | 1 comments
kjellsbells ◴[] No.42055342[source]
Kjell's Law: the cost of a platform eventually exceeds the cost of the one it replaced. But each cost is in a different budget.

We seem to have replaced cooling and power and a grumpy sysadmin with storage and architects and unhappy developers.

jimt1234 ◴[] No.42055481[source]
I've never worked in a data center that did cooling and power correctly. Everyone thinks they're doing it right, and then street power gets cut: there's significant impact, ops teams scramble to contain it, and finally there's the finger-pointing.
1. jms ◴[] No.42056883[source]
The first time we tested cutting the power back in the day, the backup generator didn't fire! Turns out someone had pushed the big red stop button, which remains pushed in until reset.

That would have been a major problem if we'd had a nighttime power outage.

After that we ran regular switchover testing :)

The other time we ran into trouble was after someone drove a car into the local power substation. Our systems all ran fine for the immediate outage, but the power company's short-term fix was to re-route power, which left our supply voltage low enough that the UPS batteries slowly drained, but not low enough to trip the switchover to the generator.

That was a week or two of manually pumping diesel into the generator tank so we could keep the UPS batteries topped up.