We seem to have replaced cooling and power and a grumpy sysadmin with storage and architects and unhappy developers.
Or the electrician doing maintenance on the backup generator doesn't properly connect the bypass and no one notices until he disconnects the generator and the entire DC instantly goes quiet.
Or your DC provisions rack space without knowing which servers are redundant with which other servers, and when two services jump from 10% CPU to 100% CPU across ten servers, the breaker for that circuit gives up entirely and takes down your entire business.
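For a rough sense of the arithmetic (every number below is made up for illustration, not from the story above): ten boxes idling look harmless on one branch circuit, but at full tilt they can blow past the breaker's continuous rating.

```python
# Hypothetical figures: ten 1U servers on one 208 V / 30 A branch circuit,
# derated to 80% for continuous load (common rule of thumb).
circuit_watts = 208 * 30 * 0.8       # ~4,992 W usable

idle_watts_per_server = 150          # assumed draw at ~10% CPU
peak_watts_per_server = 550          # assumed draw at ~100% CPU
servers = 10

idle_total = servers * idle_watts_per_server   # 1,500 W -- looks comfortably fine
peak_total = servers * peak_watts_per_server   # 5,500 W -- well over the circuit limit

print(f"idle: {idle_total} W, peak: {peak_total} W, limit: {circuit_watts:.0f} W")
print("overloaded at peak" if peak_total > circuit_watts else "ok at peak")
```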
I say “I’m used to” because having things there has spanned more than one job.
One power outage was days to a week. Don’t recall exactly.
It’s possible to do it right.
That would have been a major problem if we'd had a nighttime power outage.
After that we ran regular switchover testing :)
The other time we ran into trouble was after someone drove a car into the local power substation. Our systems all ran fine for the immediate outage, but the power company's short-term fix was to re-route power, which left our voltage just low enough for the UPS batteries to slowly drain, but not low enough to trip over to the generator.
That was a week or two of manually pumping diesel into the generator tank so we could keep the UPS batteries topped up.
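A minimal sketch of that failure mode, assuming a conventional setup where the transfer switch only starts the generator below a cutoff voltage while the UPS quietly makes up any shortfall from battery. All thresholds here are assumptions for illustration, not from the actual installation.

```python
# Assumed thresholds: generator start and battery supplement points are made up.
NOMINAL_V = 230.0
GEN_TRANSFER_V = 0.80 * NOMINAL_V    # generator starts only below ~184 V (assumed)
UPS_SUPPLEMENT_V = 0.95 * NOMINAL_V  # UPS draws on battery below ~218 V (assumed)

def state(mains_v: float) -> str:
    if mains_v < GEN_TRANSFER_V:
        return "generator running, UPS recharging"
    if mains_v < UPS_SUPPLEMENT_V:
        return "mains + battery: batteries slowly draining, no generator start"
    return "normal: mains only, batteries full"

# A brownout that sits in the gap between the two thresholds bleeds the batteries:
for v in (230, 210, 180):
    print(f"{v} V -> {state(v)}")
```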
Area-wide power cut, winter afternoon, so it was already getting dark. The only two signs that something was wrong were that all the lights went out outside (other businesses, street lighting, etc.) and that my internet connection stopped working. Nothing else in the DC was affected. Even the elevator was working.
One of these units blew at one point. We had 4 and only needed two running, so no big deal. The company who managed the whole thing (Swiss) came to replace it. Amazing job: they had to put it on small rollers, like industrial roller skates, then embed hooks in the walls at each corridor junction, and slowly winch the thing along. It was like watching the minute hand of a clock.
Then the whole process in reverse to bring in the new one. Was fascinating to watch. The guy in charge was a giant, built like a brick outhouse. They knew their stuff.
The computer, storage, etc. ran off the generator, which first eliminated any risk of power spikes and surges (as the flywheel is a very effective low-pass filter), and the circuits controlling motor speed also ensured the AC frequency was better than the power company supply. This was located in a rural area, so the long power lines with few sinks (customers pulling power) meant lightning spikes could travel further, and the rural voltage and frequency fluctuated a lot. Seemed like a really cool system that worked flawlessly in the years I was there.
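As a back-of-the-envelope illustration of why a motor-generator flywheel set also buys a few seconds of ride-through while a diesel spins up (every figure below is assumed; the comment above gives no specs for the actual unit):

```python
import math

# Assumed flywheel and load figures, for illustration only.
moment_of_inertia = 300.0   # kg*m^2, assumed solid steel flywheel
nominal_rpm = 1800.0
min_usable_rpm = 1650.0     # assumed lowest speed still giving acceptable output frequency
load_kw = 40.0              # assumed load carried by the generator

def kinetic_energy_j(rpm: float) -> float:
    omega = rpm * 2 * math.pi / 60.0   # convert rpm to rad/s
    return 0.5 * moment_of_inertia * omega ** 2

# Usable energy is the difference between full speed and the minimum usable speed.
usable_j = kinetic_energy_j(nominal_rpm) - kinetic_energy_j(min_usable_rpm)
ride_through_s = usable_j / (load_kw * 1000.0)
print(f"usable energy: {usable_j / 1000:.0f} kJ, ride-through: {ride_through_s:.1f} s")
```

With these made-up numbers you get on the order of twenty seconds of ride-through, which is also why such sets smooth out brief sags and spikes so effectively.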