Use One Big Server (2022)

1. garganzol ◴[31 Aug 25 19:13 UTC] No.45086114[source]▶

And then boom, all your services are gone due to a pesky capacitor on the motherboard. Also good luck trying to change even one software component of that monolith without disrupting and jeopardizing the whole operation.

While it is useful advice for some people in certain conditions, it should be taken with a grain of salt.

replies(1): >>45086204 #

2. fragmede ◴[31 Aug 25 19:21 UTC] No.45086204[source]▶

>>45086114 (TP) #

That capacitor thing hasn't been true since the 90's.

replies(2): >>45086318 #>>45086354 #

3. icedchai ◴[31 Aug 25 19:33 UTC] No.45086318[source]▶

>>45086204 #

Capacitor problem or not, hardware does fail. Power supplies crap out. SSDs die in strange ways. A failure of a supposedly "redundant" SSD might cause your system to freeze up.

replies(1): >>45089801 #

4. garganzol ◴[31 Aug 25 19:38 UTC] No.45086354[source]▶

>>45086204 #

Hardware still fails. It isn't a question of "if", it's a question of "when". Nothing lasts forever, and naivety lasts only so long too.

replies(1): >>45090129 #

5. mannyv ◴[01 Sep 25 05:46 UTC] No.45089801{3}[source]▶

>>45086318 #

One thing that we ran into back in the day was ECC failure on reboot.

We had a few Dell servers that ran great for a year or two. We rebooted one for some reason or another and it refused to POST due to an ECC failure.

Hauled down to the colo at 3AM and ripped the fucking RAM out of the box and hoped it would restart.

Hardware fails. The RAM was fine for years, but something happened to it. Even Dell had no idea and just shipped us another stick, which we stuck in at the next downtime window.

To top it off, we dropped the failing RAM into another box at the office and it worked fine. <shrug>.

6. fragmede ◴[01 Sep 25 06:45 UTC] No.45090129{3}[source]▶

>>45086354 #

Obviously. But you get duplicate hardware, set up HA, get vendor support contracts, use multiple colos in disparate locations. Cloud providers have figured this out fairly well, as we all did in the aughts. (Well, some of us anyway.) You can definitely decide that's a bunch of really annoying work and just pay a cloud provider to deal with it, or not, and go your own way. But if you want to be credible when saying that hardware fails, maybe don't use a problem from three decades ago as your example, and use something more recent instead?

replies(1): >>45091600 #

7. garganzol ◴[01 Sep 25 10:55 UTC] No.45091600{4}[source]▶

>>45090129 #

When you get duplicate hardware, it is not "One Big Server" anymore. It is "Two Big Servers" at least. In September 2015, the failure rate caused by capacitors was still around 30% [1].

[1] https://www.researchgate.net/figure/Failure-rates-of-differe...

replies(1): >>45092804 #

8. fragmede ◴[01 Sep 25 14:09 UTC] No.45092804{5}[source]▶

>>45091600 #

Respectfully, 2015 was still a decade ago.