
Use One Big Server (2022)

(specbranch.com)
343 points | antov825 | 4 comments
1. dang No.45086225
HN uses two—one live and one backup, so we can fail over if there's a hardware issue or we need to upgrade something.

It's a nice pattern. Just don't make them clones of each other, or they might go BLAM at the same time!

https://news.ycombinator.com/item?id=32049205

https://news.ycombinator.com/item?id=32032235

https://news.ycombinator.com/item?id=32028511 (<-- this is where it got figured out)

---

Edit: both these points are mentioned in the OP.

replies(2): >>45087078 >>45093270
2. bpye No.45087078
Whilst not as fatal as a failing SSD, AMD also had a fun erratum where a CPU core would hang in CC6 after ~1044 days.

https://www.servethehome.com/amd-epyc-7002-rome-cpus-hang-af...

3. d_burfoot No.45093270
Any stats on HN downtime over the years? I remember one or two outages in the last decade or so, but I would guess the uptime is about 99.99%.
replies(1): >>45094530
4. dang No.45094530
We don't specifically track that, no. The worst one was when we went down for (IIRC) a couple days because of a disk failure, I think in Jan 2014. It was after that that we added a failover box.

HN goes down when we restart the server process, usually as part of updating the code - but only for a few seconds. The message "Restarting the server. Shouldn't take long." displays when that is happening.

There are also, to my exasperation, still moments of brownout during certain traffic spikes or moments of obscure resource contention. But these are at least rarer than they used to be.
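
For a rough sense of scale, the 99.99% guess upthread can be set against the one long outage mentioned above. This is only a back-of-the-envelope sketch: the ten-year window and reading "a couple days" as 48 hours are assumptions, the function names are made up for illustration, and the brief restarts and brownouts are ignored.

```python
# Back-of-the-envelope uptime arithmetic. The 10-year window and the
# 48-hour outage are assumptions, not figures tracked by HN.
HOURS_PER_YEAR = 365.25 * 24


def downtime_budget_hours(uptime_pct: float, years: float) -> float:
    """Hours of downtime permitted by a given uptime percentage."""
    return (1 - uptime_pct / 100) * years * HOURS_PER_YEAR


def uptime_pct_from_downtime(downtime_hours: float, years: float) -> float:
    """Uptime percentage implied by a total amount of downtime."""
    total_hours = years * HOURS_PER_YEAR
    return 100 * (1 - downtime_hours / total_hours)


budget = downtime_budget_hours(99.99, 10)
observed = uptime_pct_from_downtime(48, 10)

print(f"99.99% over 10 years allows about {budget:.2f} hours of downtime")
print(f"One 48-hour outage in 10 years implies about {observed:.3f}% uptime")
```

Under those assumptions, 99.99% leaves a budget of a bit under nine hours per decade, so a single two-day outage alone would put the figure closer to 99.95%.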