Use One Big Server (2022)

(specbranch.com)
343 points by antov825 | 3 comments
1. synack ◴[] No.45090405[source]
The complexity you introduce trying to achieve 100% uptime will often undermine that goal. Most businesses can tolerate an hour or two of downtime or data loss occasionally. If you set this expectation early on, you can engineer a much simpler system. Simpler systems are more reliable.
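
For rough context (quick back-of-the-envelope, not exact figures), an hour or two of downtime a year already puts you around three and a half nines:

    # Downtime budget math: what "an hour or two a year" means in nines
    HOURS_PER_YEAR = 365 * 24  # 8760

    for downtime_hours in (1.0, 2.0, 8.76):
        availability = 1 - downtime_hours / HOURS_PER_YEAR
        print(f"{downtime_hours:>5} h/year down -> {availability:.5%} available")

    #   1.0 h/year down -> 99.98858% available  (~three and a half nines)
    #   2.0 h/year down -> 99.97717% available
    #  8.76 h/year down -> 99.90000% available  (three nines)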
replies(2): >>45090440 #>>45094345 #
2. hvb2 ◴[] No.45090440[source]
Much less expensive too.

In general I don't think that expectation is acceptable though, especially around data loss, because the non-engineering stakeholders don't believe it is.

Engineers don't make decisions in a vacuum. If you can manage those expectations, good for you, but in most cases that's very much an uphill battle, and it might make you look incompetent because you can't guarantee zero data loss.

3. tgtweak ◴[] No.45094345[source]
We had single-datacenter resiliency (meaning N+1 on power, cooling, network + ISP, and servers) and it was fine. You still need an offsite DR strategy, though, and this is one of the things a hybrid cloud setup is great for: you can replicate critical workloads like databases and services to the cloud on no-load standby, or delta-copy your backups to a cheap cloud provider for simplified recovery in a disaster scenario (i.e. the entire datacenter gets taken out). The cost of this is relatively low, since data into the cloud is free and you only really incur costs during an actual recovery. Most backup platforms for virtualized environments (Veeam, etc.) support offsite secondary incremental backups with relative ease, and recovery is also pretty straightforward.
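
The delta-copy part doesn't need much tooling either. Rough sketch of the idea (illustrative only; the bucket name, paths, and state file are made up, and it assumes S3-compatible object storage via boto3):

    """Push only new/changed backup files offsite (minimal sketch, hypothetical paths)."""
    import hashlib
    import json
    from pathlib import Path

    import boto3  # assumes S3-compatible storage and credentials already configured

    BACKUP_DIR = Path("/var/backups/local")         # hypothetical local backup directory
    BUCKET = "example-offsite-dr"                   # hypothetical bucket at a cheap provider
    STATE_FILE = Path("/var/backups/.synced.json")  # remembers what has already been shipped

    def file_digest(path: Path) -> str:
        """Hash file contents so only genuinely changed files get re-uploaded."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def main() -> None:
        s3 = boto3.client("s3")
        synced = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

        for path in sorted(BACKUP_DIR.rglob("*")):
            if not path.is_file():
                continue
            key = str(path.relative_to(BACKUP_DIR))
            digest = file_digest(path)
            if synced.get(key) == digest:
                continue  # unchanged since last run, skip it
            s3.upload_file(str(path), BUCKET, key)  # ingress is free; you pay on restore
            synced[key] = digest

        STATE_FILE.write_text(json.dumps(synced, indent=2))

    if __name__ == "__main__":
        main()

Run something like that from cron after each backup job; a restore in a real DR scenario is just pulling the bucket back down.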

That being said, I've lost a lot of VMs on EC2 and seen entire regions go down in GCP and AWS in the last three years alone, so going to the public cloud isn't a solve-it-all move either. Knock on wood, the colo we've been using hasn't been down once in 12+ years.