Unfortunately, very few people actually think about failure modes, set realistic targets, and test the process. Everyone thinks they need 100% uptime and consistency, but few achieve it in practice. Many think they do, until shit hits the fan and uncovers an edge case they hadn't thought of. It turns out that in most cases it doesn't matter, and they could've saved themselves a lot of trouble and complexity.
If GitHub can afford the amount of downtime they have, it's likely that your business can afford 15 minutes of downtime every once in a while due to a failing server.
Also, the fewer servers you have overall, the less common a failure will be.
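To put rough numbers on the server-count point, here's a quick sketch. It assumes each server fails independently with the same probability over some period, which is a simplification, and the 5% figure is made up purely for illustration.

```python
def p_any_failure(n_servers: int, p_single: float) -> float:
    """Probability that at least one of n servers fails in the period.

    Assumes failures are independent and equally likely per server.
    """
    return 1.0 - (1.0 - p_single) ** n_servers


if __name__ == "__main__":
    p = 0.05  # hypothetical per-server failure probability for the period
    for n in (1, 2, 10, 50):
        print(f"{n:>3} servers: {p_any_failure(n, p):.1%} chance of at least one failure")
```

The chance of having to deal with some failure goes up with every box you add, even if each individual box is reliable.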
Backups and a cold failover server are mandatory, but anything past that should be weighed with a rational cost/benefit analysis, and for most people the benefit just isn't enough to justify the added infrastructure complexity.
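For what it's worth, the cost/benefit check can be as crude as the sketch below. Every number in it is a placeholder you'd swap for your own estimates, and it generously assumes the extra redundancy would eliminate all of the downtime in question.

```python
def redundancy_pays_off(
    outages_per_year: float,
    minutes_per_outage: float,
    cost_per_minute: float,
    yearly_redundancy_cost: float,
) -> tuple[bool, float]:
    """Compare the expected yearly downtime loss with the yearly cost of redundancy.

    Returns (worth_it, expected_loss). Assumes redundancy removes all of this downtime.
    """
    expected_loss = outages_per_year * minutes_per_outage * cost_per_minute
    return expected_loss > yearly_redundancy_cost, expected_loss


if __name__ == "__main__":
    # Hypothetical inputs: two 15-minute outages a year, $50 lost per minute,
    # $20,000 a year for extra servers plus the time spent running them.
    worth_it, loss = redundancy_pays_off(2, 15, 50.0, 20_000.0)
    print(f"expected yearly loss: ${loss:,.0f}, redundancy worth it: {worth_it}")
```

If the expected loss comes out well below what the extra infrastructure costs to build and run, the simpler setup wins.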