My biggest takeaway was to have my core database tables (user, subscription, etc) backed up every 10 minutes, and the rest every hour, and test their restoration. (When I shut down the site it was 1.2TB.) Having a script to quickly provision a new node—in case I ever needed it—would have something up within 8 minutes from hitting enter.
When I compare this to the startups I’ve consulted for, who choose k8s because it’s what Google uses yet they only push out 1000s of database queries per day with a handful of background jobs and still try to optimize burn, I shake my head.
I’d do it again. Like many of us I don’t have the need for higher-complexity setups. When I did need to scale, I just added more vCPUs and RAM.