Cloud is staggeringly expensive compared to your own physical servers. Has been for all but trivial (almost toy) workloads since day 1. And that's before you pay for bandwidth.
I was spending a decent chunk of change monthly on cloud boxes just for my personal hosting projects, and eventually realized I could buy a stonking 1U box, colo it at a local data center, pay off the server with the savings within a year or two, and have radically more capability into the bargain.
If you need a "droplet" type VM, with a gig of RAM, a couple gig of disk, and some bandwidth, they're not bad. DigitalOcean works well for that, and is way cheaper on bandwidth than other places (1TB per droplet per month, pooled across all your droplets). So I use that for basic proxy nodes and such.
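The "combined pool" is what makes this work for proxies: a busy droplet can draw on the unused allowance of quiet ones. A minimal Python sketch of that arithmetic, assuming every droplet ran the whole month and contributes the 1TB figure quoted here (the function name is hypothetical, and real billing details such as prorating may differ):

```python
def pooled_overage_tb(droplet_count: int, used_tb: float,
                      allowance_per_droplet_tb: float = 1.0) -> float:
    """Monthly transfer (TB) beyond the account-wide pool.

    Assumption: every droplet ran the full month and adds the same
    allowance to a single shared pool (1TB, per the comment above).
    """
    pool_tb = droplet_count * allowance_per_droplet_tb
    # Only usage above the combined pool counts as overage.
    return max(0.0, used_tb - pool_tb)


# Three droplets: one proxy pushes 2.5TB, two others use 0.1TB each.
print(pooled_overage_tb(3, 2.5 + 0.1 + 0.1))
```

With a 3TB pool, the 2.7TB total stays inside the allowance, even though one droplet alone used well over its own 1TB share.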
But if you start wanting lots of RAM (among other things, I run a Matrix homeserver and some game servers for friends, so RAM use climbs in a hurry), or disk measured in TB, cloud costs go vertical. It's really nice having a box with enough RAM that I can just throw it at VMs, do offsite backup of "everything," etc.
If you're spending more than a few hundred a month on cloud hosting, it's worth evaluating what a physical box would save you.
//EDIT: By "go vertical," I mean that a cloud box with specs similar to what I racked up would cost half the price of the entire 1U, per month.
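The payback claim is plain arithmetic: up-front server cost divided by monthly savings. A minimal Python sketch; the function name and every price below are hypothetical placeholders, not figures from this comment:

```python
def payback_months(server_cost: float, cloud_monthly: float,
                   colo_monthly: float) -> float:
    """Months until cumulative savings cover the up-front server cost."""
    monthly_savings = cloud_monthly - colo_monthly
    if monthly_savings <= 0:
        # Colo costs as much as (or more than) the cloud bill: never pays off.
        return float("inf")
    return server_cost / monthly_savings


# Hypothetical: $6,000 server, $400/month cloud bill, $100/month colo fee.
print(payback_months(6000, 400, 100))
```

That prints 20.0 months, in line with "a year or two." In the EDIT's scenario, where the equivalent cloud box costs half the server per month (say $3,000 against the same $100 colo fee), payback drops to just over two months.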
Sure, we could use physical boxes. But those have to go through procurement. The budget has to be approved. Orders are sent to suppliers. The hardware arrives; if it's going into a colo that's not so bad, but it gets installed on the colo's timeline. If it's your own DC you may have staff on hand, but very likely it's a third party, and now you have to coordinate with them, etc. It can easily take months for any non-trivially sized company. In the meantime, _we need capacity now_, and customers won't wait. I can provision thousands of machines on demand with a simple pull request, and they will be online in minutes. And I can do that without exceeding the pre-approved budget, because those machines may not be needed forever; as soon as they are no longer needed, they are quickly destroyed.
And then a random machine fails somewhere. Do you have staff to detect and diagnose the problem? I don't care how good your monitoring system is; some thorny issues are difficult to identify and fix without highly specialized staff on board. Staff that you are paying for. Me? I don't care. If a VM somewhere is misbehaving, it is automatically nuked. We don't care why it had issues (unless it's a recurring thing). That happens a few times a day when you have a five- or six-digit number of machines, and it's initiated either by us, when our system detects health check failures, or by AWS (for planned or unplanned maintenance).
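The "don't debug it, replace it" policy boils down to a reconciliation pass over the fleet. A minimal Python sketch, assuming a hypothetical `is_healthy` probe and a hypothetical threshold of three consecutive failures; in a real AWS setup a managed mechanism such as Auto Scaling group health checks would usually play this role:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Instance:
    instance_id: str
    consecutive_failures: int = 0


def reconcile(fleet: list[Instance], is_healthy: Callable[[str], bool],
              max_failures: int = 3) -> tuple[list[Instance], list[str]]:
    """Run one health-check pass; return (instances to keep, IDs to terminate).

    No diagnosis is attempted: a node that fails max_failures checks in a
    row is simply replaced, since it holds no state worth saving.
    """
    keep: list[Instance] = []
    terminate: list[str] = []
    for inst in fleet:
        if is_healthy(inst.instance_id):
            inst.consecutive_failures = 0  # a passing check resets the streak
        else:
            inst.consecutive_failures += 1
        if inst.consecutive_failures >= max_failures:
            terminate.append(inst.instance_id)
        else:
            keep.append(inst)
    return keep, terminate


fleet = [Instance("i-a", 1), Instance("i-b", 2)]
keep, terminate = reconcile(fleet, lambda iid: iid != "i-b")
print(terminate)
```

Here `i-b` hits its third failure and is marked for termination, while `i-a` passes and its failure count resets. The caller would then launch `len(terminate)` fresh instances from the same image; because the nodes are stateless, nothing is lost.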
Don't just think about how much an individual machine costs. What matters is all the supporting personnel, with their (expensive) specialized skills. Managing one machine is doable (I have a few servers at home). Managing 50k? Now you need a beefy team with many specialized skills, and you are probably running more exotic hardware.
You also need to compare apples to apples. Your 'disk measured in TB' is almost certainly a locally attached disk. In the cloud, that's likely to be network-attached storage. That _is_ more expensive (try buying something similar for your home lab), but it buys a lot of flexibility: maybe not necessary in a homelab, but certainly needed in larger environments. That's what allows our VMs to be fungible and easily destroyed or recreated, as they themselves don't store any state. That storage will also be more resilient (AWS EBS stores snapshots on S3, which is designed for 11 nines of durability, and can automatically retrieve blocks if they go bad).
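To put "11 nines" in perspective: 99.999999999% annual durability corresponds to an expected loss of about one object in 10^11 per year. A minimal Python sketch of that arithmetic, under the simplifying assumption that each object's annual loss probability is an independent 10^-11 (the function names are hypothetical):

```python
def expected_annual_losses(num_objects: int, nines: int = 11) -> float:
    """Expected objects lost per year at an "N nines" annual durability target.

    Simplification: treats durability as an independent per-object
    annual loss probability of 10**-nines.
    """
    return num_objects * 10.0 ** -nines


def years_per_loss(num_objects: int, nines: int = 11) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(num_objects, nines)


# Ten million stored objects.
print(years_per_loss(10_000_000))
```

For ten million objects that comes out to roughly one lost object every 10,000 years. Note that this figure describes the snapshot copies held in S3, not the live EBS volume itself.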
That said, even for large enterprises, AWS egress costs are extortionate (more so if you use their NAT gateways). And for workloads that don't change much, a hybrid model with some physical boxes might be a good idea (but please try not to make them pets!).
"Cloud as a workaround for internal corporate dysfunction" is certainly a novel argument for cloud. I'm aware of the OpEx vs CapEx issues at a lot of companies, I just happen to think it's a really stupid reason to spend a lot more money than you otherwise would for some set of capabilities.
> You also need to measure apples to apples. You 'disk measured in TB' is a locally attached disk almost certainly. In the cloud, that's likely to be a network attached storage.
If I want to stuff 2TB of files into somewhere that's not-local, why does it particularly matter to me what the exact technology used for storing them is?
I mean, obviously "cloud" is quite successful, and it comes with the ability to say "Not our problem!" when AWS is down for some reason or another. But none of the problems you talk about are new, and all of them were quite well solved 20 years ago by companies running their own hardware. Been there, admin'd that. A four-machine cluster (two web front ends doing the bulk of the compute, two SQL database servers replicating to each other, and some disk storage regularly synced between the two database servers) could handle a staggering amount of traffic when properly tuned. The same is true today, without any of the problems of rotational disk latency. SQL on NVMe solves an awful lot of problems.
But, again, not my money to spend. I just find it baffling that a lot of people today don't even seem to realize that physical servers are still a thing.