
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points by greghn | 11 comments
c0l0 ◴[] No.39444187[source]
Seeing the really just puny "provisioned IOPS" numbers on hugely expensive cloud instances made me chuckle (first in disbelief, then in horror) when I joined a "cloud-first" enterprise shop in 2020 (having come from a company that hosted their own hardware at a colo).

It's no wonder that many people nowadays, esp. those who are so young that they've never experienced anything but cloud instances, seem to have little idea of how much performance you can actually pack in just one or two RUs today. Ultra-fast (I'm not parroting some marketing speak here - I just take a look at IOPS numbers, and compare them to those from highest-end storage some 10-12 years ago) NVMe storage is a big part of that astonishing magic.

replies(3): >>39448208 #>>39448367 #>>39449930 #
1. Aurornis ◴[] No.39448208[source]
> It's no wonder that many people nowadays, esp. those who are so young that they've never experienced anything but cloud instances, seem to have little idea of how much performance you can actually pack in just one or two RUs today.

On the contrary, young people often show up having learned on their super fast Apple SSD or a top of the line gaming machine with NVMe SSD.

Many know what hardware can do. There’s no need to dunk on young people.

Anyway, the cloud performance realities are well known to anyone who works in cloud performance. It’s part of the game and it’s learned by anyone scaling a system. It doesn’t really matter what you could do if you built a couple of RUs yourself and hauled them down to the data center, because beyond simple single-purpose applications with flexible uptime requirements, that’s not a realistic option.

replies(2): >>39448623 #>>39449212 #
2. zten ◴[] No.39448623[source]
> On the contrary, young people often show up having learned on their super fast Apple SSD or a top of the line gaming machine with NVMe SSD.

Yes, this is often a big surprise. You can test out some disk-heavy app locally on your laptop and observe decent performance, and then have your day completely ruined when you provision a slice of an NVMe SSD instance type (like, i4i.2xlarge) and discover you're only paying for SATA SSD performance.
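(The usual tool for this kind of comparison is fio, but a rough single-threaded sketch in Python makes the point too. It assumes Linux and a pre-created test file, at a hypothetical path like /tmp/testfile, that is larger than RAM, and it uses O_DIRECT so the page cache doesn't flatter the numbers.)

    import mmap, os, random, time

    PATH = "/tmp/testfile"   # hypothetical test file, e.g. created with: fallocate -l 8G /tmp/testfile
    BLOCK = 4096             # 4 KiB random reads, the block size most IOPS figures quote
    DURATION = 10            # seconds to run

    fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)  # bypass the page cache (Linux only)
    size = os.fstat(fd).st_size
    buf = mmap.mmap(-1, BLOCK)                     # page-aligned buffer, as O_DIRECT requires

    ops = 0
    deadline = time.monotonic() + DURATION
    while time.monotonic() < deadline:
        os.preadv(fd, [buf], random.randrange(size // BLOCK) * BLOCK)
        ops += 1

    os.close(fd)
    print(f"~{ops / DURATION:,.0f} IOPS at queue depth 1")

Run it once on the laptop and once on the cloud volume and the gap is hard to miss; a real benchmark would also sweep queue depth and thread count.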

replies(1): >>39450654 #
3. EB66 ◴[] No.39449212[source]
> because beyond simple single-purpose applications with flexible uptime requirements, that’s not a realistic option.

I frequently hear this point expressed in cloud vs colo debates. The notion that you can't achieve high availability with simple colo deploys is just nonsense.

Two colo deploys in two geographically distinct datacenters, two active physical servers with identical builds (RAIDed drives, dual NICs, A+B power) in both datacenters, a third server racked up just sitting as a cold spare, pick your favorite container orchestration scheme, rig up your database replication, script the database failover activation process, add HAProxy (or use whatever built-in scheme your orchestration system offers), sprinkle in a cloud service for DNS load balancing/failover (Cloudflare or AWS Route 53), automate and store backups off-site and you're done.
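To make the "script the failover" and DNS-failover pieces concrete, here is a minimal watchdog sketch using Route 53 via boto3. The hosted-zone ID, record name, IPs, and health endpoint are hypothetical placeholders, and a real setup would also handle flapping, alerting, and failback.

    import time
    import urllib.request

    import boto3  # AWS SDK for Python

    ZONE_ID = "Z0000000EXAMPLE"                       # hypothetical hosted zone
    RECORD = "app.example.com."
    PRIMARY_IP, STANDBY_IP = "203.0.113.10", "198.51.100.20"
    HEALTH_URL = f"http://{PRIMARY_IP}/healthz"       # hypothetical health endpoint

    route53 = boto3.client("route53")

    def point_record_at(ip):
        # UPSERT the A record so clients resolve to the given datacenter.
        route53.change_resource_record_sets(
            HostedZoneId=ZONE_ID,
            ChangeBatch={"Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD, "Type": "A", "TTL": 60,
                    "ResourceRecords": [{"Value": ip}],
                },
            }]},
        )

    failures = 0
    while True:
        try:
            urllib.request.urlopen(HEALTH_URL, timeout=5)
            failures = 0
        except OSError:
            failures += 1
            if failures >= 3:                         # three consecutive misses, then fail over
                point_record_at(STANDBY_IP)
                break
        time.sleep(10)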

Yes it's a lot of work, but so is configuring a similar level of redundancy and high availability in AWS. I've done it both ways and I prefer the bare metal colo approach. With colo you get vastly more bang for your buck and when things go wrong, you have a greater ability to get hands on, understand exactly what's going on and fix it immediately.

replies(1): >>39452305 #
4. seabrookmx ◴[] No.39450654[source]
This doesn't stop at SSDs.

Spin up an E2 VM in Google Cloud and there's a good chance you'll get a nearly nine-year-old Broadwell-architecture chip running your workload!
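You can check which generation you landed on from inside the VM by asking the GCE metadata server for its documented cpu-platform value; a small Python snippet as an example:

    import urllib.request

    req = urllib.request.Request(
        "http://metadata.google.internal/computeMetadata/v1/instance/cpu-platform",
        headers={"Metadata-Flavor": "Google"},  # required header, otherwise the request is refused
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        print(resp.read().decode())             # e.g. "Intel Broadwell" on an older E2 host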

replies(1): >>39477637 #
5. joshstrange ◴[] No.39452305[source]
I doubt you’ll find anyone who disagrees that colo is much cheaper and that it’s possible to have failover with little to no downtime. Same with higher performance on bare metal vs a public cloud. Or at least I’ve never thought differently.

The difference is that setting all of that up, and maintaining or debugging it when something goes wrong, is not a small task IMHO.

For some companies with that experience in-house I can understand doing it all yourself. As a solo founder and an employee of a small company, we don't have the bandwidth to do all of that without hiring one or more additional people, who are more expensive than the cloud costs.

If we were drive-speed-constrained and getting that speed just wasn’t possible then maybe the math would shift further in favor of colo but we aren’t. Also upgrading the hardware our servers run on is fairly straightforward vs replacing a server on a rack or dealing with failing/older hardware.

6. bmicraft ◴[] No.39477637{3}[source]
What this tells me is that the price of running inefficient CPUs (in terms of USD per kWh) seemingly isn't nearly as high as I thought it would or should be.
replies(1): >>39488811 #
7. seabrookmx ◴[] No.39488811{4}[source]
Well, they bill you for the instance, not for some unit of computation. I'd imagine many users of E2 instances don't realize that they could be getting much, much worse performance per vcore than if they picked a different instance type.

From Google's perspective, if the hardware is paid for, still reliable, and still making money, they can put new hardware in new racks rather than replacing the old hardware. This suggests Google's DCs aren't space-constrained, but I'm not surprised after looking at a few via satellite images!

replies(1): >>39503542 #
8. bmicraft ◴[] No.39503542{5}[source]
Well, not exactly. In my mind, the price of running such old CPUs for, say, the last four years would have been higher than the cost of buying new hardware plus its runtime costs. Those would definitely be considered opportunity costs that ought to be avoided.
replies(1): >>39516066 #
9. seabrookmx ◴[] No.39516066{6}[source]
> the price of running such old cpus for say the last (say, 4?) years would have been higher than buying new+new runtime costs

I don't think this is true, because the old chips don't use more power outright[1][2][3]. In fact, in many cases new chips use more power due to the higher core density. The new chips are way more efficient because they do more work per watt, but, like I said in my previous comment, you aren't paying for a unit of work. The billing model for the cloud providers is that of a rental: you pay per minute for the instance.

There's complexity here, like being able to pack more "instances" (VMs) onto a physical host with the higher-core-count machines, but I don't think it's clear cut that the new hardware is simply cheaper to run.

[1]: https://cloud.google.com/compute/docs/cpu-platforms#intel_pr...

[2]: https://www.intel.com/content/www/us/en/products/sku/93792/i...

[3]: https://www.intel.com/content/www/us/en/products/sku/231746/...
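A back-of-the-envelope comparison of the shape of that argument, with made-up round numbers rather than figures from the spec sheets above:

    # Hypothetical figures purely to illustrate the argument, not vendor data.
    old_tdp_w, old_cores, old_perf_per_core = 145, 22, 1.0   # older, lower-density part
    new_tdp_w, new_cores, new_perf_per_core = 280, 64, 1.5   # newer, higher-density part

    old_work = old_cores * old_perf_per_core
    new_work = new_cores * new_perf_per_core

    print(f"old: {old_tdp_w} W total, {old_work / old_tdp_w:.2f} work/W")
    print(f"new: {new_tdp_w} W total, {new_work / new_tdp_w:.2f} work/W")
    # The newer part wins clearly on work per watt, yet draws more power
    # outright, and the instance is billed per minute, not per unit of work.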

replies(1): >>39539036 #
10. bmicraft ◴[] No.39539036{7}[source]
True, although they could very well do upgrades on the kind of VPSes that were already oversubscribed. If you're not paying for physical cores, I don't think that argument works.
replies(1): >>39567739 #
11. seabrookmx ◴[] No.39567739{8}[source]
Sure but my comment was about Google's E2 instances specifically, which are billed this way. For Cloud Run or the Google Services they host, I agree it would be odd for them to use old chips given the inefficiency.