
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points | greghn
pclmulqdq ◴[] No.39443994[source]
This was a huge technical problem I worked on at Google, and is sort of fundamental to a cloud. I believe this is actually a big deal that drives people's technology directions.

SSDs in the cloud are attached over a network, and fundamentally have to be. The problem is that this network is so large and slow that it can't give you anywhere near the performance of a local SSD. This wasn't a problem for hard drives, which were the backing technology when a lot of these network-attached storage systems were invented, because hard drives are fundamentally slow compared to networks, but it is a problem for SSDs.
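
To put rough numbers on that (all of these latencies are assumptions for illustration, not measurements): a network round trip adds roughly the same absolute time to every request, so it barely registers against a multi-millisecond seek but swamps a sub-100-microsecond flash read.

    # Back-of-envelope sketch: assumed per-request latencies, not measured values.
    DEVICE_LATENCY_MS = {
        "hdd_random_read": 8.0,    # assumed seek + rotation
        "local_nvme_ssd": 0.08,    # assumed ~80 microseconds
    }
    NETWORK_RTT_MS = 0.5           # assumed round trip to a remote storage server

    for device, local_ms in DEVICE_LATENCY_MS.items():
        remote_ms = local_ms + NETWORK_RTT_MS
        print(f"{device}: local {local_ms} ms, remote {remote_ms} ms "
              f"({remote_ms / local_ms:.1f}x slower per request)")

With these assumed numbers the hard drive gets about 6% slower when moved behind the network, while the SSD gets about 7x slower.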

replies(30): >>39444009 #>>39444024 #>>39444028 #>>39444046 #>>39444062 #>>39444085 #>>39444096 #>>39444099 #>>39444120 #>>39444138 #>>39444328 #>>39444374 #>>39444396 #>>39444429 #>>39444655 #>>39444952 #>>39445035 #>>39445917 #>>39446161 #>>39446248 #>>39447169 #>>39447467 #>>39449080 #>>39449287 #>>39449377 #>>39449994 #>>39450169 #>>39450172 #>>39451330 #>>39466088 #
jsolson ◴[] No.39450172[source]
This is untrue of Local SSD (https://cloud.google.com/local-ssd) in Google Cloud. Local SSDs are PCIe peripherals, not network attached.

There are also multiple Persistent Disk (https://cloud.google.com/persistent-disk) offerings that are backed by SSDs over the network.

(I'm an engineer on GCE. I work directly on the physical hardware that backs our virtualization platform.)

replies(1): >>39450847 #
jiggawatts ◴[] No.39450847[source]
It's notable that your second link has a screenshot for 24(!) NVMe SSDs totalling 9 terabytes, but the aggregate performance is 2.4M IOPS and 9.3 GB/s for reads. In other words, just 100K IOPS and ~400 MB/s per individual SSD, which is very low these days.

For comparison, a single 1 TB consumer SSD can deliver comparable numbers (lower IOPS but higher throughput).

If I plugged 24 consumer SSDs into a box, I would expect over 30M IOPS and near the memory bus limit for throughput (>50 GB/s).
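
Sanity-checking that arithmetic (the aggregate figures are the ones quoted above; the single consumer-drive numbers are assumed for illustration):

    ssds = 24
    cloud_iops, cloud_gbps = 2_400_000, 9.3        # aggregate figures quoted above
    print(cloud_iops / ssds)                       # 100000.0 IOPS per SSD
    print(cloud_gbps / ssds * 1000)                # ~388 MB/s per SSD

    consumer_iops, consumer_gbps = 1_300_000, 7.0  # assumed specs for one consumer NVMe drive
    print(consumer_iops * ssds)                    # ~31M IOPS aggregate
    print(consumer_gbps * ssds)                    # 168 GB/s raw; PCIe lanes / memory bus cap this in practice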

replies(2): >>39451102 #>>39453911 #
barrkel ◴[] No.39453911[source]
There's a quadrant of the market that is poorly served by the Cloud model of elastic compute: local SSDs whose contents persist across shutdown and restart.

Elastic compute means you want to be able to treat compute hardware as fungible. Persistent local storage makes that a lot harder because the Cloud provider wants to hand out that compute to someone else after shutdown, so the local storage needs to be wiped.

So you either get ephemeral local SSDs (and have to handle rebuild on restart yourself) or network-attached SSDs with much higher reliability and persistence, but a fraction of the performance.
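
A minimal sketch of the "handle rebuild on restart yourself" approach, treating the local SSD as a rebuildable cache; the mount point, marker file, and rebuild hook are all hypothetical:

    import os

    LOCAL_SSD_MOUNT = "/mnt/disks/local-ssd"            # hypothetical mount point
    MARKER = os.path.join(LOCAL_SSD_MOUNT, ".cache_ready")

    def ensure_local_data(rebuild_from_durable_storage):
        """Repopulate the local SSD if the instance came back with a wiped disk."""
        if os.path.exists(MARKER):
            return  # previous contents survived (e.g. live migration or restart in place)
        rebuild_from_durable_storage(LOCAL_SSD_MOUNT)    # e.g. copy working set from object storage
        open(MARKER, "w").close()                        # mark the cache as warm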

Active instances can be migrated, of course, with sufficient cleverness in the I/O stack.

replies(1): >>39456071 #
jsolson ◴[] No.39456071{3}[source]
GCE fares a little better than this:

- VMs with SSDs can (in general -- there are exceptions for things like GPUs and exceptionally large instances) live migrate with contents preserved.

- GCE supports a timeboxed "restart in place" feature where the VM stays in limbo ("REPAIRING") for some amount of time waiting for the host to return to service: https://cloud.google.com/compute/docs/instances/host-mainten.... This mostly only applies to transient failures like power-loss beyond battery/generator sustaining thresholds, software crashes, etc.

- There is a related feature, also controlled by the `--discard-local-ssd=` flag, which allows preservation of local SSD data on a customer-initiated VM stop.