
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points greghn | 2 comments
pclmulqdq ◴[] No.39443994[source]
This was a huge technical problem I worked on at Google, and is sort of fundamental to a cloud. I believe this is actually a big deal that drives people's technology directions.

SSDs in the cloud are attached over a network, and fundamentally have to be. The problem is that this network is so large and slow that it can't give you anywhere near the performance of a local SSD. This wasn't a problem for hard drives, which were the backing technology when a lot of these network-attached storage systems were invented, because they are fundamentally slow compared to networks, but it is a problem for SSDs.

replies(30): >>39444009 #>>39444024 #>>39444028 #>>39444046 #>>39444062 #>>39444085 #>>39444096 #>>39444099 #>>39444120 #>>39444138 #>>39444328 #>>39444374 #>>39444396 #>>39444429 #>>39444655 #>>39444952 #>>39445035 #>>39445917 #>>39446161 #>>39446248 #>>39447169 #>>39447467 #>>39449080 #>>39449287 #>>39449377 #>>39449994 #>>39450169 #>>39450172 #>>39451330 #>>39466088 #
jsnell ◴[] No.39444096[source]
According to the submitted article, the numbers are from AWS instance types where the SSD is "physically attached" to the host, not about SSD-backed NAS solutions.

Also, the article isn't just about SSDs being no faster than a network. It's about SSDs being two orders of magnitude slower than datacenter networks.

replies(3): >>39444161 #>>39444353 #>>39448728 #
pclmulqdq ◴[] No.39444161[source]
It's because the "local" SSDs are not actually physically attached and there's a network protocol in the way.
replies(14): >>39444222 #>>39444248 #>>39444253 #>>39444261 #>>39444341 #>>39444352 #>>39444373 #>>39445175 #>>39446024 #>>39446163 #>>39446271 #>>39446742 #>>39446840 #>>39446893 #
jsnell ◴[] No.39444373[source]
I think you're wrong about that. AWS calls this class of storage "instance storage" [0], and defines it as:

> Many Amazon EC2 instances can also include storage from devices that are located inside the host computer, referred to as instance storage.

There might be some wiggle room in "physically attached", but there's none in "storage devices located inside the host computer". It's not some kind of AWS-only thing either. GCP has "local SSD disks"[1], which I'm going to claim are likewise local, not over the network block storage. (Though the language isn't as explicit as for AWS.)

[0] https://aws.amazon.com/ec2/instance-types/

[1] https://cloud.google.com/compute/docs/disks#localssds

replies(5): >>39444464 #>>39445545 #>>39447509 #>>39449306 #>>39450882 #
wstuartcl ◴[] No.39447509[source]
The tests were for these local (metal, direct-connect) SSDs. The issue is not network overhead; it's that, just like everything else in the cloud, the performance of 10 years ago was used as the baseline and carries over today, with upcharges to buy back the gains.

There is a reason why vCPU performance is still locked to the typical core from 10 years ago, when every core on a machine in those data centers today is 3-5x faster or more. It's because they can charge you for 5x the cores to get that gain.

replies(2): >>39448553 #>>39450455 #
wmf ◴[] No.39448553[source]
> vcpu performance is still locked to the typical core from 10 years ago

No. In some cases I think AWS actually buys special processors that are clocked higher than the ones you can buy.

replies(2): >>39449463 #>>39450390 #
gowld ◴[] No.39449463[source]
You are talking about real CPUs, not virtual CPUs.
replies(1): >>39449912 #
1. wmf ◴[] No.39449912[source]
Generally each vCPU is a dedicated hardware thread, which has gotten significantly faster in the last 10 years. Only lambdas, micros, and nanos have shared vCPUs and those have probably also gotten faster although it's not guaranteed.
replies(1): >>39450872 #
2. jandrewrogers ◴[] No.39450872[source]
In fairness, there are a not insignificant number of workloads that do not benefit from hardware threads on CPUs [0], instead isolating processes along physical cores for optimal performance.

[0] Assertion not valid for barrel processors.
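
For anyone wondering what "isolating processes along physical cores" can look like in practice, here is a minimal sketch, assuming Linux, where sysfs exposes a `thread_siblings_list` file per logical CPU and Python offers `os.sched_setaffinity`. The helper names (`parse_cpu_list`, `one_cpu_per_core`, `read_sibling_lists`, `pin_to_physical_cores`) are made up for illustration and are not anything described in the thread. The idea is to keep one hardware thread per physical core and pin the process to that set.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1,8-9" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU from each group of sibling threads."""
    return sorted({parse_cpu_list(s)[0] for s in sibling_lists if s.strip()})


def read_sibling_lists(base="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every cpuN directory that exposes one (Linux only)."""
    lists = []
    if not os.path.isdir(base):
        return lists
    for name in os.listdir(base):
        path = os.path.join(base, name, "topology", "thread_siblings_list")
        if name.startswith("cpu") and name[3:].isdigit() and os.path.isfile(path):
            with open(path) as f:
                lists.append(f.read())
    return lists


def pin_to_physical_cores():
    """Pin the current process to one hardware thread per core, if the platform allows it."""
    if not hasattr(os, "sched_setaffinity"):
        return set()  # not Linux: nothing to do in this sketch
    allowed = os.sched_getaffinity(0)
    chosen = set(one_cpu_per_core(read_sibling_lists())) & allowed
    if chosen:
        os.sched_setaffinity(0, chosen)
    return chosen
```

Calling `pin_to_physical_cores()` at startup would restrict the process to the chosen set. Whether that helps depends entirely on the workload, and inside a cloud VM the topology the guest sees may not match the real hardware.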