
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points | greghn | 1 comment
pclmulqdq No.39443994
This was a huge technical problem I worked on at Google, and it is sort of fundamental to a cloud. I believe this is actually a big deal that drives people's technology directions.

SSDs in the cloud are attached over a network, and fundamentally have to be. The problem is that this network is so large and slow that it can't give you anywhere near the performance of a local SSD. This wasn't a problem for hard drives, which were the backing technology when a lot of these network-attached storage systems were invented, because hard drives are fundamentally slow compared to networks. It is a problem for SSDs.

vlovich123 No.39444024
Why do they fundamentally need to be network-attached storage instead of local to the VM?
SteveNuts No.39444132
Because even if you can squeeze 100TB or more of SSD/NVMe into a server, once 10 tenants share the machine, each one is limited to 10TB as a hard ceiling.

What happens when one tenant needs 200TB attached to a server?

Cloud providers are starting to offer local SSD/NVMe, but you're renting the entire machine, and you're still limited to exactly what's installed in that server.
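
To put numbers on the ceiling argument above, here is a minimal sketch using only the figures from this comment (100TB installed, 10 tenants, a 200TB request). It assumes capacity is split evenly between tenants, which the comment implies but does not state, and the function names are made up for illustration.

```python
def per_tenant_ceiling_tb(installed_tb, tenants):
    """Even split of a server's local SSD capacity (assumed allocation policy)."""
    return installed_tb / tenants


def fits_locally(requested_tb, installed_tb):
    """A request can only be served locally if the whole server holds that much."""
    return requested_tb <= installed_tb


# Figures from the comment: 100TB in the box, 10 tenants, one tenant wants 200TB.
ceiling = per_tenant_ceiling_tb(100, 10)
wants_200_fits = fits_locally(200, 100)

print(f"per-tenant ceiling: {ceiling:.0f}TB")
print(f"200TB request fits on one server: {wants_200_fits}")
```

Even giving one tenant the entire machine does not cover the 200TB case, which is the point being made about local storage.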

vel0city No.39444774
Given AWS and GCP offer multiple sizes for the same processor version with local SSDs, I don't think you have to rent the entire machine.

Search for i3en API names and you'll see:

i3en.large, 2x CPU, 1250GB SSD

i3en.xlarge, 4x CPU, 2500GB SSD

i3en.2xlarge, 8x CPU, 2x2500GB SSD

i3en.3xlarge, 12x CPU, 7500GB SSD

i3en.6xlarge, 24x CPU, 2x7500GB SSD

i3en.12xlarge, 48x CPU, 4x7500GB SSD

i3en.24xlarge, 96x CPU, 8x7500GB SSD

i3en.metal, 96x CPU, 8x7500GB SSD

So they've got servers with 96 CPUs and 8x7500GB SSDs. You can get a slice of one, or you can get the whole one. Every size works out to the same ratio: 625GB of local SSD per CPU.
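
The ratio can be checked directly from the list above. This is a small sketch: the CPU counts and SSD sizes are copied from this comment rather than pulled from any AWS API, and the table and function names are made up for illustration.

```python
# (instance name, CPUs, number of local SSDs, GB per SSD), copied from the list above.
I3EN_SIZES = [
    ("i3en.large", 2, 1, 1250),
    ("i3en.xlarge", 4, 1, 2500),
    ("i3en.2xlarge", 8, 2, 2500),
    ("i3en.3xlarge", 12, 1, 7500),
    ("i3en.6xlarge", 24, 2, 7500),
    ("i3en.12xlarge", 48, 4, 7500),
    ("i3en.24xlarge", 96, 8, 7500),
    ("i3en.metal", 96, 8, 7500),
]


def ssd_gb_per_cpu(cpus, ssd_count, gb_per_ssd):
    """Total local SSD capacity divided by the instance's CPU count."""
    return ssd_count * gb_per_ssd / cpus


ratios = {
    name: ssd_gb_per_cpu(cpus, ssd_count, gb_per_ssd)
    for name, cpus, ssd_count, gb_per_ssd in I3EN_SIZES
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}GB of local SSD per CPU")
```

Every row comes out to the same figure, which is what you would expect if the smaller sizes are proportional slices of the 96-CPU host.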

https://instances.vantage.sh/

On GCP you can get a 2-core N2 instance type and attach multiple local SSDs. I doubt they have many physical 2-core Xeons in their datacenters.