
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points by greghn | 12 comments
pclmulqdq ◴[] No.39443994[source]
This was a huge technical problem I worked on at Google, and is sort of fundamental to a cloud. I believe this is actually a big deal that drives people's technology directions.

SSDs in the cloud are attached over a network, and fundamentally have to be. The problem is that this network is so large and slow that it can't give you anywhere near the performance of a local SSD. This wasn't a problem for hard drives, which were the backing technology when a lot of these network-attached storage systems were invented, because they are fundamentally slow compared to networks, but it is a problem for SSDs.

replies(30): >>39444009 #>>39444024 #>>39444028 #>>39444046 #>>39444062 #>>39444085 #>>39444096 #>>39444099 #>>39444120 #>>39444138 #>>39444328 #>>39444374 #>>39444396 #>>39444429 #>>39444655 #>>39444952 #>>39445035 #>>39445917 #>>39446161 #>>39446248 #>>39447169 #>>39447467 #>>39449080 #>>39449287 #>>39449377 #>>39449994 #>>39450169 #>>39450172 #>>39451330 #>>39466088 #
1. ejb999 ◴[] No.39444046[source]
How much faster would the network need to get, in order to meet (or at least approach) the speed of a local SSD? Are we talking about needing to 2x or 3x the speed, or by factors of hundreds or thousands?
replies(5): >>39444115 #>>39444119 #>>39444137 #>>39444150 #>>39444218 #
2. ◴[] No.39444115[source]
3. Filligree ◴[] No.39444119[source]
The Samsung 990 in my desktop provides ~3.5 GB/s streaming reads, ~2 GB/s 4k random-access reads, all at a latency measured at around 20-30 microseconds. My exact numbers might be a little off, but that's the ballpark you're looking at, and a 990 is a relatively cheap device.

10GbE is about the best you can hope for from a local network these days, but that's 1/5th the bandwidth and many times the latency. 100GbE would work, except the latency would still mean any read dependencies would be far slower than local storage, and I'm not sure there's much to be done about that; at these speeds the physical distance matters.

In practice I'm having to architect the entire system around the SSD just to not bottleneck it. So far ext4 is the only filesystem that even gets close to the SSD's limits, which is a bit of a pity.
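
For a rough sense of scale, this is the arithmetic I'm working from (the SSD figures are the ballpark numbers above; the link rates are nominal and ignore protocol overhead and latency):

    # Compare the ballpark SSD figures above against nominal Ethernet link rates.
    # Numbers are illustrative; overhead and latency are ignored.
    ssd_seq_GBps = 3.5                                  # streaming reads
    ssd_rand_GBps = 2.0                                 # 4k random reads
    links_GBps = {"10GbE": 10 / 8, "100GbE": 100 / 8}   # Gbit/s -> GB/s

    for name, bw in links_GBps.items():
        print(f"{name}: {bw:.2f} GB/s, "
              f"{bw / ssd_seq_GBps:.2f}x the SSD's streaming rate, "
              f"{bw / ssd_rand_GBps:.2f}x its 4k random rate")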

replies(1): >>39451386 #
4. wmf ◴[] No.39444137[source]
Around 4x-10x depending on how many SSDs you want. A single SSD is around the speed of a 100 Gbps Ethernet link.
5. selectodude ◴[] No.39444150[source]
SATA3 is 6 Gbit/s, so multiply 6 Gbit/s by the number of VMs on a machine. For NVMe, probably closer to 4-5x that. You'd need some serious interconnects to give a server rack access to un-bottlenecked SSD storage.
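
A rough sketch of that aggregation (the host size is made up purely for illustration):

    # Made-up host size, just to illustrate how per-VM drive bandwidth adds up.
    vms_per_host = 16
    per_vm_gbit = {"SATA3": 6, "NVMe (about 5x SATA3)": 6 * 5}

    for name, gbit in per_vm_gbit.items():
        print(f"{name}: {vms_per_host * gbit} Gbit/s per host "
              f"if every VM saturates its drive")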
6. Nextgrid ◴[] No.39444218[source]
The problem isn't necessarily speed, it's random access latency. What makes SSDs fast and "magical" is their low random-access latency compared to a spinning disk. The sequential-access read speed is merely a bonus.

Networked storage negates that significantly, absolutely killing performance for certain applications. You could have a 100Gbps network and it still won't match a direct-attached SSD in terms of latency (it can only match it in terms of sequential access throughput).

For many applications such as databases, random access is crucial, which is why today's mid-range consumer hardware often outperforms hosted databases such as RDS, unless they're so overprovisioned on RAM that the dataset is effectively always in memory.
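
As a rough illustration of why latency rather than bandwidth is the limit here (the latency figures are illustrative, not measurements):

    # The throughput of a serial chain of dependent 4 KiB reads is bounded by
    # latency alone, no matter how much bandwidth the link has.
    io_size = 4096  # bytes

    for name, latency_us in [("local NVMe", 25), ("network-attached", 200)]:
        iops = 1_000_000 / latency_us          # one outstanding request at a time
        print(f"{name}: {iops:,.0f} IOPS at queue depth 1 "
              f"-> {iops * io_size / 1e6:.0f} MB/s")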

replies(2): >>39444606 #>>39447996 #
7. baq ◴[] No.39444606[source]
100 Gbps direct shouldn't be too bad, but it might be difficult to get anyone to sell it to you for exclusive use in a VM...
8. Ericson2314 ◴[] No.39447996[source]
Um... why the hell does the network care whether I am doing random or sequential access? You left that part out of your argument.
replies(1): >>39448284 #
9. Nextgrid ◴[] No.39448284{3}[source]
Ah sorry, my bad. You are correct that you can fire off many random access operations in parallel and get good throughput that way.

The problem is that this is not possible when the next IO request depends on the result of a previous one, like in a database where you must first read the index to know the location of the row data itself.
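
A minimal sketch of the difference (asyncio.sleep stands in for a storage read; the 200 µs figure is an assumed network-attached latency, not a measurement):

    import asyncio

    LATENCY = 0.0002  # 200 us per I/O -- an assumed network-attached figure

    async def read_page(page_id):
        await asyncio.sleep(LATENCY)   # stand-in for one storage read
        return f"page-{page_id}"

    async def dependent_lookup(key):
        # The row fetch cannot be issued until the index probe returns,
        # so the two latencies add up.
        index_entry = await read_page(f"index-{key}")
        return await read_page(f"row-{index_entry}")

    async def independent_reads(keys):
        # Unrelated reads can all be kept in flight at once, hiding latency.
        return await asyncio.gather(*(read_page(k) for k in keys))

    asyncio.run(dependent_lookup("k1"))          # takes ~2 x LATENCY
    asyncio.run(independent_reads(range(100)))   # still ~1 x LATENCY overall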

replies(1): >>39448578 #
10. Ericson2314 ◴[] No.39448578{4}[source]
OK, thanks, yes that makes sense. Pipelining problems are real.
replies(1): >>39459410 #
11. ants_a ◴[] No.39451386[source]
Networking doesn't have to have high latency. You can buy network hardware that is able to provide sub-microsecond latency. Physical distance still matters, but 10% of typical NVMe latency gets you through a kilometer of fiber.
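
Back-of-the-envelope version of that claim (both figures are typical values assumed here, not measurements):

    # Light in fiber covers roughly 200 km/ms, i.e. about 5 us per km each way.
    fiber_us_per_km = 5
    nvme_latency_us = 100          # a typical NVMe read latency, assumed here

    round_trip_us = 2 * fiber_us_per_km   # 1 km out and back
    print(f"1 km round trip ~ {round_trip_us} us "
          f"~ {100 * round_trip_us / nvme_latency_us:.0f}% of NVMe latency")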
12. Ericson2314 ◴[] No.39459410{5}[source]
(The network indeed doesn't care, but the bandwidth of dependent, as opposed to independent, accesses depends on latency.)