
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points | greghn
pclmulqdq ◴[] No.39443994[source]
This was a huge technical problem I worked on at Google, and it is sort of fundamental to the cloud. I believe this is actually a big deal that drives people's technology directions.

SSDs in the cloud are attached over a network, and fundamentally have to be. The problem is that this network is so large and slow that it can't give you anywhere near the performance of a local SSD. This wasn't a problem for hard drives, which were the backing technology when a lot of these network-attached storage systems were invented, because hard drives are fundamentally slow compared to networks, but it is a problem for SSDs.

replies(30): >>39444009 #>>39444024 #>>39444028 #>>39444046 #>>39444062 #>>39444085 #>>39444096 #>>39444099 #>>39444120 #>>39444138 #>>39444328 #>>39444374 #>>39444396 #>>39444429 #>>39444655 #>>39444952 #>>39445035 #>>39445917 #>>39446161 #>>39446248 #>>39447169 #>>39447467 #>>39449080 #>>39449287 #>>39449377 #>>39449994 #>>39450169 #>>39450172 #>>39451330 #>>39466088 #
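A rough, assumed-numbers sketch of the point above: the same network round trip that was noise next to an HDD seek dwarfs a local NVMe read. The latency figures below are illustrative placeholders, not measurements from the comment.

```python
# Illustrative latencies only (assumed, not measured): why network attachment
# barely mattered for spinning disks but dominates for local NVMe.
NETWORK_RTT_US = 500  # assumed datacenter round trip, in microseconds

device_read_us = {
    "7200rpm HDD (seek + rotate)": 8000,  # assumed
    "local NVMe SSD (4K read)": 100,      # assumed
}

for device, lat in device_read_us.items():
    total = lat + NETWORK_RTT_US
    print(f"{device}: {lat} us local, {total} us network-attached "
          f"({total / lat:.1f}x slower)")
# HDD comes out roughly 1.1x slower over the network; NVMe roughly 6x slower.
```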
_Rabs_ ◴[] No.39444028[source]
So much of this. The number of times I've seen someone complain about slow DB performance when they're trying to connect to it from a different VPC, bottlenecking themselves to 100 Mbit, is stupidly high.

It literally depends on where things are in the data center. If you're tightly coupled, on a 10G line to the same switch and the same server rack, I bet performance will be far more consistent.

replies(3): >>39444090 #>>39444438 #>>39505345 #
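One crude way to check where you sit (a sketch of my own, not anything from the thread): time a few TCP connection handshakes to the DB host as a rough proxy for round-trip latency. The hostname and port below are placeholders.

```python
# Rough sketch: estimate network round-trip time to a database host by timing
# TCP connects. HOST and PORT are hypothetical placeholders.
import socket
import time

HOST = "db.internal.example"  # placeholder DB hostname
PORT = 5432                   # placeholder port (Postgres default)

samples = []
for _ in range(10):
    start = time.perf_counter()
    with socket.create_connection((HOST, PORT), timeout=2):
        pass  # a completed connect() implies at least one round trip
    samples.append((time.perf_counter() - start) * 1000)

print(f"median connect time: {sorted(samples)[len(samples) // 2]:.2f} ms")
```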
bugbuddy ◴[] No.39444090[source]
Aren’t 10G and 100G connections standard nowadays in data centers? Heck, I thought they were standard 10 years ago.
replies(4): >>39444293 #>>39444309 #>>39444315 #>>39446155 #
pixl97 ◴[] No.39444315[source]
Bandwidth-delay product does not help serialized transactions. If you're reaching out to disk for results, or if you have locking transactions on a table, the achievable operations per second drop dramatically as latency between the host and the disk increases.
replies(1): >>39444997 #
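A back-of-the-envelope illustration of this point, with assumed latency figures: for strictly serialized operations, throughput is capped at roughly 1 / round-trip time, regardless of how fast the link is.

```python
# Assumed round-trip times (illustrative, not measurements): serialized work
# can never exceed 1 / RTT operations per second, no matter the bandwidth.
round_trip_s = {
    "local NVMe read": 100e-6,         # ~100 microseconds (assumed)
    "same-rack network hop": 500e-6,   # ~0.5 ms (assumed)
    "cross-VPC / cross-AZ hop": 2e-3,  # ~2 ms (assumed)
}

for name, rtt in round_trip_s.items():
    print(f"{name:26s} -> at most {1 / rtt:>8.0f} serialized ops/sec")

# Roughly 10,000 vs 2,000 vs 500 ops/sec: upgrading a 10G link to 100G
# changes none of these numbers, because each operation waits on the last.
```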
bee_rider ◴[] No.39444997[source]
The typical way to trade bandwidth away for latency would, I guess, be speculative requests. In the CPU world at least. I wonder if any cloud providers have some sort of framework built around speculative disk reads (or maybe it is a totally crazy trade to make in this context)?
replies(3): >>39446077 #>>39446671 #>>39448491 #
pixl97 ◴[] No.39446077[source]
I mean we already have readahead in the kernel.
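For context, a minimal sketch of nudging that readahead from userspace with posix_fadvise(WILLNEED), which is about as close as the stock kernel interface gets to an explicit "speculative read" hint; the file path is a placeholder.

```python
# Sketch: hint the kernel that a file will be read soon, so it can start
# pulling it into the page cache in the background. Path is hypothetical.
import os

PATH = "/var/lib/app/data.bin"  # placeholder file
fd = os.open(PATH, os.O_RDONLY)
try:
    size = os.fstat(fd).st_size
    # Advise the kernel that the whole range will be needed shortly.
    os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
    # ... later reads are more likely to hit the page cache ...
    data = os.pread(fd, 4096, 0)
finally:
    os.close(fd)
```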

That said, the problem can get more complex than this really fast: write barriers, for example, and dirty caches. Any application that forces writes, where the writes are enforced by the kernel, is going to suffer. The sketch after this comment illustrates the cost.

The same is true for SSD settings. There are a number of tweakable values on SSDs around write commit and cache usage that can affect performance. Desktop OSes tend to play fast and loose with these settings, while server defaults tend to be more conservative.
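As a rough illustration of the forced-write cost (my own sketch, with placeholder paths): timing buffered writes against writes that fsync() every record shows how strongly a workload that forces durability depends on the device's commit latency — exactly where network-attached or conservatively configured storage hurts most.

```python
# Sketch: compare buffered writes with writes forced durable via fsync().
# Paths under /tmp are placeholders; absolute numbers depend on the device.
import os
import time

def write_records(path, count=1000, force=False):
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(b"x" * 512)
            if force:
                f.flush()
                os.fsync(f.fileno())  # ask the kernel and device to commit now
    return time.perf_counter() - start

buffered = write_records("/tmp/buffered.bin")
forced = write_records("/tmp/forced.bin", force=True)
print(f"buffered: {buffered:.3f}s, fsync per record: {forced:.3f}s")
```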