
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points greghn | 2 comments
c0l0 ◴[] No.39444187[source]
Seeing the really just puny "provisioned IOPS" numbers on hugely expensive cloud instances made me chuckle (first in disbelief, then in horror) when I joined a "cloud-first" enterprise shop in 2020 (having come from a company that hosted their own hardware at a colo).

It's no wonder that many people nowadays, esp. those who are so young that they've never experienced anything but cloud instances, seem to have little idea of how much performance you can actually pack in just one or two RUs today. Ultra-fast (I'm not parroting some marketing speak here - I just take a look at IOPS numbers, and compare them to those from highest-end storage some 10-12 years ago) NVMe storage is a big part of that astonishing magic.

replies(3): >>39448208 #>>39448367 #>>39449930 #
jauntywundrkind ◴[] No.39448367[source]
NVMe has been ridiculously great. I'm excited to see what happens to prices as the E1 form factor ramps up! Physically much bigger drives allow for consolidation of parts and a higher ratio of flash chips to everything else, which seems promising. It's more of a value line, but Intel's P5315 is 15TB at a quite low $0.9/GB.

It might not help much with IOPS though. Amazing that we have PCIe 5.0 16GB/s and already are so near theoretical max (some lost to overhead), even on consumer cards.

Going enterprise for the drive-writes-per-day (DWPD) is 100% worth it for most folks, but I am morbidly curious how different the performance profile would be running enterprise vs non these days. Conversely, the high-DWPD drives (the Kioxia CD8P-V, for example, has a DWPD of 3) often seem to come with somewhat milder sustained 4k write IOPS, making me think there may be a speed vs reliability tradeoff that consumer drives could exploit in some cases. I'm not sure who wants tons of IOPS but doesn't actually intend to hit their Total Drive Writes, but it could save you some IOPS/$ if so. That said, I'm shocked to see the enterprise premium is a lot less absurd than it used to be! (If you can find stock.)

replies(1): >>39449159 #
bcaxis ◴[] No.39449159[source]
The main problem with consumer drives is the missing power loss protection (PLP). M.2 drives just don't have space for the caps like an enterprise 2.5" U.2/U.3 drive will have.

This matters when the DB calls a sync and it's expecting the data to be written safely to disk before it returns.

A consumer drive basically stops everything until it can report success, and your IOPS fall to something like 1/100th of what the drive is capable of if that happens a lot.

An enterprise drive with PLP will just report success, knowing it has the power to finish the pending writes. Full speed ahead.
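
A rough way to see this effect on your own hardware is to time small writes with and without an fsync after each one. The sketch below is only an illustration, not a proper benchmark: `writes_per_second` is a made-up helper name, and the block size and count are arbitrary assumptions.

```python
import os
import tempfile
import time


def writes_per_second(path, count=200, size=4096, sync_each=True):
    """Write `count` blocks of `size` bytes, optionally calling fsync after each."""
    block = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            if sync_each:
                # Ask the OS to push this write to stable storage
                # before continuing, like a database commit would.
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Assumption: the temp directory lives on the drive you care about.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bench.dat")
        synced = writes_per_second(target, sync_each=True)
        buffered = writes_per_second(target, sync_each=False)
        print(f"fsync after every write: {synced:,.0f} writes/s")
        print(f"no fsync (page cache):   {buffered:,.0f} writes/s")
```

Note that the default temp directory may sit on tmpfs or on a different disk, so pass a path on the drive under test if you want meaningful numbers.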

You can "lie" to the process at the VPS level by enabling unsafe write back cache. You can do it at the OS level by launching the DB with "eatmydata". You will get the full performance of your SSD.

In the event of power loss you may well end up in an unrecoverable corrupted condition with these enabled.

I believe that if you otherwise buy all consumer parts, an enterprise drive is the one place where spending extra on an enterprise component pays off.

replies(2): >>39449317 #>>39454914 #
tumult ◴[] No.39449317[source]
My experience lately is that consumer drives will also lie and use a cache, but then drop your data on the floor if the power is lost or there’s a kernel panic / BSOD. (Samsung and others.)
replies(1): >>39449891 #
bcaxis ◴[] No.39449891[source]
Rumors of that. I've never actually seen it myself.
replies(3): >>39450500 #>>39451747 #>>39451836 #
tumult ◴[] No.39451747[source]
I can get it to happen easily. 970 Evo Plus. Write a text file and kill the power within 20 seconds or so, assuming not much other write activity. File will be zeroes or garbage, or not present on the filesystem, after reboot.
replies(1): >>39452783 #
c0l0 ◴[] No.39452783[source]
This happens for you after you invoked an explicit sync() (et al.) before the power cut?
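
For anyone wanting to try this, here is a minimal sketch of what "an explicit sync before the power cut" could look like, assuming Python; `durable_write` is a hypothetical helper name, and the directory fsync step only applies on POSIX systems.

```python
import os


def durable_write(path, data):
    """Write `data` to `path` and request durability before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file's data and metadata to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)

    if os.name == "posix":
        # Also fsync the parent directory so the new directory entry
        # itself survives a crash (not possible this way on Windows).
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    durable_write("powercut-test.txt", b"written and fsynced\n")
    print("write reported durable; cut the power now to test the drive")
```

If the file is missing or garbage after a power cut that came after this function returned, then something below the application (filesystem, block layer, or drive) did not honor the flush.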
replies(1): >>39453372 #
tumult ◴[] No.39453372{3}[source]
Yep.
replies(1): >>39455489 #
1. c0l0 ◴[] No.39455489{4}[source]
That is highly interesting and contrary to a number of reports I've read about the Samsung 970 EVO Plus series (and experienced for myself) specifically! Can you share more details about your particular setup and methodology? (Specific model name/capacity, firmware release, kernel version, filesystem, mkfs and mount options, and any relevant block layer funny business you are consciously setting would be of greatest interest.) Do you have more than one drive where this can happen?
replies(1): >>39456879 #
2. tumult ◴[] No.39456879[source]
Yeah, it happens on two of the 970 EVO Plus models. One on the older revision, and one on the newer. (I think there are only two?) It happens on both Linux and Windows. Uhh, I'm not sure about the kernel versions. I don't remember what I had booted at the time. On Windows I've seen it happen as far back as 1607 and as recently as 21H2. I've also seen it happen on someone else's computer (laptop.)

It's really easy to reproduce (at least for me?) and I'm pretty sure anyone can do it if they try to on purpose.