
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points | greghn
siliconc0w No.39444011
Core count plus modern NVMe actually makes a great case for moving away from the cloud. Before, the argument was "your data probably fits in memory"; these drives are so fast that they're close enough to memory for the argument to become "your data surely fits on disk". That reduces the complexity of a lot of workloads: you can just buy a beefy server and do pretty insane caching/calculation/serving with a single box, plus a second for redundancy.
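
A quick sketch of the claim, in Python; the file path is invented, and if the file fits in RAM the page cache will flatter the numbers, so ideally it's much larger than memory:

    import os, random, time

    # Time random 4 KiB reads from a big file on a local NVMe drive. On a
    # modern drive each read lands in the tens of microseconds, which is
    # why "it fits on disk" can replace a lot of distributed caching
    # machinery.
    PATH = "/data/big.bin"   # hypothetical file, much larger than RAM
    BLOCK = 4096
    size = os.path.getsize(PATH)
    fd = os.open(PATH, os.O_RDONLY)

    N = 100_000
    t0 = time.perf_counter()
    for _ in range(N):
        os.pread(fd, BLOCK, random.randrange(size - BLOCK))
    os.close(fd)
    print(f"{(time.perf_counter() - t0) / N * 1e6:.1f} us per random read")
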
replies(3): >>39444040 #>>39444175 #>>39444225 #
malfist No.39444175
I keep hearing that, but it's simply not true. SSDs are fast, but they're still several orders of magnitude slower than RAM, which is in turn well over an order of magnitude slower than CPU cache.

A Samsung 990 Pro 2TB has a latency of about 40 μs.

DDR4-2133 with CAS latency 15 has a latency of about 14 nanoseconds.

DDR4's latency is 0.035% of that of one of the fastest SSDs; put another way, DDR4 is roughly 2,857x faster.

L1 cache is typically accessible in 4 clock cycles; on a 4.8 GHz CPU like the i7-10700, that's under 1 ns.
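
Sanity-checking those numbers (Python, using the figures quoted above):

    ssd = 40e-6     # 990 Pro random read, ~40 us
    dram = 14e-9    # DDR4-2133 CL15, ~14 ns
    l1 = 4 / 4.8e9  # 4 cycles at 4.8 GHz

    print(f"DRAM/SSD ratio: {ssd / dram:,.0f}x")   # ~2,857x
    print(f"DRAM as % of SSD: {dram / ssd:.3%}")   # 0.035%
    print(f"L1 latency: {l1 * 1e9:.2f} ns")        # ~0.83 ns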

replies(5): >>39444275 #>>39444384 #>>39447096 #>>39448236 #>>39453512 #
LeifCarrotson No.39444384
I wonder how many people have built failed businesses that never had enough customer data to exceed the DDR4 in the average developer laptop, and never had so many simultaneous queries that a single core running SQLite couldn't handle them, but built the software architecture on a distributed cloud system just in case it eventually scaled to hundreds of terabytes and billions of simultaneous queries.
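
For scale, a minimal sketch of the single-box setup being described; the table, file name, and query are invented:

    import sqlite3

    # One file, one core. Indexed point lookups like this run at tens of
    # thousands per second from local NVMe or the OS page cache.
    con = sqlite3.connect("customers.db")   # hypothetical database file
    con.execute("CREATE TABLE IF NOT EXISTS customers"
                "(id INTEGER PRIMARY KEY, email TEXT)")
    con.execute("CREATE INDEX IF NOT EXISTS idx_email ON customers(email)")

    row = con.execute("SELECT id FROM customers WHERE email = ?",
                      ("a@example.com",)).fetchone()
    print(row)
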
replies(5): >>39444867 #>>39444883 #>>39445536 #>>39445790 #>>39448007 #
malfist No.39444883
I totally hear you. I work at a FAANG, on a service that has to be capable of sending 1.6M text messages in under 10 minutes.
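
The raw arithmetic on that requirement:

    messages, window = 1_600_000, 10 * 60   # messages, seconds
    print(f"{messages / window:,.0f} msg/s sustained")   # ~2,667 msg/s

And that's just the sustained average, before retries or bursts.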

The amount of complexity the architecture has because of those constraints is insane.

When I worked at my previous job, management kept asking for designs at that scale for less than 1/1000th of the throughput, and I was constantly pushing back. There are real costs to building for more scale than you need; it's not as simple as just tweaking a few things.

To me there's a couple of big breakpoints in scale:

* When you can run on a single server

* When you need to run on a single server, but with HA redundancies

* When you have to scale beyond a single server

* When you have to adapt your design to the limits of the distributed system itself, e.g. designing around DynamoDB's partition limits (see the sketch after this list).
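
As one example of that last breakpoint: DynamoDB caps throughput per partition (on the order of 1,000 writes/sec), so a hot key has to be split across synthetic partition keys, i.e. write sharding. A minimal sketch with boto3; the table name and key schema are invented:

    import random
    import boto3

    SHARDS = 10
    table = boto3.resource("dynamodb").Table("events")  # hypothetical table

    def put_event(logical_key: str, item: dict) -> None:
        # Spread a hot logical key across SHARDS partition keys so no
        # single partition absorbs all the writes. `item` is assumed to
        # carry the table's sort key and remaining attributes.
        shard = random.randrange(SHARDS)
        table.put_item(Item={"pk": f"{logical_key}#{shard}", **item})

    # The price: reads now have to query all SHARDS keys and merge results.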

Each step in that chain adds irrevocable complexity, adds to OE, and adds to the cost to run and the cost to build. Be sure you have to take those steps before you decide to.

replies(3): >>39446187 #>>39446459 #>>39446823 #
disqard ◴[] No.39446187[source]
I'm trying to guess what "OE" stands for... over engineering? operating expenditure? I'd love to know what you meant :)
replies(2): >>39446438 #>>39447039 #
malfist No.39447039
Sorry, I thought it was a common term. Operational Excellence: all the effort and time it takes to keep a service online, on-call included.