The Samsung 990 Pro 2TB has a random-read latency of roughly 40 μs.
DDR4-2133 at CAS latency 15 (CL15) has an access latency of about 14 nanoseconds (15 cycles at 1,066 MHz).
DDR4's latency is about 0.035% of one of the fastest SSDs', or, put another way, DDR4 is roughly 2,857x faster than the SSD (40 μs ÷ 14 ns).
L1 cache is typically accessible in about 4 clock cycles; on a 4.8 GHz CPU like the i7-10700, that puts L1 latency under 1 ns.
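For anyone who wants to check the arithmetic, here it is spelled out in plain Python, using only the figures quoted above:

```python
# Latency comparison using the figures above.
SSD_LATENCY_NS = 40_000            # Samsung 990 Pro: ~40 us random read

DDR4_CLOCK_HZ = 2133e6 / 2         # DDR4-2133 is 2,133 MT/s -> 1,066 MHz clock
DDR4_CAS_CYCLES = 15               # CL15
ddr4_ns = DDR4_CAS_CYCLES / DDR4_CLOCK_HZ * 1e9
print(f"DDR4 CAS latency: {ddr4_ns:.1f} ns")                # ~14.1 ns
print(f"DDR4 vs SSD: {ddr4_ns / SSD_LATENCY_NS:.3%}")       # ~0.035%
print(f"SSD/DDR4 ratio: {SSD_LATENCY_NS / ddr4_ns:,.0f}x")  # ~2,845x (~2,857x if DDR4 is rounded to 14 ns)

CPU_HZ = 4.8e9                     # i7-10700 boost clock
L1_CYCLES = 4
print(f"L1 latency: {L1_CYCLES / CPU_HZ * 1e9:.2f} ns")     # ~0.83 ns
```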
The amount of complexity those constraints force into the architecture is insane.
At my previous job, management kept asking for designs at that scale for less than 1/1000th of the throughput, and I was constantly pushing back. There are real costs to building for more scale than you need; it's not as simple as just tweaking a few things.
To me there are a few big breakpoints in scale:
* When you can run on a single server
* When you can still run on a single server, but need HA redundancy
* When you have to scale beyond a single server
* When you have to adapt your design to the limits of the distributed system itself, e.g. designing around DynamoDB's partition limits (sketched below)
Each step in that chain adds irrevocable complexity, adds to OE, and adds to the cost to run and the cost to build. Be sure you actually have to take those steps before you decide to.
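As a concrete taste of that last breakpoint, here's a hedged sketch of designing around DynamoDB's per-partition throughput ceilings (about 3,000 RCU and 1,000 WCU per partition). The write-sharding pattern is standard; the traffic numbers and key format below are invented for illustration:

```python
import math
import random

WCU_PER_PARTITION = 1_000   # DynamoDB's documented per-partition write ceiling
                            # (reads cap at ~3,000 RCU per partition)

# Invented scenario: one campaign writes 20,000 items/sec (1 WCU each)
# under a single logical partition key. No table-level setting fixes a
# hot key; the key schema itself has to change.
target_writes_per_sec = 20_000
shards = math.ceil(target_writes_per_sec / WCU_PER_PARTITION)  # -> 20

def sharded_pk(campaign_id: str) -> str:
    """Write-sharding: spread one hot logical key over `shards` physical
    partitions. The cost: every read of the campaign must now fan out
    across all the suffixes and merge the results."""
    return f"{campaign_id}#{random.randrange(shards)}"

print(shards)                    # 20
print(sharded_pk("campaign-7"))  # e.g. "campaign-7#13"
```

That read-side fan-out is exactly the kind of irrevocable complexity each breakpoint brings: it leaks into every query you write afterwards.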
Even a very unoptimized application running on a dev laptop can serve 1 Gbps these days without issues; that's about 125 MB/s, on the order of 100,000 one-kilobyte messages per second.
So what are the constraints that demand a complex architecture?
* Reading/fetching the data - usernames, phone numbers, message content, etc.
* Generating the content for each message - it might be custom per person
* This goes through a 3rd-party API that might take anywhere from 100 ms to 2 s to respond, and you need to hold a connection open the whole time (see the worker sketch after this list)
* Retries on errors, rescheduling, backoffs
* At-least-once or at-most-once sends? Each has tradeoffs
* Stopping/starting that many messages at any time
* Rate limits on services you're using alongside your own (network gateway, database, etc.) - a token-bucket sketch follows this list
* Recordkeeping - did the message send? When?
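To make a few of these concrete (the slow third-party API, bounded concurrency, retries with backoff, and at-least-once delivery), here's a minimal worker sketch. Everything in it is assumed for illustration: `provider_send`, its `idempotency_key` parameter, the failure rate, and the concurrency limit are invented, not any particular provider's API.

```python
import asyncio
import random
import uuid

# Invented stand-in for the 3rd-party API described above: it takes
# anywhere from 100 ms to 2 s to respond and fails ~10% of the time.
async def provider_send(msg: dict, idempotency_key: str) -> None:
    await asyncio.sleep(random.uniform(0.1, 2.0))
    if random.random() < 0.10:
        raise ConnectionError("provider 5xx")

async def send_with_retries(msg: dict, sem: asyncio.Semaphore,
                            attempts: int = 5) -> bool:
    # A stable per-message idempotency key is what makes retries safe:
    # at-least-once on our side, deduplicated on the provider's side.
    key = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            async with sem:  # bound how many connections we hold open
                await provider_send(msg, idempotency_key=key)
            return True   # recordkeeping hook: mark sent + timestamp here
        except ConnectionError:
            # Exponential backoff with full jitter: 0-1 s, 0-2 s, 0-4 s, ...
            await asyncio.sleep(random.uniform(0, 2 ** attempt))
    return False          # recordkeeping hook: mark failed, reschedule

async def main() -> None:
    sem = asyncio.Semaphore(500)  # at most 500 in-flight sends
    msgs = [{"to": f"user{i}", "body": "hi"} for i in range(2_000)]
    results = await asyncio.gather(*(send_with_retries(m, sem) for m in msgs))
    print(f"sent {sum(results)} of {len(msgs)}")

asyncio.run(main())
```

Note what the numbers imply: at a 2 s worst case per call, 500 open connections cap you near 250 sends/sec (Little's law: throughput ≈ in-flight ÷ latency). That arithmetic, more than the business logic, is what ends up shaping the architecture.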
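And for the rate-limit bullet, the usual building block is a token bucket in front of the shared dependency. A minimal single-threaded sketch; the 100 req/s quota is an assumed example, not any real service's limit:

```python
import time

class TokenBucket:
    """Refills continuously at `rate` tokens/sec, holds at most `burst`."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)  # wait for next token

# Assumed example: keep a shared gateway under 100 requests/sec.
bucket = TokenBucket(rate=100, burst=100)
bucket.acquire()  # call before each downstream request
```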