
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points | greghn | 2 comments
wistlo ◴[] No.39453430[source]
At my job at a telco, I had a 13 billion record file to scan and index for duplicates and bad addresses.

Consultants were brought in to move our apps (some of which were Excel macros, others SAS scripts running on old desktops) to Azure. The Azure architects identified Postgres as the best tool. The consultants attempted to create a Postgres index in a small Azure instance, but their tests would fail before completing (they were using string concatenation rather than the native indexing function).

Consultants' conclusion: file too big for Postgres.

I disputed this. There is plenty of literature out there on Pg handling bigger files. The Postgres (for Windows!) instance on my Core i7 laptop with an NVMe drive could index the file in about an hour. As an experiment I spun up a bare metal system on a Ryzen 7600 (lowest-power, 6-core) Zen 4 PC with a 1 TB Samsung PCIe 4 NVMe drive.

Got my index in 10 minutes.
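
For the curious, here's roughly what my run looked like, reconstructed from memory. The table and column names are made up, and the tuning values are illustrative rather than our actual schema or settings:

    import time
    import psycopg2

    # connection string is a placeholder
    conn = psycopg2.connect("dbname=addresses user=postgres")
    conn.autocommit = True
    cur = conn.cursor()

    # give the index build plenty of memory and parallel workers
    # (session-level settings; no server restart needed)
    cur.execute("SET maintenance_work_mem = '8GB'")
    cur.execute("SET max_parallel_maintenance_workers = 6")

    start = time.time()

    # the consultants' test (as far as I could tell) indexed one big
    # concatenated string; a plain composite b-tree over the raw columns
    # is the "native" way to do it
    cur.execute("""
        CREATE INDEX addr_dedup_idx
        ON addresses (last_name, first_name, street_addr, zip)
    """)
    print(f"index built in {time.time() - start:.0f}s")

    # the duplicate scan itself; with 13 billion rows you would page
    # through this rather than pull it all back at once
    cur.execute("""
        SELECT last_name, first_name, street_addr, zip, count(*)
        FROM addresses
        GROUP BY 1, 2, 3, 4
        HAVING count(*) > 1
    """)
    print(f"{cur.rowcount} duplicate keys")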

I then tried to replicate this in Azure, upping the CPUs and memory and moving to the NVMe-backed Azure VM family (Ebsv5). Even at the $2000/mo level, I could not get the Azure instance to better than one-fifth the speed of my bare metal experiment (about an hour). I probably could have matched it eventually with more cores, but did not want to get called on the carpet for a ten-grand Azure bill.

All this happened while I was working from home (one can't spin up an experimental bare metal system at a drop-in spot in the communal workroom).

What happened next I don't know, because I left in the midst of RTO fever. I was given the option of moving 1000 miles to commute to a hub office, or retire "voluntarily with severance." I chose the latter.

replies(4): >>39454045 #>>39454106 #>>39458620 #>>39462271 #
rcarmo ◴[] No.39454106[source]
As someone who works with Azure daily, I am amazed not just at the consultants' conclusion (which is, alas, typical of folk who do not understand database engines), but also at your struggle with NVMe storage (I have some pretty large SQLite databases on my personal projects).

You should not have needed an Ebsv5 (memory-optimised) instance. For that kind of thing, you should only have needed a D-series VM with a premium storage data disk (or, if you wanted a hypervisor-adjacent, very low latency volume, a temp volume in another SKU).

Anyway, many people fail to understand that Azure Storage works more like a SAN than a directly attached disk--when you attach a disk volume to the VM, you are actually attaching a _replica set_ of that storage that is at least three-way replicated and distributed across the datacenter to avoid data loss. You get RAID for free, if you will.

That is inherently slower than a hypervisor-adjacent (i.e., on-board) volume.
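
You can see this from inside a VM with nothing fancier than a small write-plus-fsync probe (the paths below are just examples; a premium data disk and the local temp volume will give you very different numbers):

    import os
    import time

    def fsync_latency(path, iterations=200):
        """Append a 4 KB block and fsync after each write; return average ms.
        This is roughly what a database pays on every commit."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        block = os.urandom(4096)
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, block)
            os.fsync(fd)
        os.close(fd)
        return (time.perf_counter() - start) / iterations * 1000

    # paths are examples: a premium SSD data disk vs. the local temp volume
    for label, path in [("data disk", "/datadisk/probe.tmp"),
                        ("local temp", "/mnt/probe.tmp")]:
        print(f"{label}: {fsync_latency(path):.2f} ms per fsync")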

replies(2): >>39454503 #>>39454726 #
silverquiet ◴[] No.39454503[source]
> Anyway, many people fail to understand that Azure Storage works more like a SAN than a directly attached disk--when you attach a disk volume to the VM, you are actually attaching a _replica set_ of that storage that is at least three-way replicated and distributed across the datacenter to avoid data loss. You get RAID for free, if you will.

I've said this a bit more sarcastically elsewhere in this thread, but basically, why would you expect people to understand this? Cloud is sold as abstracting away hardware details and giving performance SLAs billed by the hour (or minute, second, whatever). If you need to know significant details of their implementation, then you're getting to the point where you might as well buy your own hardware and save a bunch of money (which seems to be gaining some steam in a minor but noticeable cloud repatriation movement).

replies(2): >>39456435 #>>39458683 #
rcarmo ◴[] No.39456435{3}[source]
Well, in short, people need to understand that cloud is not their computer. It is resource allocation with underlying assumptions around availability, redundancy and performance at a scale well beyond what they would experience in their own datacenter.

And they absolutely must understand this to avoid mis-designing things. Failure to do so is just bad engineering, and a LOT of time is spent educating customers on these differences.

A case in point that aligns with this: I used to work with Hadoop clusters, where you would use data replication for both redundancy and distributed processing. Moving Hadoop to Azure and keeping the conventional design rules (i.e., tripling the number of disks) is the wrong way to do things, because it isn't required for either redundancy or performance (both are catered for by the storage layer).

(Of course there are better solutions than Hadoop these days - Spark being one that is very nice from a cloud resource perspective - but many people have nine times the storage they need allocated in their cloud Hadoop clusters - three-way HDFS replication on top of three-way replicated cloud disks - because of a lack of understanding...)
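
As a sketch of what I mean - not a blanket recommendation, and the job itself is made up - you can tell the HDFS client to stop tripling the data when the disks underneath are already replicated:

    from pyspark.sql import SparkSession

    # dfs.replication is a client-side setting: it controls how many copies
    # HDFS makes of whatever this job writes. On disks that are already
    # three-way replicated underneath, 1 is usually enough; check your own
    # durability requirements before copying this.
    spark = (SparkSession.builder
             .appName("no-triple-replication")
             .config("spark.hadoop.dfs.replication", "1")
             .getOrCreate())

    # illustrative write; the path is made up
    spark.range(1_000_000).write.mode("overwrite").parquet("hdfs:///data/example")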

replies(1): >>39456979 #
1. silverquiet ◴[] No.39456979{4}[source]
I would think that lifting and shifting a Hadoop setup into the cloud would be considered an anti-pattern anyway; typically you would be told to find a managed, cloud-native solution.
replies(1): >>39458992 #
2. rcarmo ◴[] No.39458992[source]
You would be surprised at what corporate thinking and procurement departments actually think is best.