
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points by greghn | 13 comments
zokier ◴[] No.39444037[source]
> Since then, several NVMe instance types, including i4i and im4gn, have been launched. Surprisingly, however, the performance has not increased; seven years after the i3 launch, we are still stuck with 2 GB/s per SSD.

AWS marketing claims otherwise:

    Up to 800K random write IOPS
    Up to 1 million random read IOPS
    Up to 5600 MB/second of sequential writes
    Up to 8000 MB/second of sequential reads

https://aws.amazon.com/blogs/aws/new-storage-optimized-amazo...
replies(1): >>39444172 #
sprachspiel ◴[] No.39444172[source]
This is for 8 SSDs, and a single modern PCIe 5.0 SSD has better specs than this.
replies(2): >>39444346 #>>39444404 #
1. nik_0_0 ◴[] No.39444404[source]
Is it? The line preceding the bullet list on that page seems to state otherwise:

> Each storage volume can deliver the following performance (all measured using 4 KiB blocks):
>
> * Up to 8000 MB/second of sequential reads
replies(1): >>39444564 #
2. sprachspiel ◴[] No.39444564[source]
Just tested an i4i.32xlarge:

  $ lsblk
  NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  loop0          7:0    0  24.9M  1 loop /snap/amazon-ssm-agent/7628
  loop1          7:1    0  55.7M  1 loop /snap/core18/2812
  loop2          7:2    0  63.5M  1 loop /snap/core20/2015
  loop3          7:3    0 111.9M  1 loop /snap/lxd/24322
  loop4          7:4    0  40.9M  1 loop /snap/snapd/20290
  nvme0n1      259:0    0     8G  0 disk 
  ├─nvme0n1p1  259:1    0   7.9G  0 part /
  ├─nvme0n1p14 259:2    0     4M  0 part 
  └─nvme0n1p15 259:3    0   106M  0 part /boot/efi
  nvme2n1      259:4    0   3.4T  0 disk 
  nvme4n1      259:5    0   3.4T  0 disk 
  nvme1n1      259:6    0   3.4T  0 disk 
  nvme5n1      259:7    0   3.4T  0 disk 
  nvme7n1      259:8    0   3.4T  0 disk 
  nvme6n1      259:9    0   3.4T  0 disk 
  nvme3n1      259:10   0   3.4T  0 disk 
  nvme8n1      259:11   0   3.4T  0 disk
Since nvme0n1 is the EBS boot volume, we have 8 SSDs. And here's the read bandwidth for one of them:

  $ sudo fio --name=bla --filename=/dev/nvme2n1 --rw=read --iodepth=128 --ioengine=libaio --direct=1 --blocksize=16m
  bla: (g=0): rw=read, bs=(R) 16.0MiB-16.0MiB, (W) 16.0MiB-16.0MiB, (T) 16.0MiB-16.0MiB, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  ^Cbs: 1 (f=1): [R(1)][0.5%][r=2704MiB/s][r=169 IOPS][eta 20m:17s]
So we should have a total bandwidth of about 2.7*8 ≈ 21.6 GB/s. Not that great for 2024.
replies(6): >>39444657 #>>39444735 #>>39444982 #>>39445321 #>>39445456 #>>39543485 #
3. Nextgrid ◴[] No.39444657[source]
If you still have this machine, I wonder whether you can get this bandwidth in parallel across all the SSDs. There could be some hypervisor-level or host-level bottleneck, so that while any single SSD in isolation gives you the observed bandwidth, you can't actually sustain it across all of them at once.
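One way to check: run one fio job per instance-store drive with group_reporting, so the summary line shows the combined read bandwidth. A rough sketch, assuming the device names from the lsblk output above (the job names and the 30-second runtime are arbitrary):

  # one job per NVMe instance-store drive; group_reporting sums them into one result
  sudo fio --rw=read --blocksize=1m --iodepth=128 --ioengine=libaio --direct=1 \
       --runtime=30 --time_based --group_reporting \
       --name=ssd1 --filename=/dev/nvme1n1 \
       --name=ssd2 --filename=/dev/nvme2n1 \
       --name=ssd3 --filename=/dev/nvme3n1 \
       --name=ssd4 --filename=/dev/nvme4n1 \
       --name=ssd5 --filename=/dev/nvme5n1 \
       --name=ssd6 --filename=/dev/nvme6n1 \
       --name=ssd7 --filename=/dev/nvme7n1 \
       --name=ssd8 --filename=/dev/nvme8n1

If the aggregate lands well below 8x the single-drive number, that would point at a shared bottleneck rather than the drives themselves.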
4. Aachen ◴[] No.39444735[source]
So if I'm reading it right, the quote from the original article that started this thread was ballpark correct?

> we are still stuck with 2 GB/s per SSD

Versus the ~2.7 GiB/s your benchmark shows (it's a bit hard to know where to look on mobile with all that line-wrapped output, and I'm not familiar with the fio tool; not your fault, but that's why I'm double-checking my conclusion).

5. dangoodmanUT ◴[] No.39444982[source]
That's 16 MiB blocks, not the 4 KiB the advertised figures were measured with.
replies(1): >>39445601 #
6. zokier ◴[] No.39445321[source]
I wonder if there is some tuning that needs to be done here; it seems surprising that the advertised rate would be this far off otherwise.
replies(1): >>39445596 #
7. dekhn ◴[] No.39445456[source]
Can you adjust --blocksize to correspond to the block size on the device? And try with/without --direct=1?
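Something along these lines, for reference; the device name is taken from the earlier run, AWS quotes its figures for 4 KiB blocks, and the queue depth, job count and runtime here are only illustrative:

  # 4 KiB random reads with O_DIRECT (bypasses the page cache)
  sudo fio --name=rand4k --filename=/dev/nvme2n1 --rw=randread --blocksize=4k \
       --iodepth=128 --numjobs=8 --ioengine=libaio --direct=1 \
       --runtime=30 --time_based --group_reporting

  # same run buffered, i.e. without --direct=1
  sudo fio --name=rand4k-buffered --filename=/dev/nvme2n1 --rw=randread --blocksize=4k \
       --iodepth=128 --numjobs=8 --ioengine=libaio \
       --runtime=30 --time_based --group_reporting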
8. jeffbee ◴[] No.39445596{3}[source]
I would start with the LBA format, which is likely to be suboptimal for compatibility.
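The supported formats and the one currently in use can be listed with nvme-cli, roughly like this (device name assumed from the instance above):

    # human-readable namespace info; the "(in use)" entry is the active LBA size
    sudo nvme id-ns /dev/nvme1n1 --human-readable | grep "LBA Format"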
replies(1): >>39447787 #
9. wtallis ◴[] No.39445601{3}[source]
Last I checked, Linux splits up massive IO requests like that before sending them to the disk. But there's no benefit to splitting a sequential IO request all the way down to 4kB.
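The split size Linux actually uses is visible in sysfs; for the drive benchmarked above (device name assumed from the lsblk output) it's something like:

  # largest request size, in KiB, the block layer will submit to the device
  cat /sys/block/nvme2n1/queue/max_sectors_kb
  # hardware ceiling reported by the device itself
  cat /sys/block/nvme2n1/queue/max_hw_sectors_kb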
10. zokier ◴[] No.39447787{4}[source]
Somehow the i4i drives don't like to get formatted:

    # nvme format /dev/nvme1 -n1 -f
    NVMe status: INVALID_OPCODE: The associated command opcode field is not valid(0x2001)
    # nvme id-ctrl /dev/nvme1 | grep oacs
    oacs      : 0
but the LBA format is indeed sus:

    LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0 Best (in use)
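For the record, on a drive that did expose a 4 KiB format and supported Format NVM, the switch would look roughly like this; the index 1 is hypothetical and has to match whatever id-ns lists as the 4 KiB entry, and it wipes the namespace:

    # reformat the namespace to the LBA format at index 1 (hypothetical 4 KiB entry); destroys all data
    sudo nvme format /dev/nvme1n1 --lbaf=1 --force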
replies(1): >>39448000 #
11. jeffbee ◴[] No.39448000{5}[source]
It's a shame. The recent "datacenter nvme" standards involving fb, goog, et al mandate 4K LBA support.
replies(1): >>39448552 #
12. zokier ◴[] No.39448552{6}[source]
It'd be great if you could throw together a quick blog post about i4i I/O performance; there's obviously something funny going on, and I imagine you guys could figure it out much more easily than anybody else, especially since you already have some figures in the marketing.
13. highfrequency ◴[] No.39543485[source]
13. highfrequency ◴[] No.39543485[source]
The aggregate throughput matches the advertised number of 22,400 MB/s (2,704 MiB/s ≈ 2.8 GB/s per drive, times 8 drives ≈ 22.7 GB/s): https://aws.amazon.com/blogs/aws/new-storage-optimized-amazo...