
SSDs have become fast, except in the cloud

(databasearchitects.blogspot.com)
589 points | greghn | 3 comments
1. StillBored No.39447602
It's worse than the article mentions, because bandwidth isn't the problem; IOPS are the problem.

Last time (about a year ago) I ran a couple of random IO benchmarks against storage-optimized instances, and the random IOPS behavior was closer to a large spinning RAID array than to SSDs once the disk size was over some threshold.

IIRC, what it looks like is a fast local SSD cache with a couple hundred GB of storage, with the rest backed by remote spinning media.

It's one of the many reasons I have a hard time taking cloud optimization seriously: the lack of direct tiering controls means that database-style (etc.) workloads are not going to optimize well, and that ends up costing a lot of $$$$$.

So, maybe it was the instance types/configuration I was using, but <shrug> it was just something I was testing in passing.
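
If anyone wants to poke at this themselves, a rough sketch of how to probe for a cache tier (not my exact commands; the device path and size steps are placeholders) is to sweep the region fio is allowed to touch and watch where random-read IOPS fall off:

  # Placeholder device and sizes -- adjust for your instance type.
  for size in 100G 200G 400G 800G 1600G; do
    fio --name=probe_$size --filename=/dev/nvme1n1 --size=$size \
        --rw=randread --bs=4K --iodepth=256 --ioengine=io_uring --direct=1 \
        --randrepeat=0 --time_based --ramp_time=5s --runtime=30s \
        --output-format=json | jq '.jobs[0].read.iops'
  done
  # If IOPS collapse once the touched region exceeds a couple hundred GB,
  # that's consistent with a small fast tier in front of slower remote storage.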

replies(1): >>39448132 #
2. zokier No.39448132

  # fio --name=read_iops_test --filename=/dev/nvme1n1 --filesize=1500G --time_based --ramp_time=1s --runtime=15s --ioengine=io_uring --fixedbufs --direct=1 --verify=0 --randrepeat=0 --bs=4K --iodepth=256 --rw=randread --iodepth_batch_submit=256 --iodepth_batch_complete_max=256
  read_iops_test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
  fio-3.32
  Starting 1 process
  Jobs: 1 (f=1): [r(1)][100.0%][r=2082MiB/s][r=533k IOPS][eta 00m:00s]
  read_iops_test: (groupid=0, jobs=1): err= 0: pid=34235: Tue Feb 20 22:57:00 2024
    read: IOPS=534k, BW=2086MiB/s (2187MB/s)(30.6GiB/15001msec)
      slat (nsec): min=713, max=255840, avg=31174.74, stdev=16248.45
      clat (nsec): min=1419, max=1175.6k, avg=443782.26, stdev=277389.66
      lat (usec): min=133, max=1240, avg=474.96, stdev=274.50
      clat percentiles (usec):
      |  1.00th=[  169],  5.00th=[  198], 10.00th=[  217], 20.00th=[  243],
      | 30.00th=[  265], 40.00th=[  285], 50.00th=[  306], 60.00th=[  334],
      | 70.00th=[  396], 80.00th=[  865], 90.00th=[  922], 95.00th=[  947],
      | 99.00th=[  996], 99.50th=[ 1012], 99.90th=[ 1045], 99.95th=[ 1057],
      | 99.99th=[ 1074]
    bw (  MiB/s): min= 2080, max= 2092, per=100.00%, avg=2086.72, stdev= 2.35, samples=30
    iops        : min=532548, max=535738, avg=534199.13, stdev=601.82, samples=30
    lat (usec)   : 2=0.01%, 100=0.01%, 250=23.06%, 500=50.90%, 750=0.28%
    lat (usec)   : 1000=24.90%
    lat (msec)   : 2=0.87%
    cpu          : usr=14.17%, sys=67.83%, ctx=156851, majf=0, minf=37
    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
      submit    : 0=0.0%, 4=7.8%, 8=11.3%, 16=39.7%, 32=30.6%, 64=10.5%, >=64=0.1%
      complete  : 0=0.0%, 4=5.3%, 8=9.5%, 16=40.3%, 32=32.4%, 64=12.4%, >=64=0.1%
      issued rwts: total=8010661,0,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=256

  Run status group 0 (all jobs):
    READ: bw=2086MiB/s (2187MB/s), 2086MiB/s-2086MiB/s (2187MB/s-2187MB/s), io=30.6GiB (32.8GB), run=15001-15001msec

  Disk stats (read/write):
    nvme1n1: ios=8542481/0, merge=0/0, ticks=3822266/0, in_queue=3822266, util=99.37%

tldr: random 4k reads pretty much saturate the available 2GB/s bandwidth (this is on m6id)
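
(Math check, for anyone following along: block size times IOPS lines up with the reported bandwidth.)

  # 4 KiB per IO x ~534k IOPS
  $ echo '4096 * 534000 / 10^9' | bc -l    # ~2.19 GB/s
  $ echo '4096 * 534000 / 2^20' | bc -l    # ~2086 MiB/s, i.e. what fio reports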
replies(1): >>39453912 #
3. kmxdm No.39453912
Just for fun, I ran the same workload on a locally attached Gen4 enterprise-class 7.68TB NVMe SSD on "bare metal" (which is my home i9 system with an e-core/p-core situation, so I added cpus_allowed):

  sudo fio --name=read_iops_test   --filename=/dev/nvme0n1 --filesize=1500G   --time_based --ramp_time=1s --runtime=15s   --ioengine=io_uring --fixedbufs --direct=1 --verify=0 --randrepeat=0   --bs=4K --iodepth=256 --rw=randread   --iodepth_batch_submit=256  --iodepth_batch_complete_max=256 --cpus_allowed=0-7
  read_iops_test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
  fio-3.28
  Starting 1 process
  Jobs: 1 (f=1): [r(1)][100.0%][r=6078MiB/s][r=1556k IOPS][eta 00m:00s]
  read_iops_test: (groupid=0, jobs=1): err= 0: pid=11085: Wed Feb 21 08:57:35 2024
    read: IOPS=1555k, BW=6073MiB/s (6368MB/s)(89.0GiB/15001msec)
      slat (nsec): min=401, max=93168, avg=7547.42, stdev=4396.47
      clat (nsec): min=1426, max=1958.2k, avg=154599.19, stdev=92730.02
       lat (usec): min=56, max=1963, avg=162.15, stdev=92.68
      clat percentiles (usec):
       |  1.00th=[   71],  5.00th=[   78], 10.00th=[   83], 20.00th=[   92],
       | 30.00th=[  100], 40.00th=[  111], 50.00th=[  124], 60.00th=[  141],
       | 70.00th=[  165], 80.00th=[  200], 90.00th=[  265], 95.00th=[  334],
       | 99.00th=[  519], 99.50th=[  603], 99.90th=[  807], 99.95th=[  898],
       | 99.99th=[ 1106]
     bw (  MiB/s): min= 5823, max= 6091, per=100.00%, avg=6073.70, stdev=47.56, samples=30
     iops        : min=1490727, max=1559332, avg=1554866.87, stdev=12174.38, samples=30
    lat (usec)   : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=30.18%
    lat (usec)   : 250=58.12%, 500=10.55%, 750=1.00%, 1000=0.13%
    lat (msec)   : 2=0.02%
    cpu          : usr=25.41%, sys=74.57%, ctx=2395, majf=0, minf=58
    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
       submit    : 0=0.0%, 4=5.7%, 8=14.8%, 16=54.8%, 32=24.3%, 64=0.3%, >=64=0.1%
       complete  : 0=0.0%, 4=2.9%, 8=13.0%, 16=56.9%, 32=26.8%, 64=0.3%, >=64=0.1%
       issued rwts: total=23320075,0,0,0 short=0,0,0,0 dropped=0,0,0,0
       latency   : target=0, window=0, percentile=100.00%, depth=256
  
  Run status group 0 (all jobs):
     READ: bw=6073MiB/s (6368MB/s), 6073MiB/s-6073MiB/s (6368MB/s-6368MB/s), io=89.0GiB (95.5GB), run=15001-15001msec
  
  Disk stats (read/write):
    nvme0n1: ios=24547748/0, merge=1/0, ticks=3702834/0, in_queue=3702835, util=99.35%
And then again with the IOPS rate limited to ~534k (the ~2GB/s equivalent of the cloud run):

  sudo fio --name=read_iops_test   --filename=/dev/nvme0n1 --filesize=1500G   --time_based --ramp_time=1s --runtime=15s   --ioengine=io_uring --fixedbufs --direct=1 --verify=0 --randrepeat=0   --bs=4K --iodepth=256 --rw=randread   --iodepth_batch_submit=256  --iodepth_batch_complete_max=256 --cpus_allowed=0-7 --rate_iops=534000
  read_iops_test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
  fio-3.28
  Starting 1 process
  Jobs: 1 (f=1), 0-534000 IOPS: [r(1)][100.0%][r=2086MiB/s][r=534k IOPS][eta 00m:00s]
  read_iops_test: (groupid=0, jobs=1): err= 0: pid=11114: Wed Feb 21 08:59:30 2024
    read: IOPS=534k, BW=2086MiB/s (2187MB/s)(30.6GiB/15001msec)
      slat (nsec): min=817, max=88336, avg=41533.20, stdev=7711.33
      clat (usec): min=7, max=485, avg=93.19, stdev=39.73
       lat (usec): min=65, max=536, avg=134.72, stdev=37.83
      clat percentiles (usec):
       |  1.00th=[   32],  5.00th=[   41], 10.00th=[   47], 20.00th=[   59],
       | 30.00th=[   70], 40.00th=[   79], 50.00th=[   89], 60.00th=[   98],
       | 70.00th=[  110], 80.00th=[  122], 90.00th=[  145], 95.00th=[  167],
       | 99.00th=[  217], 99.50th=[  235], 99.90th=[  277], 99.95th=[  293],
       | 99.99th=[  334]
     bw (  MiB/s): min= 2084, max= 2086, per=100.00%, avg=2086.08, stdev= 0.38, samples=30
     iops        : min=533715, max=534204, avg=534037.57, stdev=97.91, samples=30
    lat (usec)   : 10=0.01%, 20=0.04%, 50=12.42%, 100=49.30%, 250=37.97%
    lat (usec)   : 500=0.28%
    cpu          : usr=11.48%, sys=27.35%, ctx=2278177, majf=0, minf=58
    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.1%, >=64=100.0%
       submit    : 0=0.0%, 4=0.4%, 8=0.2%, 16=0.1%, 32=0.1%, 64=0.1%, >=64=99.3%
       complete  : 0=0.0%, 4=95.4%, 8=4.5%, 16=0.1%, 32=0.1%, 64=0.1%, >=64=0.0%
       issued rwts: total=8009924,0,0,0 short=0,0,0,0 dropped=0,0,0,0
       latency   : target=0, window=0, percentile=100.00%, depth=256

  Run status group 0 (all jobs):
     READ: bw=2086MiB/s (2187MB/s), 2086MiB/s-2086MiB/s (2187MB/s-2187MB/s), io=30.6GiB (32.8GB), run=15001-15001msec

  Disk stats (read/write):
    nvme0n1: ios=8543389/0, merge=0/0, ticks=934147/0, in_queue=934148, util=99.33%
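
For comparison with the m6id numbers upthread: at the same ~534k IOPS, completion latency here averages ~93us vs ~444us on the cloud instance. A rough Little's-law check (my back-of-envelope, using the clat averages from the two runs):

  # outstanding IOs ~= IOPS x mean completion latency
  $ echo '534000 * 0.000444' | bc -l    # m6id run: ~237 in flight, the 256-deep queue is nearly full
  $ echo '534000 * 0.0000932' | bc -l   # local SSD at the same rate: ~50 in flight, plenty of headroom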
edit: formatting...