←back to thread

129 points ozgrakkurt | 1 comments | | HN request time: 0s | source
Show context
database64128 ◴[] No.45155611[source]
I see you use a hard-coded constant ALIGN = 512. Many NVMe drives actually allow you to raise the logical block size to 4096 by re-formatting (nvme-format(1)) the drive.
replies(2): >>45155982 #>>45163274 #
HippoBaro ◴[] No.45155982[source]
It’s really the hardware block size that matters in this case (direct I/O). That value is a property of the hardware and can’t be changed.

In some situations, the “logical” block size can differ. For example, buffered writes use the page cache, which operates in PAGE_SIZE blocks (usually 4K). Or your RAID stripe size might be misconfigured, stuff like that. Otherwise they should be equal for best outcomes.

In general, we want it to be as small as possible!

replies(2): >>45156110 #>>45159159 #
1. imtringued ◴[] No.45159159[source]
Why would you want the block size to be as small as possible? You will only benefit from that for very small files, hence the sweet spot is somewhere between "as small as possible" and "small multiple of the hardware block size".

If you have bigger files, then having bigger blocks means less fixed overhead from syscalls and NVMe/SATA requests.

If your native device block size is 4KiB, and you fetch 512 byte blocks, you need storage side RAM to hold smaller blocks and you have to address each block independently. Meanwhile if you are bigger than the device block size you end up with fewer requests and syscalls. If it turns out that the requested block size is too large for the device, then the OS can split your large request into smaller device appropriate requests to the storage device, since the OS knows the hardware characteristics.

The most difficult to optimize case is the one where you issue many parallel requests to the storage device using asynchronous file IO for latency hiding. In that case, knowing the device's exact block size is important, because you are IOPs bottlenecked and a block size that is closer to what the device supports natively will mean fewer IOPs per request.