You might end up with even more than that due to filesystem metadata (inode records, checksums), metadata of an underlying RAID mechanism, or, when going over a network, things like Ethernet frame sizes/MTU.
In an ideal world there would be a clear interface a program could use to determine, for any given combination of storage medium, HW RAID, transport layer (local attach vs. something like iSCSI or NFS), SW RAID (e.g. mdraid), filesystem and filesystem features, the smallest unit it can sensibly modify without incurring unnecessary write amplification.
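For what it's worth, Linux does expose fragments of this today: stat(2) reports the filesystem's preferred I/O size, and the BLKSSZGET/BLKPBSZGET ioctls (plus the minimum_io_size/optimal_io_size sysfs hints, which mdraid populates with its stripe geometry) report what the block layer knows about the device. A minimal sketch, assuming a local block device at /dev/sda; none of this sees through, say, an iSCSI target's backing store or a HW RAID controller that doesn't report its stripe size:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <linux/fs.h>

    int main(int argc, char **argv)
    {
        /* Per-device sector sizes via ioctl (usually needs root). */
        const char *dev = argc > 1 ? argv[1] : "/dev/sda";
        int fd = open(dev, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        int logical = 0;
        unsigned int physical = 0;
        ioctl(fd, BLKSSZGET, &logical);   /* logical sector size */
        ioctl(fd, BLKPBSZGET, &physical); /* physical sector size */
        printf("logical %d, physical %u bytes\n", logical, physical);
        close(fd);

        /* The filesystem's preferred I/O size for the cwd. */
        struct stat st;
        if (stat(".", &st) == 0)
            printf("fs preferred I/O size: %ld bytes\n", (long)st.st_blksize);
        return 0;
    }

The problem the comment above describes is exactly that these answers come from different layers and nothing stitches them together into one "minimum sensible write" number.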
They actually charge for IOPS; the throughput figure is just an upper bound that is easier to sell.
For gp (General Purpose SSD) volumes, contiguous requests of up to 256 KiB are merged into a single operation. For io (Provisioned IOPS) volumes, the I/O size is 256 KiB on Nitro instances and 16 KiB on everything else; st and sc HDD volumes have a 1 MiB I/O size.
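So the billed operation count for a sequential request is just the request size divided by the volume's I/O size, rounded up. A rough sketch of the arithmetic (billed_ops is a made-up helper for illustration, not an AWS API):

    #include <stdio.h>

    /* Billed I/O operations for a sequential request of `bytes`,
     * given the volume's I/O size (256 KiB for gp and io-on-Nitro,
     * 16 KiB for io elsewhere, 1 MiB for st/sc). */
    static unsigned long billed_ops(unsigned long bytes, unsigned long io_size)
    {
        return (bytes + io_size - 1) / io_size;  /* round up */
    }

    int main(void)
    {
        /* The same sequential 1 MiB read, priced three ways. */
        printf("gp: %lu ops\n", billed_ops(1 << 20, 256 << 10));           /* 4  */
        printf("io (non-Nitro): %lu ops\n", billed_ops(1 << 20, 16 << 10)); /* 64 */
        printf("st/sc: %lu ops\n", billed_ops(1 << 20, 1 << 20));           /* 1  */
        return 0;
    }

Which is why the same workload can consume wildly different fractions of your provisioned IOPS depending on volume type and instance generation.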