FWIW, this is only true for newer hardware. E.g. if you plug a PCIe Gen3 x16 device into a PCIe Gen4 x8 slot, the device will only run at Gen3 x8, even though a Gen4 x8 link offers bandwidth in the same ballpark as Gen3 x16.
So in this scenario we'll have to wait until the devices themselves move to Gen4 to make use of the higher bandwidth.
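A rough sketch of the negotiation rule (per-lane figures are the commonly quoted usable rates, not exact spec numbers): the link trains to the lower generation and the narrower width that both ends support.

    # Back-of-envelope PCIe link negotiation: the trained link runs at the lower
    # generation and the narrower width that both ends support.
    GBPS_PER_LANE = {3: 0.985, 4: 1.97, 5: 3.94}  # ~GB/s per lane, per direction

    def trained_link(dev_gen, dev_width, slot_gen, slot_width):
        gen = min(dev_gen, slot_gen)
        width = min(dev_width, slot_width)
        return gen, width, round(GBPS_PER_LANE[gen] * width, 1)

    # Gen3 x16 card in a Gen4 x8 slot trains to Gen3 x8 (~7.9 GB/s), even though
    # Gen4 x8 on its own (~15.8 GB/s) is in the same ballpark as Gen3 x16.
    print(trained_link(3, 16, 4, 8))  # -> (3, 8, 7.9)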
800G Ethernet is here at the switches (the Dell Z9864F-ON is beautiful... 128 ports of 400G), but not yet at the server/NIC level; that comes with PCIe 6. We are also limited to 16 chassis/128 GPUs in a single cluster right now.
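The back-of-envelope math for why 800G NICs wait on PCIe 6 (figures are the commonly quoted usable rates per direction, not exact spec numbers):

    # Why 800G NICs need PCIe 6.0: one direction of 800 GbE already exceeds
    # what a PCIe 5.0 x16 slot can move.
    nic_800g = 800 / 8          # 800 Gb/s line rate -> 100 GB/s per direction
    pcie5_x16 = 3.94 * 16       # ~63 GB/s: can't keep up with a single 800G port
    pcie6_x16 = 7.56 * 16       # ~121 GB/s: finally has headroom for 800G
    print(nic_800g, round(pcie5_x16), round(pcie6_x16))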
NVMe is getting faster all the time, but is pretty standard now. We put 122 TB into each server, which enables local caching of data if needed.
All of this is designed around the highest speeds we can get today on the various buses where data is transferred.
Not all PCIe lanes on your motherboard are created equal: some are attached directly to the CPU, others are connected to the chipset, which in turn is connected to the CPU.
It's possible to convert a single 5.0 x16 connection coming from the CPU into two 4.0 x16 connections.
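If you want to see what your devices actually trained to, and get a hint (from the bridge chain in the device path) whether they hang off a CPU root port or sit behind the chipset or a switch, Linux exposes this in sysfs. A minimal sketch, assuming Linux and the standard current_link_speed/current_link_width attributes:

    # Rough sketch (Linux only): print each PCIe device's negotiated link speed
    # and width from sysfs. The resolved device path shows the bridge chain,
    # which hints at whether the device is on a CPU root port or behind the
    # chipset / a PCIe switch (more bridges in the path).
    import glob
    import os

    for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
        try:
            with open(os.path.join(dev, "current_link_speed")) as f:
                speed = f.read().strip()
            with open(os.path.join(dev, "current_link_width")) as f:
                width = f.read().strip()
        except OSError:
            continue  # no PCIe link attributes (e.g. host bridge, legacy device)
        bdf = os.path.basename(dev)
        chain = os.path.realpath(dev)  # e.g. .../pci0000:00/0000:00:01.0/0000:01:00.0
        print(f"{bdf}: {speed} x{width}  <- {chain}")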
I upgraded to a 4 TB NVMe drive that theoretically reads/writes at up to 7.7 GB/s, but I only get 3.5 GB/s because my CPU is still an i9-9900K running PCI-E 3.0.
Planning on upgrading once the next Intel generation drops.
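The arithmetic behind that cap, roughly: on that platform the drive gets a PCIe 3.0 x4 link, which tops out just under 4 GB/s per direction, and ~3.5 GB/s is about what's left after protocol overhead.

    # Back-of-envelope: the PCIe 3.0 x4 link is the ceiling, not the drive.
    link_raw = 8e9 * 4 * (128 / 130) / 8 / 1e9  # 8 GT/s * 4 lanes, 128b/130b encoding
    print(round(link_raw, 2))  # ~3.94 GB/s per direction; ~3.5 GB/s is typical
                               # once NVMe/PCIe protocol overhead is paid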
I expect consumer machines to keep doing some conversion and expansion in the chipset, but nowhere else. I expect servers to directly attach almost everything and drop down to smaller lane counts for large numbers of devices.
It's worth noting that when Kioxia first put out PCIe 5.0 EDSFF drives, they were marketing them as being optimized for 2 lanes at the higher speed.
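The arithmetic works out, roughly: two Gen5 lanes match four Gen4 lanes, so an x2 Gen5 drive gives up nothing versus a Gen4 x4 drive while using half the lanes per device.

    # Back-of-envelope: Gen5 x2 ~= Gen4 x4 in usable bandwidth.
    per_lane = {4: 1.97, 5: 3.94}            # ~GB/s per lane, per direction
    print(per_lane[5] * 2, per_lane[4] * 4)  # ~7.9 GB/s either way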