Yes, but to compute a token it eventually has to read the data, either from the cache in RAM or from storage. There is no way even a fast SSD can compete with RAM on I/O speed, so to get any speed benefit the whole file has to be cached in RAM. This does have other benefits, e.g. threads can share the memory, and the file does not have to be reread on the next run because it is already cached. But in the final analysis you either have the RAM or you are reading from disk, and reading 20 GB for each token means reading 1 TB for a paragraph of 50 tokens. My M1, which by no means has a slow SSD, reads the file at 500-600 MB/s, while a Thunderbolt PCIe 4 enclosure reads at 700-800 MB/s. Even if you double that, it will still take 10-20 seconds per token. To get under 1 second per token for the 30B model you would have to read at 20 GB/s. By the time we can do that, there will be even bigger (V)RAM and even larger models.
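For anyone who wants to plug in their own numbers, here is a rough back-of-the-envelope sketch of the arithmetic above. It assumes decimal units (1 GB = 1000 MB), that every token needs one full pass over the 20 GB file, and that nothing is served from the RAM cache. The function names are made up for illustration.

```python
def seconds_per_token(model_gb: float, read_mb_per_s: float) -> float:
    """Time to stream the whole model file once at a given read speed.

    Assumes decimal units (1 GB = 1000 MB) and no caching at all.
    """
    return model_gb * 1000 / read_mb_per_s


def required_gb_per_s(model_gb: float, target_seconds: float) -> float:
    """Read throughput needed to stream the model within target_seconds."""
    return model_gb / target_seconds


MODEL_GB = 20  # size read per token, as quoted in the comment above

# Read speeds in MB/s taken from the comment; 1600 is the "doubled" case.
for label, mb_per_s in [
    ("M1 internal SSD (low)", 500),
    ("Thunderbolt enclosure (high)", 800),
    ("enclosure speed doubled", 1600),
]:
    print(f"{label}: {seconds_per_token(MODEL_GB, mb_per_s):.1f} s/token")

# Total data read for a 50-token paragraph, in GB.
print("GB read for 50 tokens:", 50 * MODEL_GB)

# Throughput needed for 1 second per token, in GB/s.
print("GB/s needed for 1 s/token:", required_gb_per_s(MODEL_GB, 1))
```

Real throughput will vary with the access pattern and with how much of the file the OS keeps cached, so treat this as an upper bound on the pain and not a benchmark.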