I had bet that, as of 2 years ago, matmul would be running on transformer-optimized hardware costing a fraction of what GPUs do, with first-class support in torch and no reason to use GPUs any more. Wrong.
replies(2):
So really, GPU vs. non-GPU (e.g. TPU) doesn't matter a whole lot if you've got fundamentally the same memory architecture.
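
Back-of-envelope version of that, as a roofline-style sketch. The two chips, their peak FLOP/s, and the shared bandwidth figure are all made-up numbers for illustration, not specs of any real GPU or TPU, and the model assumes each operand crosses the memory bus exactly once.

```python
def matmul_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte moved for an (m x k) @ (k x n) matmul.

    Assumes both inputs and the output are each moved through memory once.
    """
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_flops(peak_flops, mem_bandwidth, intensity):
    """Roofline model: limited by compute peak or by bandwidth * intensity."""
    return min(peak_flops, mem_bandwidth * intensity)


# Hypothetical chips: different compute peaks, same memory system.
BANDWIDTH = 2e12      # bytes/s, assumed identical on both chips
CHIP_A_PEAK = 4e14    # FLOP/s, hypothetical
CHIP_B_PEAK = 8e14    # FLOP/s, hypothetical

# Skinny matmul (one row against a big weight matrix): memory-bound.
skinny = matmul_intensity(1, 8192, 8192)
# Large square matmul: compute-bound.
square = matmul_intensity(8192, 8192, 8192)

for name, intensity in [("skinny", skinny), ("square", square)]:
    a = attainable_flops(CHIP_A_PEAK, BANDWIDTH, intensity)
    b = attainable_flops(CHIP_B_PEAK, BANDWIDTH, intensity)
    print(f"{name}: intensity={intensity:.2f} FLOP/byte, A={a:.3g}, B={b:.3g}")
```

Under these assumptions the skinny case comes out the same on both chips, because bandwidth is the ceiling; the extra compute on chip B only shows up in the large square case.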