As I understand it, Vulkan allows running custom code on the GPU, including code to multiply matrices. Can one simply use Vulkan and ignore CUDA, PyTorch and ROCm?
Of course, but then you are just recreating CUDA. And that won’t scale well across an industry, since each company would have its own language. AMD can just do what you are describing and then sell it as a standard.
I mean, they literally did that, but then dropped it, so yeah.