23 points robertvc | 1 comments
subharmonicon No.45154175
TLDR: To get good performance you need to use vendor-specific extensions, which result in the same lock-in that Modular has been claiming it will enable you to avoid.
replies(2): >>45154429 #>>45156295 #
1. imtringued No.45156295
Correct. There is too much architectural divergence between GPU vendors. If they really wanted to avoid vendor-specific extensions in user-level code, they would have gone with something loosely inspired by tinygrad (which isn't ready yet).

Basically, you need a good description of the hardware, and the compiler automatically generates a state-of-the-art GEMM kernel from it.

Maybe it's 20% worse than Nvidia's hand-written kernels, but you can switch hardware vendors or build arbitrary fused kernels at will.
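
To make the "hardware description in, GEMM out" idea concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: `HardwareDesc`, its fields, `pick_tile`, and the two vendor entries are hypothetical, not tinygrad's or Modular's API, and a real compiler would emit GPU code instead of running Python loops. It only shows that the kernel source stays the same while the tiling is derived from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareDesc:
    """Hypothetical hardware description; all field names are illustrative."""
    name: str
    scratchpad_bytes: int  # fast on-chip memory available to one work group
    elem_bytes: int        # size of one matrix element
    tile_multiple: int     # tile edge must be a multiple of this (e.g. SIMD width)


def pick_tile(hw: HardwareDesc) -> int:
    """Largest tile edge T (a multiple of hw.tile_multiple) such that three
    T x T tiles (one each for A, B, C) fit in the scratchpad.
    Falls back to the minimum tile if even that does not fit."""
    t = hw.tile_multiple
    best = t
    while 3 * t * t * hw.elem_bytes <= hw.scratchpad_bytes:
        best = t
        t += hw.tile_multiple
    return best


def tiled_gemm(a, b, hw: HardwareDesc):
    """Compute a (m x k) times b (k x n) with loop tiling sized from hw."""
    m, k, n = len(a), len(b), len(b[0])
    t = pick_tile(hw)
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, t):          # tile over rows of C
        for j0 in range(0, n, t):      # tile over columns of C
            for k0 in range(0, k, t):  # tile over the reduction dimension
                for i in range(i0, min(i0 + t, m)):
                    for kk in range(k0, min(k0 + t, k)):
                        aik = a[i][kk]
                        for j in range(j0, min(j0 + t, n)):
                            c[i][j] += aik * b[kk][j]
    return c


# Two made-up vendors: same kernel source, different derived tiling.
vendor_a = HardwareDesc("vendor_a", scratchpad_bytes=48 * 1024, elem_bytes=4, tile_multiple=16)
vendor_b = HardwareDesc("vendor_b", scratchpad_bytes=64 * 1024, elem_bytes=2, tile_multiple=32)

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result_a = tiled_gemm(a, b, vendor_a)
result_b = tiled_gemm(a, b, vendor_b)
```

Switching vendors here means swapping the `HardwareDesc` value, nothing else. The gap to hand-written kernels would come from everything this sketch ignores (tensor cores, memory layout, pipelining).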