
lukev ◴[] No.44567263[source]
So to make sure I understand, this would mean:

1. Programs built against MLX -> Can take advantage of CUDA-enabled chips

but not:

2. CUDA programs -> Can now run on Apple Silicon.

Because #2 would be a copyright violation (specifically with respect to Nvidia's famous moat).

Is this correct?

saagarjha ◴[] No.44567309[source]
No, it's because doing 2 would be substantially harder.
hangonhn ◴[] No.44567414[source]
Is CUDA tied so closely to Nvidia's hardware and architecture that its abstractions would not make sense on other platforms? I know very little about hardware and low-level software.

Thanks

1. lcnielsen ◴[] No.44568191[source]
The kind of CUDA you or I would write is not very hardware-specific (a few constants here and there), but the kind of CUDA behind cuBLAS, with a million magic flags, inline PTX ("GPU assembly"), and exploitation of driver/firmware hacks, is. It's like the difference between numerics code in C and numerics code in C with tons of inline assembly for each of a number of specific processors.
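
To make the C analogy concrete, here is a minimal sketch of the same dot product written twice. The function names `dot_portable` and `dot_tuned` are made up for illustration, and the "tuned" path is only a toy: it assumes GCC or Clang on x86-64 with the default AT&T assembler syntax, and it is nowhere near what a real vendor BLAS does. It only shows how the code becomes tied to one instruction set.

```c
#include <stddef.h>

/* Portable version: plain C, builds with any compiler for any CPU. */
double dot_portable(const double *x, const double *y, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

/* Hardware-specific version (hypothetical, for illustration only):
 * hand-written x86-64 SSE2 instructions via GCC/Clang inline assembly.
 * On any other target it falls back to the portable loop. */
double dot_tuned(const double *x, const double *y, size_t n) {
#if defined(__x86_64__) && defined(__GNUC__)
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double prod = x[i];
        /* prod *= y[i]; the "x" constraint asks for an SSE register */
        __asm__("mulsd %1, %0" : "+x"(prod) : "x"(y[i]));
        /* acc += prod */
        __asm__("addsd %1, %0" : "+x"(acc) : "x"(prod));
    }
    return acc;
#else
    return dot_portable(x, y, n);
#endif
}
```

Both functions compute the same result, but only the first carries over to a new architecture without a rewrite. Multiply the second by every kernel and every supported chip and you get roughly the porting problem described above.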

You can see similar things if you buy datacenter-grade CPUs from AMD or Intel and compare their per-model optimized BLAS builds and compilers to OpenBLAS, or swap them around. The difference is not world-ending, but it can be maybe 50% in some cases.