548 points | nsagent | 21 comments
lukev No.44567263
    So to make sure I understand, this would mean:

    1. Programs built against MLX -> Can take advantage of CUDA-enabled chips

    but not:

    2. CUDA programs -> Can now run on Apple Silicon.

Because #2 would be a copyright violation (specifically with respect to Nvidia's famous moat).

    Is this correct?

1. quitit No.44567355
    It's 1.

It means that a developer can use their relatively low-powered Apple device (with UMA) to develop for deployment on Nvidia's relatively high-powered systems.

    That's nice to have for a range of reasons.
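
As a minimal sketch of that workflow, assuming MLX's Python API (the array sizes here are made up): the same MLX program runs against whichever GPU backend, Metal or CUDA, the library was built with, so nothing in the source changes between the Mac and the Nvidia box.

    import mlx.core as mx

    # The default device is the GPU; which backend (Metal or CUDA)
    # it maps to is decided when MLX is built, not in this script.
    print(mx.default_device())

    # A small matmul to exercise the GPU.
    a = mx.random.normal((1024, 1024))
    b = mx.random.normal((1024, 1024))
    c = a @ b

    # MLX is lazy; force evaluation to actually run the kernel.
    mx.eval(c)
    print(c.shape)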

2. _zoltan_ No.44568550
    "relatively high powered"? there's nothing faster out there.
3. chvid No.44568714
    Relative to what you can get in the cloud or on a desktop machine.
4. MangoToupe No.44568716
    Is this true per watt?
5. chvid No.44568740
If Apple cannot do its own implementation of CUDA due to copyright, this is the second-best option: get developers to build for MLX (which is on their laptops) while still getting NVIDIA hardware support.

    Apple should do a similar thing for AMD.

6. sgt101 No.44568748
I wonder what Apple would have to do to make Metal + its processors run faster than Nvidia's? I guess it's all about the interconnects, really.
7. spookie No.44569017
It doesn't matter for a lot of applications. But fair: for a big part of them it is either essential or a nice-to-have. Completely beside the point, though, if we're chasing the fastest compute no matter what.
8. quitit No.44569262
Relative to the Apple hardware, the Nvidia is high-powered.

I appreciate that English is your second language after your Hungarian mother tongue. My comment contrasts the low-powered compute of the Apple hardware with the high-powered compute of the Nvidia hardware.

9. summarity No.44569316
Right now, for LLMs, the only limiting factor on Apple Silicon is memory bandwidth. There hasn't been progress on this since the original M1 Ultra. And since Apple abandoned UltraFusion, we won't see progress here anytime soon either.
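
A back-of-the-envelope sketch of why bandwidth is the ceiling (the numbers below are illustrative assumptions, not benchmarks): single-stream LLM decoding has to stream essentially the whole weight set for each generated token, so bandwidth divided by model size bounds tokens per second.

    # Rough roofline for memory-bound LLM decoding (illustrative values).
    bandwidth_gb_s = 800.0   # M1 Ultra class memory bandwidth, GB/s
    model_gb = 14.0          # e.g. a 7B-parameter model with 16-bit weights

    # Each token reads roughly the full weight set once, so:
    max_tok_s = bandwidth_gb_s / model_gb
    print(f"~{max_tok_s:.0f} tokens/s upper bound")  # ~57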
10. glhaynes No.44569623
    Have they abandoned UltraFusion? Last I’d heard, they’d just said something like “not all generations will get an Ultra chip” around the time the M4 showed up (the first M chip lacking an Ultra variation), which makes me think the M5 or M6 is fairly likely to get an Ultra.
13. librasteve No.44569854
this is like saying the only limiting factor on computers is the von Neumann bottleneck
14. xd1936 No.44570359
I thought that the US Supreme Court decision in Google v. Oracle, over the Java API reimplementation, provided enough case precedent to allow companies to re-implement something like the CUDA APIs?

    https://www.theverge.com/2021/4/5/22367851/google-oracle-sup...

    https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....

15. randomNumber7 No.44570543
    What is the performance penalty compared to a program in native CUDA?
16. johnboiles No.44570777
    ...fastest compute no matter watt
17. karmakaze No.44571119
    It would be great for Apple if enough developers took this path and Apple could later release datacenter GPUs that support MLX without CUDA.
18. nightski No.44574044
It's the other way around: if Apple released data center GPUs, then developers might take that path. Apple has shown time and again that they don't care about developers, so it's on them.
19. timhigins No.44574685
Exactly, and see also ROCm/HIP, which is AMD's reimplementation of CUDA for their GPUs.
20. pjmlp No.44579461
    Reimplementation of CUDA C++, not CUDA.

CUDA is a set of four compilers (C, C++, Fortran, and Python JIT DSLs), a bytecode and two compiler backend libraries, a collection of compute libraries for the languages listed above, plugins for Eclipse and Visual Studio, and a GPU graphical debugger and profiler.

21. qalmakka No.44593560
There's ZLUDA for AMD, which actually implements CUDA, but it's still quite immature.