
Rust CUDA Project

(github.com)
146 points sksxihve | 28 comments
1. shmerl ◴[] No.43656833[source]
Looks like a dead end. Why CUDA? There should be some way to use Rust for GPU programming in a general fashion, without being tied to Nvidia.
replies(5): >>43656967 #>>43657008 #>>43657034 #>>43658709 #>>43659892 #
2. kouteiheika ◴[] No.43656967[source]
There's no cross-vendor API which exposes the full power of the hardware. For example, you can use Vulkan to do compute on the GPU, but it doesn't expose all of the features that CUDA exposes, and you need to do the legwork yourself reimplementing all of the well optimized libraries (like e.g. cublas or cudnn) that you get for free with CUDA.
replies(1): >>43658043 #
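To make the "legwork" concrete: cuBLAS ships a heavily tuned `cublasSgemm`; on Vulkan you'd have to write and tune the equivalent compute shader yourself. A minimal CPU-side sketch of what that routine computes (naive and far from competitive, for illustration only):

```rust
// Naive SGEMM: C = alpha*A*B + beta*C, row-major.
// This is the operation cuBLAS provides pre-optimized; a Vulkan user
// would have to reimplement (and hand-tune) it as a compute shader.
fn sgemm(m: usize, n: usize, k: usize, alpha: f32, a: &[f32], b: &[f32], beta: f32, c: &mut [f32]) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}

fn main() {
    // 2x2 example: A = [[1,2],[3,4]], B = identity, so C = A.
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [1.0, 0.0, 0.0, 1.0];
    let mut c = [0.0f32; 4];
    sgemm(2, 2, 2, 1.0, &a, &b, 0.0, &mut c);
    println!("{:?}", c);
}
```

The real work cuBLAS does (tiling, shared-memory staging, tensor-core dispatch) is exactly what's missing from cross-vendor APIs.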
3. the__alchemist ◴[] No.43657008[source]
CUDA is the easiest-to-use and most popular GPGPU framework. I agree it's unfortunate there aren't good alternatives! As kouteiheika pointed out, you can use Vulkan (or OpenCL), but they are not as pleasant.
replies(1): >>43658049 #
4. pjmlp ◴[] No.43657034[source]
Because others have so far failed to deliver anything worthwhile to use, with the same tooling ecosystem as CUDA.
replies(3): >>43657851 #>>43658002 #>>43658007 #
5. coffeeaddict1 ◴[] No.43657851[source]
While I agree that CUDA is the best-in-class API for GPU programming, OpenCL, Vulkan compute shaders and SYCL are usable alternatives. I, for example, am using compute shaders to write GPGPU algorithms that work on Mac, AMD, Intel and Nvidia. It works OK. The debugging experience and ecosystem suck compared to CUDA, but being able to run the algorithms across platforms is a huge advantage over CUDA.
replies(3): >>43658021 #>>43658035 #>>43658602 #
6. ◴[] No.43658002[source]
7. shmerl ◴[] No.43658007[source]
To deliver, you need to make Rust target the GPU in a general way, like some IR, and then maybe compile that into GPU machine code for each GPU architecture specifically.

So this project is a dead end, because it's them who are these "others": they are developing it, and they are doing it wrong.

replies(1): >>43658699 #
8. keldaris ◴[] No.43658021{3}[source]
How are you writing compute shaders that work on all platforms, including Mac? Are you just writing Vulkan and relying on MoltenVK?

AFAIK, the only solution that actually works on all major platforms without additional compatibility layers today is OpenCL 1.2 - which also happens to be officially deprecated on macOS, but still works for now.

replies(2): >>43658633 #>>43658666 #
9. fragmede ◴[] No.43658035{3}[source]
why do you need to run across all those platforms? what's the cost-benefit of doing so?
replies(1): >>43658724 #
10. shmerl ◴[] No.43658043[source]
Make a compiler that takes Rust and compiles into some IR, then another compiler that compiles that IR into GPU machine code. Then it can work and that's going to be your API (what you developed in Rust).

That's the whole point of what's missing. Not some wrapper around CUDA.
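A toy sketch of the two-stage pipeline shmerl describes: lower a kernel to a vendor-neutral IR, then run per-architecture backends over that IR. Everything here is illustrative (the instruction strings are only PTX-like and RDNA-like placeholders, not valid assembly) - this is roughly the role SPIR-V or NVVM IR play in real toolchains:

```rust
// A vendor-neutral IR for a trivial `out = a + b` kernel.
enum Ir {
    LoadA,
    LoadB,
    Add,
    Store,
}

// Stage 1: "Rust -> IR" (here just a canned lowering of one kernel).
fn lower_add_kernel() -> Vec<Ir> {
    vec![Ir::LoadA, Ir::LoadB, Ir::Add, Ir::Store]
}

// Stage 2: "IR -> GPU machine code", one backend per architecture.
fn codegen(ir: &[Ir], backend: &str) -> String {
    ir.iter()
        .map(|op| match (backend, op) {
            ("ptx", Ir::LoadA) => "ld.global.f32 %f1, [a];",
            ("ptx", Ir::LoadB) => "ld.global.f32 %f2, [b];",
            ("ptx", Ir::Add) => "add.f32 %f3, %f1, %f2;",
            ("ptx", Ir::Store) => "st.global.f32 [out], %f3;",
            ("rdna", Ir::LoadA) => "global_load_dword v1, a",
            ("rdna", Ir::LoadB) => "global_load_dword v2, b",
            ("rdna", Ir::Add) => "v_add_f32 v3, v1, v2",
            ("rdna", Ir::Store) => "global_store_dword out, v3",
            _ => "unimplemented",
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let ir = lower_add_kernel();
    println!("{}", codegen(&ir, "ptx"));
}
```

The hard part, of course, is not the shape of the pipeline but making each backend emit code competitive with the vendor's own compiler.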

11. shmerl ◴[] No.43658049[source]
It defeats the purpose. The easy-to-use thing should be something in Rust, not CUDA.
replies(1): >>43660550 #
12. pjmlp ◴[] No.43658602{3}[source]
No, they aren't, because they lack CUDA's polyglot support and, as you acknowledge, the debugging experience and ecosystem suck.
13. pjmlp ◴[] No.43658633{4}[source]
And it's stuck with C99, versus C++20, Fortran, Julia, Haskell, C#, or anything else someone feels like targeting PTX with.
replies(1): >>43658760 #
14. coffeeaddict1 ◴[] No.43658666{4}[source]
Yes, MoltenVK works fine. Alternatively, you can also use WebGPU (there are C++ and Rust native libs) which is a simpler but more limiting API.
replies(1): >>43658775 #
15. pjmlp ◴[] No.43658699{3}[source]
Plus IDE support, Nsight-level debugging, GPU libraries - yes, most likely bound to fail unless NVIDIA, as happened with other languages, sees enough business value to lend a helping hand.

They are already using Rust in Dynamo, even though the public API is Python.

16. ◴[] No.43658709[source]
17. coffeeaddict1 ◴[] No.43658724{4}[source]
Well, it really depends on the kind of work you're doing. My (non-AI) software allows users to run my algorithms on whatever server-side GPU or local device they have. This is a big advantage IMO.
replies(1): >>43659681 #
18. keldaris ◴[] No.43658760{5}[source]
Technically, OpenCL can also include inline PTX assembly in kernels (unlike any compute shader API I've ever seen), which is relevant for targeting things like tensor cores. You're absolutely right about the language limitation, though.
replies(1): >>43662463 #
19. keldaris ◴[] No.43658775{5}[source]
WebGPU has no support for tensor cores (or their Apple Silicon equivalents). Vulkan has an Nvidia extension for it; is there any way to make MoltenVK use simdgroup_matrix instructions in compute shaders?
replies(1): >>43658912 #
20. coffeeaddict1 ◴[] No.43658912{6}[source]
AFAIK, MoltenVK doesn't. Dawn (Google's C++ WebGPU implementation) does have some experimental support for it [0][1].

[0] https://issues.chromium.org/issues/348702031

[1] https://github.com/gpuweb/gpuweb/issues/4195

21. fragmede ◴[] No.43659681{5}[source]
Interesting! Can you say more about what kind of algorithms your software runs?
replies(1): >>43662446 #
22. LegNeato ◴[] No.43659892[source]
You can look to https://github.com/Rust-GPU/rust-gpu/ for Vulkan.
replies(1): >>43688827 #
23. adastra22 ◴[] No.43660550{3}[source]
The purpose is to get shit done.
24. coffeeaddict1 ◴[] No.43662446{6}[source]
My work is primarily about processing medical images (which are usually large 3D images). Doing this on the GPU can be 10-20x faster.
replies(1): >>43667306 #
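Part of why volumetric image processing maps so well to GPUs is that many operations are embarrassingly parallel: each voxel can be handled independently, one GPU thread per voxel. A hypothetical CPU-side sketch of a simple intensity threshold (the kind of segmentation primitive such pipelines use; the function name is illustrative):

```rust
// Mark every voxel whose intensity falls inside [lo, hi].
// On a GPU, the loop body becomes the per-thread kernel, launched
// once per voxel with no data dependencies between threads.
fn threshold_volume(volume: &[f32], lo: f32, hi: f32) -> Vec<u8> {
    volume
        .iter()
        .map(|&v| if v >= lo && v <= hi { 1 } else { 0 })
        .collect()
}

fn main() {
    // A tiny 2x2x2 "volume" flattened into a slice.
    let vol = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4];
    let mask = threshold_volume(&vol, 0.3, 0.7);
    println!("{:?}", mask);
}
```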
25. pjmlp ◴[] No.43662463{6}[source]
At which point, why bother? PTX is CUDA.
replies(1): >>43666392 #
26. keldaris ◴[] No.43666392{7}[source]
Generally, the reason to bother with this approach is if you have a project that only needs tensor cores in a tiny part of the code and otherwise benefits from the cross platform nature of OpenCL, so you have a mostly shared codebase with a small vendor-specific optimization in a kernel or two. I've been in that situation and do find that approach valuable, but I'll be the first to admit the modern GPGPU landscape is full of unpleasant compromises whichever way you look.
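The "mostly shared codebase, small vendor-specific optimization" pattern keldaris describes can be as simple as selecting a kernel source at runtime by device vendor. A hedged sketch (the kernel bodies and vendor string are placeholders, not working OpenCL C):

```rust
// Portable kernel used on every vendor.
const GENERIC_KERNEL: &str = "__kernel void mma(/* ... */) { /* portable OpenCL C */ }";
// Nvidia-only variant with inline PTX for tensor cores.
const NVIDIA_KERNEL: &str = "__kernel void mma(/* ... */) { /* inline PTX asm */ }";

// Pick the kernel source to compile based on the reported vendor,
// as queried from the OpenCL device at startup.
fn kernel_source_for(vendor: &str) -> &'static str {
    match vendor {
        "NVIDIA Corporation" => NVIDIA_KERNEL,
        _ => GENERIC_KERNEL,
    }
}

fn main() {
    println!("{}", kernel_source_for("NVIDIA Corporation").contains("PTX"));
}
```

The shared 95% of the codebase never sees the vendor split; only the hot kernel or two does.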
27. fragmede ◴[] No.43667306{7}[source]
But what about that makes you want to be multi-platform instead of picking one and specializing, probably picking up some more optimizations along the way?
28. shmerl ◴[] No.43688827[source]
That's a good example of what it should be.