Most active commenters
  • pjmlp(7)
  • coffeeaddict1(5)
  • the__alchemist(4)
  • shmerl(4)
  • keldaris(4)
  • fragmede(3)
  • LegNeato(3)

Rust CUDA Project

(github.com)
143 points by sksxihve | 47 comments
1. porphyra ◴[] No.43656375[source]
Very cool to see this project get rebooted. I'm hoping it will have the critical mass needed to actually take off. Writing CUDA kernels in C++ is a pain.

In theory, since NVVM IR is based on LLVM IR, Rust on CUDA should be quite doable. In practice, though, it is of course an extreme amount of work.

replies(1): >>43657023 #
2. the__alchemist ◴[] No.43656376[source]
Summary, from someone who uses CUDA with Rust in several projects (computational chemistry and cosmology simulations):

  - This lib has been in an unusable and unmaintained state for years. I.e., to get it working, you need to use specific, several-years-old variants of both rustc, and CUDA.
  - It was recently rebooted. I haven't tried the GitHub branch, but there isn't a release yet. Has anyone verified whether this works on current rustc and CUDA yet?
  - The Cudarc library (https://github.com/coreylowman/cudarc) is actively maintained, and works well. It does not, however, let you share host and device data structures; you will [de]serialize as a byte stream, using functions the lib provides. Works on any (within past few years at least) CUDA version and GPU.
I highlight this as a trend I see in software libs, in Rust more than elsewhere: the projects that are promoted the most are often not the most practical or best-managed ones. It's not clear from the description, but maybe Rust-CUDA intends to allow shared data structures between host and device? That would be nice.
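To make the [de]serialization point concrete, here is a CPU-only sketch in plain Rust. The names (`Particle`, `to_bytes`, `from_bytes`) are illustrative and not part of cudarc's API; the point is just why a `#[repr(C)]` plain-old-data layout makes the host-to-device byte-stream round trip workable:

```rust
// A #[repr(C)] POD struct has a stable byte layout, so the host can
// flatten it to bytes for a device copy and rebuild it on the way back.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Particle {
    pos: [f32; 3],
    mass: f32,
}

// Flatten a slice of particles into the raw bytes you would hand to a
// host-to-device copy call.
fn to_bytes(items: &[Particle]) -> Vec<u8> {
    let ptr = items.as_ptr() as *const u8;
    let len = items.len() * std::mem::size_of::<Particle>();
    unsafe { std::slice::from_raw_parts(ptr, len) }.to_vec()
}

// Rebuild typed particles from the bytes a device-to-host copy returns.
fn from_bytes(bytes: &[u8]) -> Vec<Particle> {
    let size = std::mem::size_of::<Particle>();
    assert_eq!(bytes.len() % size, 0, "byte stream must be whole particles");
    bytes
        .chunks_exact(size)
        .map(|c| unsafe { std::ptr::read_unaligned(c.as_ptr() as *const Particle) })
        .collect()
}

fn main() {
    let host = vec![Particle { pos: [1.0, 2.0, 3.0], mass: 4.0 }];
    let bytes = to_bytes(&host); // what a htod-style copy would consume
    let round_trip = from_bytes(&bytes); // what a dtoh-style copy gives back
    assert_eq!(host, round_trip);
    println!("round-trip ok: {:?}", round_trip[0]);
}
```

Shared host/device structs would remove exactly this boilerplate, since both sides would agree on the type's layout directly.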
replies(5): >>43656540 #>>43656624 #>>43656639 #>>43658890 #>>43659897 #
3. sksxihve ◴[] No.43656540[source]
I think that's true of most newer languages; there's always a rush of libraries once a language starts to get popular. For example, Go has lots of HTTP client libraries even though it also has an HTTP library in the standard library.

relevant xkcd, https://xkcd.com/927/

replies(1): >>43656696 #
4. hobofan ◴[] No.43656624[source]
Damn. I transferred ownership of the cudnn and cudnn-sys crates (they are by now almost-10-year-old crates that I'm certain nobody ever managed to use for anything useful) to the maintainers a few years back, as the project looked to be on a good trajectory, but it seems they never managed to actually release the crates. Hope the reboot pulls through!
5. gbin ◴[] No.43656639[source]
We observed the same thing here at Copper Robotics, where we absolutely need good CUDA bindings for our customers; in general the lack thereof has been holding back Rust in robotics for years. Finally, with cudarc, we have some hope for a stable project that keeps up with the ecosystem. The last interesting question at that point is: why is Nvidia not investing in the Rust ecosystem?
replies(2): >>43657179 #>>43658584 #
6. pests ◴[] No.43656696{3}[source]
I think this was also in small part due to them (Rob Pike perhaps? Or Brad) live-streaming the creation of an HTTP server back in the early days; it was good tutorial fodder.
7. shmerl ◴[] No.43656833[source]
Looks like a dead end. Why CUDA? There should be some way to use Rust for GPU programming in a general fashion, without being tied to Nvidia.
replies(5): >>43656967 #>>43657008 #>>43657034 #>>43658709 #>>43659892 #
8. kouteiheika ◴[] No.43656967[source]
There's no cross-vendor API which exposes the full power of the hardware. For example, you can use Vulkan to do compute on the GPU, but it doesn't expose all of the features that CUDA exposes, and you need to do the legwork yourself, reimplementing all of the well-optimized libraries (e.g. cuBLAS or cuDNN) that you get for free with CUDA.
replies(1): >>43658043 #
9. the__alchemist ◴[] No.43657008[source]
CUDA is the easiest-to-use and most popular GPGPU framework. I agree that it's unfortunate there aren't good alternatives! As kouteiheika pointed out, you can use Vulkan (or OpenCL), but they are not as pleasant.
replies(1): >>43658049 #
10. pjmlp ◴[] No.43657023[source]
Unless NVIDIA actually embraces this, it will never be better than C++, given the IDE integration, graphical debugging, and library ecosystem alone.

Unless one is prepared to do lots of yak shaving; and who knows, then NVIDIA might actually pay attention, as has happened with CUDA support for other ecosystems.

replies(1): >>43663898 #
11. pjmlp ◴[] No.43657034[source]
Because so far others have failed to deliver anything worth using with the same tooling ecosystem as CUDA.
replies(3): >>43657851 #>>43658002 #>>43658007 #
12. adityamwagh ◴[] No.43657179{3}[source]
I was talking to one person from the CUDA Core Compute Libraries team. They hinted that in the next 5 years, NVIDIA could support Rust as a language to program CUDA GPUs.

I also read a comment on a post on r/Rust saying that Rust's safety guarantees make it hard to use for programming GPUs. Don't know the specifics.

Let’s see how it happens!

13. nuc1e0n ◴[] No.43657666[source]
Shouldn't it be called RUDA?
14. coffeeaddict1 ◴[] No.43657851{3}[source]
While I agree that CUDA is the best-in-class API for GPU programming, OpenCL, Vulkan compute shaders, and SYCL are usable alternatives. I'm, for example, using compute shaders for writing GPGPU algorithms that work on Mac, AMD, Intel, and Nvidia. It works OK. The debugging experience and ecosystem suck compared to CUDA, but being able to run the algorithms across platforms is a huge advantage over CUDA.
replies(3): >>43658021 #>>43658035 #>>43658602 #
15. ◴[] No.43658002{3}[source]
16. shmerl ◴[] No.43658007{3}[source]
To deliver, you need to make Rust target the GPU in a general way, like some IR, and then maybe compile that into GPU machine code for each GPU architecture specifically.

So this project is a dead end, because they are those "others": they are developing it, and they are doing it wrong.

replies(1): >>43658699 #
17. keldaris ◴[] No.43658021{4}[source]
How are you writing compute shaders that work on all platforms, including Mac? Are you just writing Vulkan and relying on MoltenVK?

AFAIK, the only solution that actually works on all major platforms without additional compatibility layers today is OpenCL 1.2, which also happens to be officially deprecated on macOS, but still works for now.

replies(2): >>43658633 #>>43658666 #
18. fragmede ◴[] No.43658035{4}[source]
why do you need to run across all those platforms? what's the cost benefit for doing so?
replies(1): >>43658724 #
19. shmerl ◴[] No.43658043{3}[source]
Make a compiler that takes Rust and compiles into some IR, then another compiler that compiles that IR into GPU machine code. Then it can work and that's going to be your API (what you developed in Rust).

That's the whole point of what's missing. Not some wrapper around CUDA.

20. shmerl ◴[] No.43658049{3}[source]
It defeats the purpose. Easy to use should be something in Rust, not CUDA.
replies(1): >>43660550 #
21. jjallen ◴[] No.43658127[source]
I've been using the cudarc crate professionally for a while to write and call CUDA from Rust. Can highly recommend. You don't have to use super old rustc versions, although I haven't checked recently exactly which versions you do need.
replies(1): >>43658643 #
22. pjmlp ◴[] No.43658584{3}[source]
They kind of are, but not in CUDA directly.

https://github.com/ai-dynamo/dynamo

> NVIDIA Dynamo is a high-throughput low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments.

> Built in Rust for performance and in Python for extensibility,

Says right there where they see Rust currently.

23. pjmlp ◴[] No.43658602{4}[source]
No, they aren't, because they lack CUDA's polyglot support and, as you acknowledge, the debugging experience and ecosystem suck.
24. pjmlp ◴[] No.43658633{5}[source]
And it is stuck with C99, versus C++20, Fortran, Julia, Haskell, C#, or anything else someone feels like targeting PTX with.
replies(1): >>43658760 #
25. the__alchemist ◴[] No.43658643[source]
Works on any recent Rust and CUDA version. The maintainer has historically added support for new GPU series and CUDA versions quickly.
replies(1): >>43660695 #
26. coffeeaddict1 ◴[] No.43658666{5}[source]
Yes, MoltenVK works fine. Alternatively, you can also use WebGPU (there are C++ and Rust native libs) which is a simpler but more limiting API.
replies(1): >>43658775 #
27. pjmlp ◴[] No.43658699{4}[source]
Plus IDE support, Nsight-level debugging, GPU libraries... yes, most likely bound to fail unless NVIDIA, as happened with other languages, sees enough business value to lend a helping hand.

They are already using Rust in Dynamo, even though the public API is Python.

28. ◴[] No.43658709[source]
29. coffeeaddict1 ◴[] No.43658724{5}[source]
Well it really depends on the kind of work you're doing. My (non-AI) software allows users to run my algorithms on whatever server-side GPU or local device they have. This is a big advantage IMO.
replies(1): >>43659681 #
30. keldaris ◴[] No.43658760{6}[source]
Technically, OpenCL can also include inline PTX assembly in kernels (unlike any compute shader API I've ever seen), which is relevant for targeting things like tensor cores. You're absolutely right about the language limitation, though.
replies(1): >>43662463 #
31. keldaris ◴[] No.43658775{6}[source]
WebGPU has no support for tensor cores (or their Apple Silicon equivalents). Vulkan has an Nvidia extension for them; is there any way to make MoltenVK use simdgroup_matrix instructions in compute shaders?
replies(1): >>43658912 #
32. efnx ◴[] No.43658890[source]
I'm a Rust-GPU maintainer and can say that shared types on host and GPU are definitely intended. We've mostly been focused on graphics but are shifting efforts to more general compute. There's a lot of work, though, and we all have day jobs, so we're looking for help. If you're interested in helping, you should say so on our GitHub.
replies(1): >>43659137 #
33. coffeeaddict1 ◴[] No.43658912{7}[source]
AFAIK, MoltenVK doesn't. Dawn (Google's C++ WebGPU implementation) does have some experimental support for it [0][1].

[0] https://issues.chromium.org/issues/348702031

[1] https://github.com/gpuweb/gpuweb/issues/4195

34. the__alchemist ◴[] No.43659137{3}[source]
What is the intended distinguisher between this and wgpu for graphics? I didn't realize that was a goal; I've seen it mostly discussed in the context of CUDA. There doesn't have to be one, but I'm curious, as the CUDA/GPGPU side of the ecosystem is less developed, while catching up to wgpu may be a tall order. From a skim of its main page, it seems it may also focus on writing shaders in Rust.

Tangent: what is the intended distinguisher between Rust-CUDA and cudarc? Rust kernels with shared data structures is the big one, I'm guessing. That would be great! There of course doesn't have to be one; more tools to choose from encourages progress all around.

replies(1): >>43660060 #
35. fragmede ◴[] No.43659681{6}[source]
interesting! Can you say more about what kind of algorithms your software runs?
replies(1): >>43662446 #
36. LegNeato ◴[] No.43659892[source]
You can look to https://github.com/Rust-GPU/rust-gpu/ for vulkan.
37. LegNeato ◴[] No.43659897[source]
Maintainer here. It works on recent rust and latest CUDA. See https://rust-gpu.github.io/blog/2025/03/18/rust-cuda-update
38. LegNeato ◴[] No.43660060{4}[source]
wgpu is CPU side, rust-gpu is GPU side. The projects work together (our latest post uses wgpu and we fixed bugs in it: https://rust-gpu.github.io/blog/2025/04/10/shadertoys )
39. adastra22 ◴[] No.43660550{4}[source]
The purpose is to get shit done.
40. jjallen ◴[] No.43660695{3}[source]
Exactly. I thought it did, just didn't want to claim too much about it. Been a couple of months since I looked at it. I wish things would coalesce around this one.
41. coffeeaddict1 ◴[] No.43662446{7}[source]
My work is primarily about processing medical images (which are usually large 3D images). Doing this on the GPU can be up to 10-20x faster.
replies(1): >>43667306 #
42. pjmlp ◴[] No.43662463{7}[source]
At which point, why bother? PTX is CUDA.
replies(1): >>43666392 #
43. Asraelite ◴[] No.43663898{3}[source]
Yeah, I would prefer to see this effort being done for AMD GPUs. They're not as powerful, but at least everything is open source and developer friendly. They should be rewarded for that.
44. keldaris ◴[] No.43666392{8}[source]
Generally, the reason to bother with this approach is if you have a project that only needs tensor cores in a tiny part of the code and otherwise benefits from the cross platform nature of OpenCL, so you have a mostly shared codebase with a small vendor-specific optimization in a kernel or two. I've been in that situation and do find that approach valuable, but I'll be the first to admit the modern GPGPU landscape is full of unpleasant compromises whichever way you look.
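
For concreteness, a rough sketch of what that looks like. This only compiles on NVIDIA's OpenCL implementation, the asm() form mirrors CUDA's inline-PTX syntax, and the kernel itself is illustrative rather than taken from a real codebase:

```c
// Illustrative OpenCL kernel with one vendor-specific inline-PTX line.
// Everything except this asm() is plain, portable OpenCL C.
__kernel void add_ptx(__global const float* a,
                      __global const float* b,
                      __global float* out) {
    size_t i = get_global_id(0);
    float r;
    // Issue the add directly as PTX; on other vendors you'd compile a
    // variant of this kernel with "r = a[i] + b[i];" here instead.
    asm("add.f32 %0, %1, %2;" : "=f"(r) : "f"(a[i]), "f"(b[i]));
    out[i] = r;
}
```

In practice you keep two variants of the few kernels that need this, select between them at build time, and share everything else across vendors.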
45. ein0p ◴[] No.43667295[source]
For this to take off it has to be supported by Nvidia, at feature parity with C++. Otherwise it'll remain a niche tool, unfortunately. I do hope Nvidia either supports this or rolls out their own alternative. I've been looking to get off the C++ train for years, but stuff like this keeps me there.
46. fragmede ◴[] No.43667306{8}[source]
But what about that work wants to be multi-platform, instead of picking one platform and specializing, probably picking up some more optimizations along the way?