CUDA is not an API, it is a set of libraries written by NVIDIA. You'd have to reimplement those libraries, and for people to care at all you'd have to reimplement the optimizations in those libraries. That does get into various IP issues.

replies(3): >>44568506 #>>44568575 #>>44570953 #

4. Imustaskforhelp ◴[15 Jul 25 06:43 UTC] No.44568506[source]▶

>>44568387 #

Even if its not as optimized, it would still be nice to see a CUDA alternative really

Also I do wonder what the difference b/w a API and a set of libraries are, couldn't an API be exposed from that set of libraries which could be used? Its a little confusing I guess

replies(1): >>44568626 #

5. pjmlp ◴[15 Jul 25 06:51 UTC] No.44568575[source]▶

>>44568387 #

CUDA is neither an API, nor a set of libraries, people get this wrong all the time.

CUDA is an ecosystem of programming languages, libraries and developer tools.

Composed by compilers for C, C++, Fortran, Python JIT DSLs, provided by NVidia, plus several others with either PTX or NVVM IR.

The libraries, which you correctly point out.

And then the IDE integrations, the GPU debugger that is on par with Visual Studio like debugging, profiler,...

Hence why everyone that focus on copying only CUDA C, or CUDA C++, without everything else that makes CUDA relevant keeps failing.

replies(1): >>44572506 #

6. adastra22 ◴[15 Jul 25 06:57 UTC] No.44568626{3}[source]▶

>>44568506 #

> couldn't an API be exposed from that set of libraries which could be used

And now you've entered that copyright violation territory.

replies(1): >>44568978 #

7. hnfong ◴[15 Jul 25 07:27 UTC] No.44568806[source]▶

>>44568364 #

Indeed.

Unfortunately when that case went to the Supreme Court they basically just said "yeah for this case it's fair use, but we're not going to comment on whether APIs in general are copyrightable"...

8. Someone ◴[15 Jul 25 07:59 UTC] No.44568978{4}[source]▶

>>44568626 #

IP infringement, not copyright violation.

A clean room reimplementation of cuda would avoid any copyright claims, but would not necessary avoid patents infringement.

https://en.wikipedia.org/wiki/Clean-room_design:

“Clean-room design is useful as a defense against copyright infringement because it relies on independent creation. However, because independent invention is not a defense against patents, clean-room designs typically cannot be used to circumvent patent restrictions.”

replies(2): >>44573410 #>>44578961 #

9. vFunct ◴[15 Jul 25 13:30 UTC] No.44570953[source]▶

>>44568387 #

So if people aren't aware, you can have AI reimplement CUDA libraries for any hardware, as well as develop new ones.

You wouldn't believe me if you didn't try it and see for yourself, so try it.

NVidia's CUDA moat is no more.

replies(1): >>44578964 #

10. CamperBob2 ◴[15 Jul 25 15:57 UTC] No.44572506{3}[source]▶

>>44568575 #

Only the runtime components matter, though. Nobody cares about the dev tools beyond the core compiler. What people want is to be able to recompile and run on competitive hardware, and I don't understand why that's such an intractable problem.

replies(4): >>44573430 #>>44573665 #>>44573734 #>>44579484 #

11. dragonwriter ◴[15 Jul 25 17:05 UTC] No.44573410{5}[source]▶

>>44568978 #

> A clean room reimplementation of cuda would avoid any copyright claims,

Assuming APIs are either not copyirghtable or that API reimplementation is always fair use of the API, neither of which there is sufficient precedent to justify as a conclusion; Oracle v. Google ended with “well, it would be fair use in the exact factual circumstances in this case so we don't have to reach the thornier general questions”.

12. outworlder ◴[15 Jul 25 17:08 UTC] No.44573430{4}[source]▶

>>44572506 #

It is not.

However, companies may still be hoping to get their own solutions in place instead of CUDA. If they do implement CUDA, that cements its position forever. That ship has probably already sailed, of course.

13. StillBored ◴[15 Jul 25 17:28 UTC] No.44573665{4}[source]▶

>>44572506 #

Because literally the entire rest of the ecosystem is immature demoware. Rather than each vendor buying into opencl+SPIRV and building a robust stack around it, they are all doing their own half baked tech demos hoping to lock up some portion of the market to duplicate nvidia's success, or at least carve out a niche. While nvidia continues to extend and mature their ecosystem. Intel has oneAPI, AMD has ROCM, Arm has ACL/Kleidi/etc, and a pile of other stacks like MLX, Windows ML, whatever. Combined with a confusing mix of pure software plays like pytorch and windows ML.

A lot of people talk about 'tooling' quality and no one hears them. I just spent a couple weeks porting a fairly small library to some fairly common personal hardware and hit all the same problems you see everywhere. Bugs aren't handled gracefully. Instead of returning "you messed up here", the hardware locks up, and power cycling is the only solution. Not a problem when your writing hello world, but trolling through tens of thousands of lines of GPU kernel code to find the error is going to burn engineer time without anything to show for it. Then when its running, spending weeks in an open feedback loop trying to figure out why the GPU utilization metrics are reporting 50% utilization (if your lucky enough to even have them) and the kernel is running at 1/4 the expected performance is again going to burn weeks. All because there isn't a functional profiler.

And the vendors can't even get this stuff working. People rant about the ROCm support list not supporting, well the hardware people actually have. And it is such a mess, that in some cases it actually works but AMD says it doesn't. And of course, the only reason you hear people complaining about AMD is because they are literally the only company that has a hardware ecosystem that in theory spans the same breadth of devices from small embedded systems to giant data center grade products that NVIDIA does. Everyone else wants a slice of the market, but take apple here, they have nothing in the embedded/edge space that isn't a fixed function device (ex a watch, or apple TV), and their GPU's while interesting are nowhere near the level of the datacenter grade stuff, much less even top of the line AIC boards for gamers.

And its all gotten to be such an industry wide pile of trash that people can't even keep track of basic feature capabilities. Like, a huge pile of hardware actually 'supports' openCL, but its buried to the point where actual engineers working on say ROCm are unaware its actually part of the ROCm stack (imagine my surprise!). And its been the same for nvidia, they have at times supported openCL, but the support is like a .dll they install with the GPU driver stack and don't even bother to document that its there. Or tensorflow that seems to have succumbed to the immense gravitational black hole it had become, where just building it on something that wasn't the blessed platform could take days.

14. int_19h ◴[15 Jul 25 17:34 UTC] No.44573734{4}[source]▶

>>44572506 #

It's the same essential problem as with e.g. Wine - if you're trying to reimplement someone else's constantly evolving API with a closed-source implementation, it takes a lot of effort just to barely keep up.

As far as portability, people who care about that already have the option of using higher-level APIs that have CUDA backend among several others. The main reason why you'd want to do CUDA directly is to squeeze that last bit of performance out of the hardware, but that is also precisely the area where deviation in small details starts to matter a lot.

15. adastra22 ◴[16 Jul 25 05:25 UTC] No.44578961{5}[source]▶

>>44568978 #

I didn't read the GP post as talking about clean-room reimplementation, but rather just serving the same NVIDIA-written libraries on top of AMD hardware.

16. adastra22 ◴[16 Jul 25 05:26 UTC] No.44578964{3}[source]▶

>>44570953 #

If it is so easy, please go do so.

17. pjmlp ◴[16 Jul 25 06:59 UTC] No.44579484{4}[source]▶

>>44572506 #

CUDAs adoption prove otherwise.

↑