Nvidia Warp: A Python framework for high performance GPU simulation and graphics

1. w-m ◴[14 Jun 24 14:58 UTC] No.40681522[source]▶

I was playing around with taichi a little bit for a project. Taichi lives in a similar space, but has more than an NVIDIA backend. But its development has stalled, so I’m considering switching to warp now.

It’s quite frustrating that there’s seemingly no long-lived framework that allows me to write simple numba-like kernels and try them out in NVIDIA GPUs and Apple GPUs. Even with taichi, the Metal backend was definitely B-tier or lower: Not offering 64 bit ints, and randomly crashing/not compiling stuff.

Here’s hoping that we’ll solve the GPU programming space in the next couple years, but after ~15 years or so of waiting, I’m no longer holding my breath.

https://github.com/taichi-dev/taichi

replies(6): >>40681650 #>>40681817 #>>40681931 #>>40682203 #>>40688118 #>>40689271 #

2. panagathon ◴[14 Jun 24 15:14 UTC] No.40681650[source]▶

>>40681522 (TP) #

This is the library I've always wanted. Look at that Julia set. Gorgeous. Thanks for this. I'm sorry to hear about the dev issues. I wish I could help.

3. paulmd ◴[14 Jun 24 15:35 UTC] No.40681817[source]▶

>>40681522 (TP) #

the problem with the GPGPU space is that everything except CUDA is so fractally broken that everything eventually converges to the NVIDIA stuff that actually works.

yes, the heterogeneous compute frameworks are largely broken, except for OneAPI, which does work, but only on CUDA. SPIR-V, works best on CUDA. OpenCL: works best on CUDA.

Even once you get past the topline "does it even attempt to support that", you'll find that AMD's runtimes are broken too. Their OpenCL runtime is buggy and has a bunch of paper features which don't work, and a bunch of AMD-specific behavior and bugs that aren't spec-compliant. So basically you have to have an AMD-specific codepath anyway to handle the bugs. Same for SPIR-V: the biggest thing they have working against them is that AMD's Vulkan Compute support is incomplete and buggy too.

https://devtalk.blender.org/t/was-gpu-support-just-outright-...

https://render.otoy.com/forum/viewtopic.php?f=7&t=75411 ("As of right now, the Vulkan drivers on AMD and Intel are not mature enough to compile (much less ship) Octane for Vulkan")

If you are going to all that effort anyway, why are you (a) targeting AMD at all, and (b) why don't you just use CUDA in the first place? So everyone writes more CUDA and nothing gets done. Cue some new whippersnapper who thinks they're gonna cure all AMD's software problems in a month, they bash into the brick wall, write blog post, becomes angry forums commenter, rinse and repeat.

And now you have another abandoned cross-platform project that basically only ever supported NVIDIA anyway.

Intel, bless their heart, is actually trying and their stuff largely does just work, supposedly, although I'm trying to get their linux runtime up and running on a Serpent Canyon NUC with A770m and am having a hell of a time. But supposedly it does work especially on windows (and I may just have to knuckle under and use windows, or put a pcie card in a server pc). But they just don't have the marketshare to make it stick.

AMD is stuck in this perpetual cycle of expecting anyone else but themselves to write the software, and then not even providing enough infrastructure to get people to the starting line, and then surprised-pikachu nothing works, and surprise-pikachu they never get any adoption. Why has nvidia done this!?!? /s

The other big exception is Metal, which both works and has an actual userbase. The reason they have Metal support for cycles and octane is because they contribute the code, that's really what needs to happen (and I think what Intel is doing - there's just a lot of work to come from zero). But of course Metal is apple-only, so really ideally you would have a layer that goes over the top...

replies(1): >>40686837 #

4. szvsw ◴[14 Jun 24 15:51 UTC] No.40681931[source]▶

>>40681522 (TP) #

I’ve been in love with Taichi for about a year now. Where’s the news source on development being stalled? It seemed like things were moving along at pace last summer and fall at least if I recall correctly.

replies(2): >>40681980 #>>40681991 #

5. w-m ◴[14 Jun 24 15:58 UTC] No.40681980[source]▶

>>40681931 #

https://github.com/taichi-dev/taichi/discussions/8506

replies(2): >>40682128 #>>40687309 #

6. contravariant ◴[14 Jun 24 16:00 UTC] No.40681991[source]▶

>>40681931 #

There's only been 7 commits to master in the last 6 month, half of those purely changes to test or documentation, so it kind of sounds like you're both right.

7. szvsw ◴[14 Jun 24 16:17 UTC] No.40682128{3}[source]▶

>>40681980 #

Ha, interesting timing, last post 6Hr ago. Sounds like they are dogfooding it at least which is good. And I would agree with the assessment that 1.x is fairly feature complete, at least from my experience using it (scientific computing). And good to hear that they are planning on pushing support patches for eg python 3.12 cuda 12 etc

8. talldayo ◴[14 Jun 24 16:25 UTC] No.40682203[source]▶

>>40681522 (TP) #

> Here’s hoping that we’ll solve the GPU programming space in the next couple years, but after ~15 years or so of waiting, I’m no longer holding my breath.

It feels like the ball is entirely in Apple's court. Well-designed and Open Source GPGPU libraries exist, even ones that Apple has supported in the past. Nvidia supports many of them, either through CUDA or as a native driver.

9. sorenjan ◴[15 Jun 24 01:43 UTC] No.40686837[source]▶

>>40681817 #

> yes, the heterogeneous compute frameworks are largely broken, except for OneAPI, which does work, but only on CUDA.

> Intel, bless their heart, is actually trying and their stuff largely does just work, supposedly, .... But they just don't have the marketshare to make it stick.

Isn't OneAPI basically SYCL? And there are different SYCL runtimes that run on Intel, Cuda, and Rocm? So what's lost if you use oneAPI instead of Cuda, and run it on Nvidia GPUs? In an ideal world that should work about the same as Cuda, but can also be run on other hardware in the future, but since the world rarely is ideal I would welcome any insight into this. Is writing oneAPI code mainly for use on Nvidia a bad idea? Or how about AdaptiveCpp (previously hipSYCL/Open SYCL)?

I've been meaning to try some GPGPU programming again (I did some OpenCL 10+ years ago), but the landscape is pretty confusing and I agree that it's very tempting to just pick Cuda and get started with something that works, but if at all possible I would prefer something that is open and not locked to one hardware vendor.

replies(1): >>40688359 #

10. bsavery ◴[15 Jun 24 03:33 UTC] No.40687309{3}[source]▶

>>40681980 #

Yeah this discussion is pretty interesting. I was wondering what what happening with development as well. Taichi is cool tech, that if I had to be honest, it seems like they lacked direction of how to monetize it. For example they tried doing this "Taitopia thing" https://taitopia.design/ (which is already EOL'ed).

IMO if they had focused from the beginning on ML similar to Mojo, they would be in a better place.

replies(1): >>40690992 #

11. zelphirkalt ◴[15 Jun 24 07:35 UTC] No.40688118[source]▶

>>40681522 (TP) #

But when it is made by Nvidia, that means it will not work on AMD GPUs. I do not consider this kind of thing a solution, but rather a vendor lock in.

12. pjmlp ◴[15 Jun 24 08:35 UTC] No.40688359{3}[source]▶

>>40686837 #

OneAPI started as Data Parallel C++, and is SYSCL with special Intel sauce on top.

13. ◴[15 Jun 24 12:06 UTC] No.40689271[source]▶

>>40681522 (TP) #

14. szvsw ◴[15 Jun 24 17:04 UTC] No.40690992{4}[source]▶

>>40687309 #

It developed out of a dissertation at MIT (honestly a pretty damn impressive one IMO) and it seems like without some sort of significant support from universities, foundations or corporations, it would be pretty difficult to “monetize” - these in turn require some sort of substantial industry adoption or dogfooding it in some other job/contracts, which is tough I assume.