> Unit (thread in CUDA, invocation in Vulkan/Wgpu): the smallest execution entity performing computations.
> Plane (warp in CUDA, subgroup in Vulkan/Wgpu): a group of (typically 32) units executing in lockstep and able to share data efficiently through registers.
> Cube (thread block in CUDA, workgroup in Vulkan/Wgpu): a group of units that execute on the same SM, sharing memory and able to synchronize.
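For reference, here's a minimal CUDA sketch of where each of those three levels shows up in the more familiar thread/warp/block terms - my own illustration, not something from CubeCL's docs, and it assumes a launch with 256 threads per block:

```cuda
// unit  -> a single thread (threadIdx)
// plane -> a warp of 32 threads, which can share data via register shuffles
// cube  -> a thread block, which shares __shared__ memory and can barrier-sync
__global__ void hierarchy_demo(const float* in, float* out) {
    __shared__ float warp_sums[8];             // cube-level shared memory (256 / 32 warps)

    int tid  = threadIdx.x;                    // "unit" index within the "cube"
    int gid  = blockIdx.x * blockDim.x + tid;  // global unit index
    int lane = tid % 32;                       // position within the "plane" (warp)
    int wid  = tid / 32;                       // which plane this unit belongs to

    float v = in[gid];

    // plane level: sum across the 32 lockstep units of this warp using shuffles
    for (int offset = 16; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffffu, v, offset);

    if (lane == 0) warp_sums[wid] = v;         // one partial sum per plane
    __syncthreads();                           // cube-level barrier

    // cube level: unit 0 combines the per-plane partials into one result per block
    if (tid == 0) {
        float total = 0.0f;
        for (int w = 0; w < 8; ++w) total += warp_sums[w];
        out[blockIdx.x] = total;
    }
}
```

Launched as `hierarchy_demo<<<num_blocks, 256>>>(d_in, d_out)`, each block (cube) writes one reduced value.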
It's already bad enough that the vendors themselves insisted on different names, but why in the bejesus would you rename these concepts and diverge from literally all existing naming conventions when you're providing middleware? I.e., when using your tool I'm still going to reference NVIDIA's or AMD's docs to understand how the hardware actually works. Like, do you really think otherwise - that your thing is gonna be the end of the line?
FYI the word "warp" isn't random technobabble but is actually a very clever pun that fits the concept well: in weaving, the warp is the set of parallel threads held under tension on the loom, so a warp of threads executing in lockstep maps neatly onto the hardware.
There you go - you've hit basically two of three completely (AMD and Vulkan) and are close enough to CUDA that people would get it.
I have no idea what a "plane" connotes, and a "cube" conjures a picture distinct enough from "block" that I'll be continuously reminding myself of the mapping.
What you did was pointless - you assigned new words to objects you don't own, and now your conceptual framework is askew from the actual underlying (true) one.
> CubeCL to CPU
There is zero affinity between GPU programming models and multicore CPU programming models. If you don't believe me, go ask the OpenMP people how they're doing supporting GPUs.
Congrats - I have no idea what this means lol.
If you have a measure of correctness and a measure of performance, is there a maximum amount of correctness per unit of processing that sits below a full matrix multiply?
Obviously it can be done with precision, since that's what floating point is. But is there anything where you can save x% of the computation and end up with fewer than x% incorrect values in a matrix multiplication?
Gradient descent wouldn't really care about a few (reliably) dud values.
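To put that in symbols (my formalization, not the commenter's): for an $m \times n$ output with inner dimension $k$, the trivial scheme is to skip a fraction $x$ of the $mn$ output dot products, which saves exactly a fraction $x$ of the $mnk$ multiply-adds and leaves exactly a fraction $x$ of the entries as duds. The question is whether any scheme can beat that linear trade-off:

$$
\frac{\#\{\text{incorrect entries}\}}{mn} \;<\; \frac{\#\{\text{skipped multiply-adds}\}}{mnk} \;=\; x
$$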
Just commenting to share: personally I have no naming preference, but the hierarchical abstractions in general are incredibly useful.
It does come with some mental overhead, but let’s be honest, there’s no objectively “good” choice here without introducing bias toward a specific vendor API.
Learning the core concepts takes effort, but if CubeCL is useful for your work, it’s definitely worth it.
It is... it's in GPUs lol
> first class in torch
It is
> costing a fraction of GPUs
Why would anyone give you this for cheaper than GPUs lol?