So, same mistake Intel made before.
If the implementation, which is now very clearly working well, continues to perform as it does, the community may be able to keep it funded and functioning.
And the other side of this is that, with renewed AMD interest in and support for ROCm/HIP, it might be just good enough as a stopgap to push projects towards ROCm/HIP adoption (another blurb from the readme is included below).
> I am a developer writing CUDA code, does this project help me port my code to ROCm/HIP?
> Currently no, this project is strictly for end users. However this project could be used for a much more gradual porting from CUDA to HIP than anything else. You could start with an unmodified application running on ZLUDA, then have ZLUDA expose the underlying HIP objects (streams, modules, etc.), allowing to rewrite GPU kernels one at a time. Or you could have a mixed CUDA-HIP application where only the most performance sensitive GPU kernels are written in the native AMD language.
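For a sense of how mechanical that kind of port usually is: HIP mirrors the CUDA runtime API almost one-to-one, so a kernel moved over often differs only in the API prefix. A minimal, hypothetical vector-add sketch (my own illustration, not code from ZLUDA; built with hipcc):

```cpp
#include <hip/hip_runtime.h>  // the CUDA version would include <cuda_runtime.h>
#include <vector>
#include <cstdio>

// The kernel body is identical in CUDA and HIP.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *da, *db, *dc;
    // Host API calls map 1:1: cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy, ...
    hipMalloc((void**)&da, n * sizeof(float));
    hipMalloc((void**)&db, n * sizeof(float));
    hipMalloc((void**)&dc, n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // hipcc accepts the familiar triple-chevron launch syntax.
    vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    // Blocking copy back; implicitly waits for the kernel on the default stream.
    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("hc[0] = %f\n", hc[0]);  // expect 3.0

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

AMD's hipify tools already automate most of that renaming; the long tail is the library ecosystem (cuBLAS, cuDNN and friends) and hand-tuned kernels, which is exactly where the readme's "mixed CUDA-HIP application" suggestion comes in.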
Who at AMD was responsible for this project, and why has he still not been fired?? How brain-dead does someone have to be to turn down the majority of the market??
If AMD could get 90% of the CUDA ML stuff to seamlessly run on AMD hardware, and could provide hardware at a competitive cost-per-performance (which I assume they probably could since NVIDIA must have an insane profit margin on their GPUs), wouldn't that be the opportunity to eat NVIDIA's lunch?
Either they are very stupid, or open sourcing the library stops NVidia from suing them in a repeat of the Oracle/Google lawsuit over Java APIs?
I'm not sure what the reason is.
Ryzen was a surprise to everyone not because it was good, but because they didn't fuck it up within two generations.
AMD cards have more raw compute than Nvidia's; on paper they're the better hardware, yet the software is so bad that I gave up on using it and switched to Nvidia. Two weeks of debugging driver errors vs. 30 minutes of automated updates.
At least Nvidia, which I fucking hate, will happily hold out their hand for cash even from individuals.
So now we’re in a hilarious situation where people from hobbyists to enterprise devs are hoping for intel to save the day.
Time will tell if that strategy is going to pan out. Ceding the ML "training" market entirely to Nvidia is certainly a bold move.
A better level to target compatibility would be the framework level, such as PyTorch, where the building blocks of neural networks (convolution, multi-head attention, etc.) are high-level and abstract enough to allow flexibility in mapping them onto AMD hardware without compromising performance.
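To make that concrete, here's a minimal sketch of what targeting the framework level buys (using PyTorch's C++ front end, libtorch, to keep one language in this thread; the Python API has the same shape). The user only names abstract building blocks, and, as I understand it, the ROCm build of PyTorch exposes its HIP devices through the same CUDA device type, so nothing in this code is vendor-specific:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
    // The ROCm build of PyTorch reports HIP devices through the same "CUDA"
    // device type, so this selection is backend-agnostic user code.
    torch::Device device = torch::cuda::is_available() ? torch::Device(torch::kCUDA)
                                                       : torch::Device(torch::kCPU);

    // Building blocks like convolution are declared at this abstract level;
    // which vendor kernel runs underneath is the framework's problem.
    torch::nn::Conv2d conv(torch::nn::Conv2dOptions(/*in=*/3, /*out=*/16, /*kernel=*/3).padding(1));
    conv->to(device);

    auto x = torch::randn({1, 3, 224, 224}, device);
    auto y = conv->forward(x);
    std::cout << y.sizes() << std::endl;  // expect [1, 16, 224, 224]
    return 0;
}
```

All the vendor-specific work (which convolution kernel actually runs) is pushed down into the framework backend, which is why it looks like a plausible compatibility target.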
However, these frameworks are forever changing, and playing continual catch-up there still wouldn't be a great place to be, especially without a large staff dedicated to the effort (writing hand-optimized kernels), which AMD doesn't seem able or willing to muster.
So, finally, perhaps the strategically best place for AMD to invest would be in compilers and software tools to allow kernels to be written in a high level language. Becoming a first class Mojo target wouldn't be a bad place to start, assuming they are not already in partnership.
AMD cannot keep up with arbitrarily changing hardware and software while trying to please developers that want what was just released. They would always be a generation behind at tremendous expense.
The situation in reality is actually quite bad.
Given that I have an M2 Max and no Nvidia cards, I've tried enough PyTorch-based ML libraries that by now I basically expect them to flat-out error with "CUDA 10.x+ is required" once the dependencies are installed (e.g. the bitsandbytes library -- in fairness, there's apparently some effort to port that one to other platforms as well).
As of today, the whole field is moving so fast that it's simply not worth it for a solo dev or even a small team to attempt getting a non-CUDA stack up and running, especially with the other major GPU vendors not hiring (or not able to hire?) people to port the hand-optimized CUDA kernels.
Hopefully the situation will change after these couple of years of frenzy, but in the meantime I don't see any viable way to avoid a CUDA stack if one is serious about getting ML work done.