This is absolutely crazy.
My only guess is they have a parallel skunkworks working on the same thing, but in a way that they can keep it closed-source - that this was a hedge they think they no longer need, and they are missing the forest for the trees on the benefits of cross-pollination and open source ethos to their business.
This approach actually would make sense if AMD felt, like most of us perhaps, that the NVIDIA ecosystem is too entrenched, but perhaps they made the decision recently to discontinue funding because they (now?) feel otherwise.
but then stopped
People are criticizing AMD for dropping this, but it makes sense to stop paying for development when the dev has stopped doing the work, no?
And if he means that AMD stopped paying 3 years ago - well, that was before dinosaurs and ChatGPT, and a lot has changed since then.
"Radeon Open Compute Platform"
https://github.com/ROCm/ROCm/issues/1628
And they wonder why they are losing. Branding absolutely matters.
Do you expect them to be able to capitalize on the AI fad so much (and quickly enough!) that it's worth dropping the ball on projects they're now doing well in? Or perhaps continue investing into the part of the market where they're doing much better than nVidia?
oof
What happens here is that the original vendor loses control of the API once there are multiple implementations. That's the best possible outcome for AMD.
In either case, they have a limited window to be adopted, and that's more important. The abstraction layer here helps too. AMD code is !@#$%. If this were adopted, it makes it easier to fix things underneath. All that is a lot more important than a dream of disrupting CUDA.
> ROCm is a brand name for ROCm™ open software platform (for software) or the ROCm™ open platform ecosystem (includes hardware like FPGAs or other CPU architectures).
> Note, ROCm no longer functions as an acronym.
But management at AMD should be above petty team politics and fund both because at the company level they do not care which solution wins in the end.
They very much plan to compete in this space, and hope to ship $3.5B of these chips in the next year. Small compared to Nvidia's revenues of $59B (includes both consumer and data centre), but AMD hopes to match them. It's too big a market to ignore, and they have the hardware chops to match Nvidia. What they lack is software, and it's unclear if they'll ever figure that out.
762 changed files with 252,017 additions and 39,027 deletions.
https://github.com/vosen/ZLUDA/commit/1b9ba2b2333746c5e2b05a...
The margins on supercompute-related sales are very high. Simplifying, but you can basically take a consumer chip, unlock a few things, add more memory capacity, relicense, and your margin goes up by a huge factor.
So, again, it's not at all clear that AMD being in the compute GPU game is the automatic win for them in the future. There's plenty of companies that killed themselves trying to run after big profitable new fad markets (see: Nokia and Windows Phone, and many other cases).
So let's examine that - does AMD actually have a good shot of taking a significant chunk of market that will offset them not investing in some other market?
Meanwhile CUDA supports anything with Nvidia stamped on it before it's even released. They'll even go as far as doing things like adding support for new GPUs/compute families to older CUDA versions (see Hopper/Ada and CUDA 11.8).
You can go out and buy any Nvidia GPU the day of release, take it home, plug it in, and everything just works. This is what people expect.
AMD seems to have no clue that this level of usability is what it will take to actually compete with Nvidia and it's a real shame - their hardware is great.
the hardware is already good enough, people would be happy to use it and accept that it's not quite as optimized for DL as Nvidia.
people would even accept that the software is not as optimized as CUDA, I think, as long as it is correct and reasonably fast.
the problem is just that every time I've tried it, it's been a pain in the ass to install and there are always weird bugs and crashes. I don't think it's hubris to say that they could fix these sorts of problems if they had the will.
Here it is: https://arstechnica.com/tech-policy/2021/04/how-the-supreme-...
It's a pure business decision based on simple math.
If the estimated revenues from selling to the underserved market are higher than the cost of funding the project (they probably are, considering the obscene margins from NVIDIA), then it's a no-brainer.
Oof x2
For years I've wanted to get off the Nvidia train for AI, but I'm forced to buy another Nvidia card b/c AMD stuff just doesn't work, while all the examples work with Nvidia cards as they should.
Because what else?
If so, then I think this is crazy, because software is harder to change than hardware.
Now the only thing they need to do is make sure ROCm itself is stable.
AMD seems to be a firm believer in separating the consumer chips for gaming and the compute chips for everything else. This probably makes a lot of sense from a chip design and current business perspective, but I think it's shortsighted and a bad idea. GPUs are very competent compute devices, and basically wasting all that performance for "only" gaming is strange to me. AI and other compute is getting more and more important for things like image and video processing, language models, etc. Not only for regular consumers, but for enthusiasts and developers it makes a lot of sense to be able to use your 10 TFLOPS chip even when you're not gaming.
While reading through the AMD CDNA whitepaper I saw this and got a good chuckle. "culmination of years of effort by AMD" indeed.
> The computational resources offered by the AMD CDNA family are nothing short of astounding. However, the key to heterogeneous computing is a software stack and ecosystem that easily puts these abilities into the hands of software developers and customers. The AMD ROCm 4.0 software stack is the culmination of years of effort by AMD to provide an open, standards-based, low-friction ecosystem that enables productivity creating portable and efficient high-performance applications for both first- and third-party developers.
https://www.amd.com/content/dam/amd/en/documents/instinct-bu...
"Support" means that the card is actively tested and presumably has some sort of SLA-style push to fix bugs for. As their stack matures, a bunch of cards that don't have official support will work well [0]. I have an unsupported card. There are horrible bugs. But the evidence I've seen is that the card will work better with time even though it is never going to be officially supported. I don't think any of my hardware is officially supported by the manufacturer, but the kernel drivers still work fine.
> Meanwhile CUDA supports anything with Nvidia stamped on it before it's even released...
A lot of older Nvidia cards don't support CUDA v9 [1]. It isn't like everything supports everything, particularly in the early part of building out capability. The impression I'm getting is that in practice the gap in strategy here is not as large as the current state makes it seem.
[0] If anyone has bought an AMD card for their machine to multiply matrices, they've been gambling on whether the capability is there. This comment is reasonable speculation, but I want to caveat the optimism by asserting that I'm not going to put money into AMD compute until there is some actual evidence on the table that GPU lockups are rare.
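For what it's worth, the workaround people usually report for "unofficial" cards is spoofing a nearby supported GFX target through an environment variable. A minimal sketch, assuming a ROCm build of PyTorch; the override value depends on your GPU family, and this is exactly the unsupported territory where the lockups live:

    import os
    # HSA_OVERRIDE_GFX_VERSION tells the ROCm runtime to treat the card as a nearby
    # officially supported target. "10.3.0" is the value commonly used for RDNA2 cards,
    # "11.0.0" for RDNA3; adjust for your hardware. It must be set before the runtime
    # loads, i.e. before importing torch.
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

    import torch
    print("gpu visible:", torch.cuda.is_available())  # ROCm builds reuse the torch.cuda namespace
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))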
https://github.com/vosen/ZLUDA/tree/v3?tab=readme-ov-file#fa...
As a total outsider it seems to me that maybe one of AMD's big problems is they just aren't set up to take advantage of the global talent pool in the same way Nvidia is.
When you're #1, you can go all-in on your own proprietary stack, knowing that network effects will drive your market share higher and higher for you for free.
When you're #2, you need to follow de-facto standards and work on creating and following truly open ones, and try to compete on actual value, rather than rent-seeking. AMD of all companies should know this.
Read an article about it recently, but when trying to remember the details / find it again just now I'm not seeing it. :(
Isn't translation one of the strengths of LLMs?
In a rational world their stock price would collapse if they don't focus on it and are unable to deliver anything competitive in the upcoming year or two.
> of the market where they're doing much better than nVidia?
So the market that’s hardly growing, Nvidia is not competing in and Intel still has bigger market share and is catching up performance wise? AMD’s valuation is this highly only because they are seen as the only company that could directly compete with Nvidia in the data center GPU market.
Nvidia could always just halve their prices one day, and wipe out every non-state-funded competitor. But Nvidia prefers to collect their extreme margins and funnel them into even more R&D in AI.
At the same time, open source projects can be pretty nimble in chasing things like changing APIs, potentially frustrating the effectiveness of API pivoting by NVIDIA in a second way.
> Those pointers point to undocumented functions forming the CUDA Dark API. It's impossible to tell how many of them exist, but debugging experience suggests there are tens of function pointers across tens of tables. A typical application will use one or two of the most common. Due to their undocumented nature they are exclusively used by the Runtime API and NVIDIA libraries (and by CUDA applications in turn). We don't have the names of those functions nor the names or types of the arguments. This makes implementing them time-consuming. Dark API functions are reverse-engineered and implemented by ZLUDA on a case-by-case basis once we observe an application making use of them.
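For the curious, the entry point being described is the driver API's export-table call: you hand it a UUID and get back a table of undocumented function pointers, which is what ZLUDA has to intercept and re-implement. A rough ctypes sketch of the shape of that interface; the UUID below is a zeroed placeholder rather than a real table ID, so the call is expected to fail:

    import ctypes

    class CUuuid(ctypes.Structure):
        _fields_ = [("bytes", ctypes.c_char * 16)]

    cuda = ctypes.CDLL("libcuda.so.1")   # NVIDIA driver library on Linux
    cuda.cuInit(0)

    table = ctypes.c_void_p()
    fake_id = CUuuid(b"\x00" * 16)       # placeholder; the real table UUIDs are undocumented
    # CUresult cuGetExportTable(const void **ppExportTable, const CUuuid *pExportTableId);
    status = cuda.cuGetExportTable(ctypes.byref(table), ctypes.byref(fake_id))
    print(status)                        # a CUDA_ERROR_* code is expected for an unknown table ID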
You're right about that, but it seems pretty clear that not being in the compute GPU game is an automatic loss for them (look at their revenue growth over the past quarter or two in each sector).
I don't know about that. You could kinda argue the opposite. "We improved CUDA. Oh it stopped working for you on AMD hardware? Too bad. Buy Nvidia next time"
IBM and Microsoft made OS/2. The first version worked on 286s and was stable but useless.
The second version worked only on 386s and was quite good, and even had wonderful windows 3.x compatibility. "Better windows than windows!"
At that point Microsoft wanted out of the deal and they wanted to make their newer version of windows, NT, which they did.
IBM now had a competitor to "new" windows and a very compatible version of "old" windows. Microsoft killed OS2 by a variety of ways (including just letting IBM be IBM) but also by making it very difficult for last month's version of OS/2 to run next month's bunch of Windows programs.
To bring this back to the point -- IBM vs Microsoft is akin to AMD vs Nvidia -- where nvidia has the standard that AMD is implementing, and so no matter what if you play in the backward compatibility realm you're always going to be playing catch-up and likely always in a position where winning is exceedingly hard.
As WOPR once said "interesting game; the only way to win is to not play."
I'm curious about this. Sure some CUDA code has already been written. If something new comes along that provides better performance per dollar spent, why continue writing CUDA for new projects? I don't think the argument that "this is what we know how to write" works in this case. These aren't scripts you want someone to knock out quickly.
If they put their stuff out as open source, including firmware, I think they will win out eventually.
And it's also not a guarantee that Nvidia will always produce the superior hardware for that code.
This brings back memories of late 90s / early 00s of Microsoft pushing hard their proprietary graphic libraries (DirectX) vs open standards (OpenGL).
Fast forward 25 years and even today, Microsoft still dominates in PC gaming as a result.
There's a bad track record of open standard for GPUs.
Even Apple themselves gave up on OpenGL and has their own proprietary offering (Metal).
If they want me as a customer, and they have not created a viable alternative to CUDA, they need to pursue this.
They won’t be able to do that, their hardware isn’t fast enough.
Nvidia is beating them at hardware performance, AND ALSO has an exclusive SDK (CUDA) that is used by almost all deep learning projects. If AMD can get their cards to run CUDA via ROCm, then they can begin to compete with Nvidia on price (though not performance). Then, and only then, if they can start actually producing cards with equivalent performance (also a big stretch) they can try for an Embrace Extend Extinguish play against CUDA.
Worked fine for MS with Excel supporting Lotus 123 and Word supporting WordPerfect's formats when those were dominant...
Windows NT wasn't really relevant in that competition until much later; only with XP did the NT line finally reach end consumers.
> where nvidia has the standard that AMD is implementing, and so no matter what if you play in the backward compatibility realm you're always going to be playing catch-up
That's not true. If AMD starts adding their own features and have their own advantages, that can flip.
It only takes a single generation of hardware, or a single feature for things to flip.
Look at Linux and Unix. It started out with Linux implementing Unix, and now the Unixes are trying to add compatibility with Linux.
Is SGI still the driving force behind OpenGL/Vulkan? Did you think it was a bad idea for other companies to use OpenGL?
AMD was successful against Intel with x86_64.
There are lots of example of the company making something popular, not being able to take full advantage of it in the long run.
Good-enough CUDA + new feature X gives them leverage in the inevitable court battle(s) and patent-sharing agreement that everyone wants to see.
AMD's already stuck its toe in the water: new CPUs with AI cores built in. If you can get an AM5 socket to run with 192 gigs, that's a large (albeit slow) model you can run.
The system package for HIP on Debian has been stuck on ROCm 5.2 / clang-15 for a while, but once I get it updated to ROCm 5.7 / clang-17, I expect that all discrete RDNA 3 GPUs will work.
That is... not accurate in the slightest.
Oracle v Google was not about patentability. Software patentability is its own separate minefield, since anyone who looks at the general tenor of SCOTUS cases on the issue should be able to figure out that SCOTUS is at best highly skeptical of software patents, even if it hasn't made any direct ruling on the topic. (Mostly this is a matter of them being able to tell what they don't like but not what they do like, I think). But I've had a patent attorney straight-out tell me that in absence of better guidance, they're just pretending the most recent relevant ruling (which held that X-on-a-computer isn't patentable) doesn't exist. In any case, a patent on software APIs (as opposed to software as a whole) would very clearly fall under the "what are you on, this isn't patentable" category of patentability.
The case was about the copyrightability of software APIs. Except if you read the decision itself, SCOTUS doesn't actually answer the question [1]. Instead, it focuses on whether or not Google's use of the Java APIs were fair use. Fair use is a dangerous thing to rely on for legal precedent, since there's no "automatic" fair use category, but instead a balancing test ostensibly of four factors but practically of one factor: does it hurt the original copyright owner's profits [2].
There's an older decision which held that the "structure, sequence, and organization" of code is copyrightable independent of the larger work of software, which is generally interpreted as saying that software APIs are copyrightable. At the same time, however, it's widespread practice in the industry to assume that "clean room" development of an API doesn't violate any copyright. The SCOTUS decision in Google v Oracle was widely interpreted as endorsing this interpretation of the law.
[1] There's a sentence or two that suggests to me there was previously a section on copyrightability that had been ripped out of the opinion.
[2] See also the more recent Andy Warhol SCOTUS decision which, I kid you not, says that you have to analyze this to figure out whether or not a use is "transformative". Which kind of implicitly overturns Google v Oracle if you think about it, but is unlikely to in practice.
For CUDA, it is not just AMD who would need to catch up. Developers also are not necessarily going to target the latest feature set immediately, especially if it only benefits (or requires) new hardware.
I accept the final statement, but that also means AMD for compute is gonna be dead like OS/2. Their stack just will not reach critical mass.
It's annoying as hell to you and me that they are not catering to the market of people who want to run stuff on their gaming cards.
But it's not clear it's bad strategy to focus on executing in the high-end first. They have been very successful landing MI300s in the HPC space...
Edit: I just looked it up: 25% of the GPU Compute in the current Top500 Supercomputers is AMD
https://www.top500.org/statistics/list/
Even though the list has plenty of V100 and A100s which came out (much) earlier. Don't have the data at hand, but I wouldn't be surprised if AMD got more of the Top500 new installations than nVidia in the last two years.
https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...
I think this is essentially the same situation as Proton+DXVK for Linux gaming. I think that that is a net positive for Linux, but I'm less sure about this. Getting good performance out of GPU compute requires much more tuning to the concrete architecture, which I'm afraid devs just won't do for AMD GPUs through this layer, always leaving them behind their Nvidia counterparts.
However, AMD desperately needs to do something. Story time:
On the weekend I wanted to play around with Stable Diffusion. Why pay for cloud compute, when I have a powerful GPU at home, I thought. Said GPU is a 7900 XTX, i.e. the most powerful consumer card from AMD at this time. Only very few AMD GPUs are supported by ROCm at this time, but mine is, thankfully.
So, how hard could it possibly be to get Stable Diffusion running on my GPU? Hard. I don't think my problems were actually caused by AMD: I had ROCm installed and my card recognized by rocminfo in a matter of minutes. But the whole ML world is so focused on Nvidia that it took me ages to get a working installation of pytorch and friends. The InvokeAI installer, for example, asks if you want to use CUDA or ROCm, but then always installs the CUDA variant whatever you answer. Ultimately, I did get a model to load, but the software crashed my graphical session before generating a single image.
The whole experience left me frustrated and wanting to buy an Nvidia GPU again...
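For anyone hitting the same wall: the usual failure mode is ending up with the CUDA wheel of PyTorch instead of the ROCm one, which then silently falls back to CPU on AMD hardware. A quick sanity check, assuming the ROCm build of torch and a supported card:

    import torch
    # On a ROCm build, torch.version.hip is a version string and torch.version.cuda is None.
    print("hip build:", torch.version.hip)
    print("cuda build:", torch.version.cuda)
    print("gpu visible:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # should report the 7900 XTX (gfx1100)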
People writing CUDA apps don't just want stuff to run, performance is an extremely important factor else they would target CPUs which are easier to program for.
From their readme:
> On Server GPUs, ZLUDA can compile CUDA GPU code to run in one of two modes:
> Fast mode, which is faster, but can make exotic (but correct) GPU code hang.
> Slow mode, which should make GPU code more stable, but can prevent some applications from running on ZLUDA.
CUDA shouldn't exist. We should have hardware manufacturers working together, using common APIs and standardizing instead of going for the throat. The further platforms drift apart, the more valuable Nvidia's vertical integration becomes.
The hardware may be great, but their software ecosystem is utter crap. As long as they stay the unchallenged leader in hardware, I expect Nvidia will continue to produce crap software.
I would push to switch our products in a heartbeat, if AMD actually gets their act together. If this alternative offers a path to evaluate our current application software stack on an AMD devkit, I would buy one tomorrow.
Meanwhile Excel was gaining features and winning users with them even before Windows was in play.
> PyTorch received very little testing. ZLUDA's coverage of cuDNN APIs is very minimal (just enough to run ResNet-50) and realistically you won't get much running.
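So treat it as experimental. Since ZLUDA presents the AMD card to applications through the ordinary CUDA device APIs, about the most you can sensibly verify today is a smoke test along these lines (assuming a CUDA build of torch launched under ZLUDA):

    import torch
    assert torch.cuda.is_available(), "ZLUDA device not visible to torch"
    print(torch.cuda.get_device_name(0))
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")
    print((a @ b).sum().item())  # plain GEMM; anything leaning on cuDNN may not work yet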
Can you say why you had to change the name?
By what measure hasn't that happened already? CUDA has been around and constantly improving for more than 15 years, and there are no competitors in sight so far. It's basically the de facto standard in many ecosystems.
As I mentioned elsewhere, 25% of GPU compute on the Top 500 Supercomputer list is AMD. This all on the back of a card that came out only three years ago. We are very rapidly moving towards a situation where there are many, many high-performance developers that will target ROCm.
Rosetta 2 runs apps at 80-90% their native speed.
With that momentum, CUDA got incorporated into a lot of high-performance computing applications. Few alternatives show up because there aren't many acceleration frameworks that are as large or complete as CUDA. Nvidia pushed forward by scaling down to robotics and edge-compute scale hardware, and now are scaling up with their DGX/Grace platforms.
Today, Nvidia is prevalent because all attempts to subvert them have failed. Khronos Group tried to get the industry to rally around OpenCL as a widely-supported alternative, but too many stakeholders abandoned it before the initial crypto/AI booms kicked off the demand for GPGPU compute.
At which point, why tie yourself to the competitor's language? Probably much more effective to just write a well-optimized library that serves MLIR or whatever the popular API is in order to run big ML jobs.
CUDA currently has the better raw performance, better availability, and a long record indicating that the platform won't just disappear in a couple of years. You can use it on pretty much any NVIDIA GPU and it's properly supported. The same CUDA code that ran on a GTX680 can run on an RTX4090 with minimal changes if any (maybe even the same binary).
In comparison, AMD has a very spotty record with their compute technologies, stuff gets released and becomes effectively abandonware, or after just a few years support gets dropped regardless of the hardware's popularity. For several generations they basically led people on with promises of full support on consumer hardware that either never arrived or arrived when the next generation of cards were already available, and despite the general popularity of the rx580 and the popularity of the Radeon VII in compute applications, they dropped 'official' support. AMD treats its 'consumer' cards as third class citizens for compute support, but you aren't going to convince people to seriously look into your platform like that. Plus, it's a lot more appealing to have "GPU acceleration will allow us to take advantage of newer supercomputers, while also offering massive benefits to regular users" than just the former.
This was ultimately what removed AMD as a consideration for us when we were deciding on which to focus on for GPU acceleration in our application. Many of us already had access to an NVIDIA GPU of any sort, which would make development easier, while the entire facility had one ROCm capable AMD GPU at the time, specifically so they could occasionally check in on its status.
So while Intel had to bow to AMD's success and give up Itanium, they weren't then limited by that and could proceed to iterate on top of it.
Meanwhile it'll be a cold day in hell before Nvidia licenses anything about CUDA to AMD, much less allows AMD to iterate on top of it.
The right path for AMD has always been to make their own API that runs on all of their own hardware, just as CUDA does for Nvidia, and push support for that API into all the open source ML projects (but mostly PyTorch), while attacking Nvidia's price discrimination by providing features they use to segment the market (e.g. virtualization, high VRAM) at lower price points.
Perhaps one day AMD will realize this. It seems like they're slowly moving in the right direction now, and all it took for them to wake up was Nvidia's market cap skyrocketing to 4th in the world on the back of their AI efforts...
Well, then I guess CUDA is not really the problem, so being able to run CUDA on AMD hardware wouldn't solve anything.
> try for an Embrace Extend Extinguish play against CUDA
They wouldn't need to go that route. They just need a way to run existing CUDA code on AMD hardware. Once that happens, their customers have the option to save money by writing ROCm or whatever AMD is working on at that time.
Not at all; the performance hit was in the low tens of percent. Before natively supporting Apple Silicon, most of the apps I use for music/video/photography didn't seem to have a performance impact at all, even more so since the M1 machines were so much faster than the Intels.
ROCm has different bugs, which the application workarounds tend to miss.
Even if AMD lagged support on CUDA versioning, I think it would be widely accepted if the performance per dollar at certain price points was better.
Taking the whole market from NVIDIA is not really an option, it's better to attack certain price points and niches and then expand from there. The CUDA ship sailed a long time ago in my view.
So the contract is: as long as your future program does not touch any intrinsics etc. that do not exist in CUDA 1.0, you can export the new program from CUDA 27.0 as PTX, and the GTX 680 driver will read the PTX and let your GPU run it as CUDA 1.0 code… so it is quite literally just as they describe, unlimited forward and backward capability/support as long as you go through PTX in the middle.
https://docs.nvidia.com/cuda/archive/10.1/parallel-thread-ex...
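On the application side, the practical question is whether your binary carries machine code (SASS) or PTX covering your GPU's compute capability; PyTorch happens to expose both ends of that contract, which makes for an easy way to see it in action (a small check, assuming a CUDA build of torch):

    import torch
    # Architectures the installed torch binary was built for: 'sm_*' entries are native
    # machine code, 'compute_*' entries are PTX that the driver can JIT for newer GPUs.
    print(torch.cuda.get_arch_list())           # e.g. ['sm_50', ..., 'sm_90', 'compute_90']
    # Compute capability of the card actually installed.
    print(torch.cuda.get_device_capability(0))  # e.g. (8, 9) for an RTX 4090
    # If the device is newer than every 'sm_*' entry, the driver JIT-compiles the PTX
    # entry for it; that is the forward-compatibility path described above.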
so, same mistake intel made before.
A lot went wrong with os/2. For CUDA, I think a better analogy is vhs. The standard, in the effective not open sense, is what it is. AMD sucks at software and views it as an expense rather than an advantage.
However, that same logic doesn't apply to consumers, and since they continued to fail to learn that lesson now IBM doesn't even target the consumer market given that they never learned how to be competitive and could only ever effectively function when they had a monopoly or at least a vendor lock-in.
https://en.wikipedia.org/wiki/Acquisition_of_the_IBM_PC_busi...
Ahhhh, your hindsight is well developed. I would be interested to know the background on the reasons why Lotus made that bet. We can't know the counterfactual, but Lotus delivering on a platform owned by their deadly competitor Microsoft would seem to me to be a clearly worrisome idea to Lotus at the time. Turned out it was an existentially bad idea. Did Lotus fear Microsoft? "DOS ain't done till Lotus won't run" is a myth[1] for a reason. Edit: DR-DOS errors[2] were one reason Lotus might fear Microsoft. We can just imagine a narrative of a different timeline where Lotus delivered on Windows but did some things differently to beat Excel. I agree, Lotus made other mistakes and Microsoft made some great decisions, but the point remains.
We can also suspect that AMD faces a similar fork in the road now. Depending on Nvidia/CUDA may be a similar choice for AMD - fail if they do and fail if they don't.
[1] http://www.proudlyserving.com/archives/2005/08/dos_aint_done...
[2] https://www.theregister.com/1999/11/05/how_ms_played_the_inc...
> It apparently came down to an AMD business decision to discontinue the effort
Bad decision if that's the case. Maybe someone can pick it up, since it's open now.
Proton, Wine, and all of the compatibility fixes and driver improvements that the community has made in the last 16 years have been amazing, and every day is another day where you can say that it has never been easier to switch away from Windows.
However, Microsoft has definitely been drinking the IBM koolaid a little too long and has lost the mandate of heaven. I think in the next 7-10 years we will reach a point where there is nothing Windows can do that linux cannot do better and easier without spying on you, and we may be 3-5 years from a "killer app" that is specifically built to be incompatible with Windows just as a big FU to them, possibly in the VR world, possibly in AR, and once that happens maybe, maybe, maybe it will finally actually be the year of the linux desktop.
I guess a while ago it was found that Nvidia was bypassing the kernel's GPL license driver check, and I read that kernel 6.6 was going to lock that driver out if they didn't fix it; from what I've read there was no reply or anything done by Nvidia yet. Which I think I probably just can't find.
Am I wrong about that part?
We're on kernel 6.7.4 now and I'm still using the same drivers. Did it get pushed back, did nvidia fix it?
Also, while trying to find answers myself I came across this 21 year old post which is pretty funny and very apt for the topic https://linux-kernel.vger.kernel.narkive.com/eVHsVP1e/why-is...
I'm seeing conflicting info all over the place so I'm not really sure what the status of this GPL nvidia driver block thing is.
As such Fermi seems to be the shortest supported architecture, and it was around for 7 years. GCN4 (Polaris) was introduced in 2016, and seems to have been officially dropped around 2021, just 5 years in. While you could still get it working with various workarounds, I don't see the evidence of Nvidia being even remotely as hasty as AMD with removing support, even for early architectures like Tesla and Fermi.
I guess that might answer my "Why would AMD find that having a CUDA competitor isn't a business case unless they couldn't do it or the cards underperformed significantly."
I tried to get it working this weekend but it was a huge PITA, so I switched to putting everything into WSL2, then Arch on that, then pytorch etc. in containers so I could flip versions easily, now that I know how SPECIFIC the versions are to one another.
I'm still working on that part, halfway into it my WSL2 completely broke and I had to reinstall windows. I'm scared to mount the vhdx right now. I did ALL of my work and ALL of my documentation is inside of the WSL2 archlinux and NOT on my windows machine. I have EVERYTHING I need to quickly put another server up (dotfiles, configs) sitting in a chezmoi git repo ON THE VM. That I only git committed one init like 5 mins into everything. THAT was a learning experience, now I have no idea if I should follow the "best practice" of keeping projects in wsl or having wsl reach out to windows, there's a performance drop. The 9p networking stopped working and no matter what I reinstalled, reset, removed features, reset windows, etc, it wouldn't start. But at least I have that WSL2 .vhdx image that will hopefully mount and start. And probably break WSL2 again. I even SPECIFICALLY took backups of the image as tarballs every hour in case I broke LINUX, not WSL.
If anyone has done sd containers in wsl2 already let me know. I've tried to use WSL for dev work (i use osx) like this 2-3 times in the last 4-5 years and I always run into some catastrophically broken thing that makes my WSL stop working. I hadn't used it in years so hoped it was super reliable by now. This is on 3 different desktops with completely different hardware, etc. I was terrified it would break this weekend and IT DID. At least I can be up in windows in 20 minutes thanks to chocolately and chezmoi. Wiped out my entire gaming desktop.
Sorry I'm venting now this was my entire weekend.
This repo is from a DeepSpeed contributor (IIRC) and lists the requirements for DeepSpeed + Windows that mention the version matches
https://github.com/S95Sedan/Deepspeed-Windows
> conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
It may sound weird to do any of this in Windows, or maybe not, but if it does just remember that it's a lot of gamers like me with 4090s who just want to learn ML stuff as a hobby. I have absolutely no idea what I'm doing but thank god I know containers and linux like the back of my hand.
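The version pinning really is that strict; the quickest way to see whether an environment actually matches is to ask torch what it was built against (nothing DeepSpeed-specific here, just the stock version attributes):

    import torch, torchvision, torchaudio
    # These three must come from matching releases, and torch.version.cuda should match
    # the CUDA toolkit you build extensions (e.g. DeepSpeed) against - 12.1 in the command above.
    print("torch       ", torch.__version__)
    print("torchvision ", torchvision.__version__)
    print("torchaudio  ", torchaudio.__version__)
    print("built for CUDA", torch.version.cuda)
    print("gpu visible:", torch.cuda.is_available())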
Again, you are missing the point. Java is both a language (java source) and a machine (the JVM). The latter is a hardware ISA - there are processors that implement Java bytecode as their ISA format. Yet most people who are running Java are not doing so on java-machine hardware, yet they are using the java ISA in the process.
https://en.wikipedia.org/wiki/Java_processor
https://en.wikipedia.org/wiki/Bytecode#Execution
any bytecode is an ISA, the bytecode spec defines the machine and you can physically build such a machine that executes bytecode directly. Or you can translate via an intermediate layer, like how Transmeta Crusoe processors executed x86 as bytecode on a VLIW processor (and how most modern x86 processors actually use RISC micro-ops inside).
these are completely fungible concepts. They are not quite the same thing but bytecode is clearly an ISA in itself. Any given processor can choose to use a particular bytecode as either an ISA or translate it to its native representation, and this includes both PTX, Java, and x86 (among all other bytecodes). And you can do the same for any other ISA (x86 as bytecode representation, etc).
furthermore, what most people think of as "ISAs" aren't necessarily so. For example RDNA2 is an ISA family - different processors have different capabilities (for example 5500XT has mesh shader support while 5700XT does not) and the APUs use a still different ISA internally etc. GFX1101 is not the same ISA as GFX1103 and so on. These are properly implementations not ISAs, or if you consider it to be an ISA then there is also a meta-ISA encompassing larger groups (which also applies to x86's numerous variations). But people casually throw it all into the "ISA" bucket and it leads to this imprecision.
like many things in computing, it's all a matter of perspective/position. where is the boundary between "CMT core within a 2-thread module that shares a front-end" and "SMT thread within a core with an ALU pinned to one particular thread"? It's a matter of perspective. Where is the boundary of "software" vs "hardware" when virtually every "software" implementation uses fixed-function accelerator units and every fixed-function accelerator unit is running a control program that defines a flow of execution and has schedulers/scoreboards multiplexing the execution unit across arbitrary data flows? It's a matter of perspective.
They have a hard time understanding the pain points of their consumers, as they don't feel that pain; they look through their own organisation-coloured glasses and can't tell the real pain points from the whiny-customer ones.
AMD probably thinks software ecosystems are the easy part, ready to take on whenever they feel like it by throwing a token amount at it. They've built a great engine, see the bodywork as beneath them, and don't understand why the lazy customer wants them to build the rest of the car too.
And that someone usually isn't a manufacturer, lest the committee be accused of bias.
Consequently, you get (a) outdated features that SotA has already moved beyond, (b) designed in a way that doesn't correspond to actual practice, and (c) that are overly generalized.
There are some notable exceptions (e.g. IETF), but the general rule has been that open specs please no one, slowly.
IMHO, FRAND and liberal cross-licensing produce better results.
As much as I love Microsoft/Windows for the work they have put into WSL, I ended up just putting Kubuntu on my devices and use QEMU with GPU passthrough whenever I need Windows. Gaming perf is good. You need an iGPU or a cheap second GPU for Linux in order to hand off a 4090 etc. to Windows (unless maybe your motherboard happens to support headless boot but if it's a consumer board it doesn't). Dual boot with Windows always gave me trouble.
Same reason it wasn't when it was obvious Nvidia was taking over this space maybe 8 years ago, when they let OpenCL die and then proceeded to do nothing until it was too late.
Speaking to anyone working in general-purpose GPU coding back then, they all just said the same thing: OpenCL was a nightmare to work with and CUDA was easy and mature compared to it. The writing was on the wall about where things were heading the second you saw a photon-based renderer running on GPU vs CPU all the way back then; AMD has only themselves to blame, because Nvidia basically showed them the potential with CUDA.
Personally I want Nvidia to break the x86-64 monopoly, with how amazing properly spec'd Nvidia cards are to work with I can only dream of a world where Nvidia is my CPU too.
These were precisely the arguments for 'x86 will entrench Intel for all time', and we've seen AMD succeed at that game just fine.
At first I thought it was hardware related; it being in a Remote Desktop session led me to think it was some weird audio driver thing.
have you encountered anything like this at all?
(To be clear, HIP is about converting CUDA source code not running CUDA-compiled binaries but the Zluda project discussed in OP heavily relies on it.)
I bet there are at least two markets (or niches):
1. People who want the absolute best performance and the latest possible version and are willing to pay the premium for it;
2. People who want to trade performance for cost and accept working with not-the-latest versions.
In fact, I bet the market for (2) is much larger than (1).
I think the case of cuda vs an open standard is different from os2 vs Windows because the customers of cuda are programmers with access to source code while the customers of os2 were end users trying to run apps written by others.
If your shrink-wrapped software didn't run on os2, you'd have no choice but to go buy Windows. Otoh if your ai model doesn't run on an AMD device and the issue is something minor, you can edit the shader code.
If AMD invented the analogous to x86_64 for CUDA, this would increase competition and progress in AI by some huge fraction.
The big issue for Intel is pretty similar to that of AMD; everything is made for CUDA, and Intel has to either build their own solutions or convince people to build support for Intel. While I'm working on learning AI and plan to use an Nvidia card, the progress Intel has made in the couple of years since introducing their first GPU to market has been pretty wild, and I think it should really give AMD pause.
... after a couple decades of legal proceedings and a looming FTC monopoly case convinced Intel to throw in the towel, cross-license, and compete more fairly with AMD.
https://jolt.law.harvard.edu/digest/intel-and-amd-settlement
AMD didn't just magically do it on its own.
I got this up and running on my windows machine in short order and I don't even know what stable diffusion is.
But again, it would be nice to have first class support to locally participate in the fun.
that's a fascinating statement with the clear ascendancy of neural-assisted algorithms etc. Things like DLSS are the future - small models that just quietly optimize some part of a workload that was commonly considered impossible to the extent nobody even thinks about it anymore.
my prediction is that in 10 years we are looking at the rise of tag+collection based filesystems and operating system paradigms. all of us generate a huge amount of "digital garbage" constantly, and you either sort it out into the important stuff, keep temporarily, and toss, or you accumulate a giant digital garbage pile. AI systems are gonna automate that process, it's gonna start on traditional tree-based systems but eventually you don't need the tree at all, AI is what's going to make that pivot to true tag/collection systems possible.
Tags mostly haven't worked because of a bunch of individual issues which are pretty much solved by AI. Tags aren't specific enough: well, AI can give you good guesses at relevance. Tagging files and maintaining collections is a pain: well, the AI can generate tags and assign collections for you. Tags really require an ontology for "fuzzy" matching (search for "food" should return the tag "hot dog") - well, LLMs understand ontologies fine. Etc etc. And if you do it right, you can basically have the AI generate "inbox/outbox" for you, deduplicate files and handle versioning, etc, all relatively seamlessly.
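A toy sketch of the fuzzy-tag idea, with the LLM/ontology lookup stubbed out as a hypothetical expand_tag() helper (this is speculation about a future OS feature, not any existing API):

    def expand_tag(tag: str) -> set[str]:
        # Stand-in for an LLM/ontology lookup: "food" should also match "hot dog", etc.
        toy_ontology = {"food": {"hot dog", "recipe", "restaurant receipt"}}
        return {tag} | toy_ontology.get(tag, set())

    # Tags here would be generated automatically, not hand-maintained.
    files = {
        "IMG_2041.jpg": {"hot dog", "beach", "2023"},
        "taxes_2023.pdf": {"tax", "finance", "2023"},
    }

    def search(query_tag: str) -> list[str]:
        wanted = expand_tag(query_tag)
        return [name for name, tags in files.items() if tags & wanted]

    print(search("food"))  # ['IMG_2041.jpg']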
microsoft and macos are both clearly racing for this with the "AI os" concept. It's not just better relevance searches etc. And the "generate me a whole paragraph before you even know what I'm trying to type" stuff is not how it's going to work either. That stuff is like specular highlights in video games around 2007 or whatever - once you had the tool, for a few years everything was w e t until developers learned some restraint with it. But there are very very good applications that are going to come out in the 10 year window that are going to reduce operator cognitive load by a lot - that is the "AI OS" concept. What would the OS look like if you truly had the "computer is my secretary" idea? Not just dictating memorandums, but assistance in keeping your life in order and keeping you on-task.
I simply cannot see linux being able to keep up with this change, in the same way the kernel can't just switch to rust - at some point you are too calcified to ever do the big-bang rewrite if there is not a BDFL telling you that it's got to happen.
the downside of being "the bazaar" is that you are standards-driven and have to deal with corralling a million whiny nerds constantly complaining about "spying on me just like microsoft" and continuing to push in their own other directions (sysvinit/upstart/systemd factions, etc) and whatever else, on top of all the other technical issues of doing a big-bang rewrite. linux is too calcified to ever pivot away from being a tree-based OS and it's going to be another 2-3 decades before they catch up with "proper support for new file-organization paradigms" etc even in the smaller sense.
that's really just the tip of the iceberg on the things AI is going to change, and linux is probably going to be left out of most of those commercial applications despite being where the research is done. It's just too much of a mess and too many nerdlingers pushing back to ever get anything done. Unix will be represented in this new paradigm but not Linux - the commercial operators who have the centralization and fortitude to build a cathedral will get there much quicker, and that looks like MacOS or Solaris not linux.
Or at least, unless I see some big announcement from KDE or Gnome or Canonical/Red Hat about a big AI-OS rewrite... I assume that's pretty much where the center of gravity is going to stay for linux.
It's a classic "between a rock and a hard place" scenario. Quite a conundrum.
If the players in the space have naturally coalesced around one over the last decade, can we skip the thrashing and just go with it this time?
I've done this on both a hackintosh and void linux. I was so excited to get the hackintosh working because I honestly hate day-to-day desktop linux; it's my day job to work on it and I just don't want to deal with it after work.
Unfortunately both would break in significant ways and I'd have to trudge through and fix things. I had that void desktop backed up with Duplicacy (duplicati front end) and IIRC I tried to roll back after breaking qemu, it just dumps all your backup files into their dirs, and I think I broke it more.
I think at that point I was back up in Windows in 30 mins.. and all of its intricacies like bsoding 30% of the time that I either restart it or unplug a usb hub. But my Macbooks have a 30% chance of not waking up on Monday morning when I haven't used them all weekend without me having to grab them and open the screen.
WebGPU might be the thing that unifies the frontend API for folks writing cross-platform renderers, seeing as browsers will have to implement it on top of the platform APIs anyway.
https://linuxmusicians.com/viewtopic.php?t=25556
Could be completely unrelated though, RDP sessions can definitely act up, get audio out of sync etc. I try to never do pass through rdp audio, it's not even enabled by default in the mstsc client IIRC but that may just be a "probably server" thing.
I was actually advising an HN user against using Jetson just the other day because it's such an extreme outlier when it comes to Nvidia and software support. Frankly Jetson makes no sense unless you really need the power efficiency and form-factor.
Meanwhile, any seven year old >= Pascal card is fully supported in CUDA 12 and the most recent driver releases. That combined with my initial data points and others people have chimed in with on this thread is far from "utter crap".
Use the right tool for the job.
Are we talking about the same NVIDIA? The entire Nvidia GPU strategy is: make a feature (or find an existing one) that performs better on their cards, then pay developers to use (and sometimes misuse) it extensively.
But yes, AMD was playing the "follow x86" game for a long time until they came up with x86-64, which evened the playing field in terms of architecture.
I would guess there are lots of people still running CUDA 11. Older clusters, etc. A lot of that software doesn't get updated very often.
DirectX was targeted at gaming and was a much more limited, simpler API, which made programming games in it easier. It couldn't do everything that OpenGL can, which is why CAD programs didn't use it even on Windows. DirectX worked because it chose its market correctly and delivered what the customers wanted. Windows' exceptional backwards compatibility helped greatly as well. Many simple game engines still use the DX9 API to this day.
It is not so much about having an open standard, but about being able to provide extra functionality and performance. Unlike the CPU-dominated areas where executing the common baseline ISA is very competitive, in accelerated computing using every single bit of performance and having new and niche features matter. So providing exceptional hardware with good software is critical for the competition. Closed APIs have much quicker delivery times and don't have to deal with multiple vendors.
Nobody except Nvidia delivers good enough low level software and their hardware is exceptionally good. AMD's combination is neither. The hardware is slower and it is hard to program so they continuously lose the race.
The only thing it has going for it is being a free-beer UNIX clone for headless environments, and even then, it isn't that relevant in cloud environments where containers and managed languages abstract away everything they run on.
Then there is the whole issue of extension spaghetti, and incompatibilities across OpenGL, OpenGL ES and WebGL, making it hardly possible to have 1:1 portable code everywhere beyond toy examples.
Their leadership seems quite a bit more competent than random forum commenters give them credit for. I guess what they need, marketing wise, is a few successful halo GPU launches. They haven't done that in a while. Lisa acknowledged this years ago. It's marketing 101. I guess these things are easier said than done.
H100's are hard to get. Nearly impossible. CoreWeave and others have scooped them all up for the foreseeable future. So, if you are looking at only price as the factor, then it becomes somewhat irrelevant, if you can't even buy them [0]. I don't really understand the focus on price because of this fact.
Even if you do manage to score yourself some H100's. You also need to factor in the networking between nodes. IB (Infiniband) made by Mellanox, is owned by NVIDIA. Lead times on that equipment are 50+ weeks. Again, price becomes irrelevant if you can't even network your boxes together.
As someone building a business around MI300x (and future products), I don't care that much about price [!]. We know going in that this is a super capital intensive business and have secured the backing to support that. It is one of those things where "if you have to ask, you can't afford it."
We buy cards by the chassis, it is one price. I actually don't know the exact prices of the cards (but I can infer it). It is a lot about who you know and what you're doing. You buy more chassis, you get better pricing. Azure is probably paying half of what I'm paying [1]. But I'd also say that from what I've seen so far, their chassis aren't nearly as nice as mine. I have dual 9754's, 2x bonded 400G, 3TB ram, and 122TB nvme... plus the 8x MI300x. These are top of the top. They have Intel and I don't know what else inside.
[!] Before you harp on me, of course I care about price... but at the end of the day, it isn't what I'm focused on today as much as just being focused on investing all of the capex/opex that I can get my hands on, into building a sustainable business that provides as much value as possible to our customers.
[0] https://www.tomshardware.com/news/tsmc-shortage-of-nvidias-a...
[1] https://www.techradar.com/pro/instincts-are-massively-cheape...
Chasing CUDA compatibility is a fool's errand when the most important users of CUDA are open source. Just add explicit AMD support upstream and skip the never ending compatibility treadmill, and get better performance too. And once support is established and well used the community will pitch in to maintain it.
Maybe some Microsoft owned games makers will never make the shift, but if the majority of others do then that's the death knell.
"Building the DirectX shader compiler better than Microsoft?" (2024) https://news.ycombinator.com/item?id=39324800
E.g. llama.cpp already supports hipBLAS; is there an advantage to this ROCm CUDA-compatibility layer - ZLUDA on Radeon (and not yet Intel OneAPI) - instead or in addition? https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#hi... https://news.ycombinator.com/item?id=38588573
What can't WebGPU abstract away from CUDA unportability? https://news.ycombinator.com/item?id=38527552
With AMD's official 15GB(!) Docker image, I was now able to get the A1111 UI running. With SD 1.5 and 30 sample iterations, generating an image takes under 2s. I'm still struggling to get InvokeAI running.
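If you would rather skip the web UIs entirely, going through diffusers directly tends to be the least fragile route on ROCm, since the ROCm torch build still exposes the GPU under the usual "cuda" device name. A minimal sketch, assuming the ROCm PyTorch wheel and the standard SD 1.5 weights:

    import torch
    from diffusers import StableDiffusionPipeline

    # ROCm builds of torch map the AMD GPU onto the regular "cuda" device.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe("a photo of a red bicycle", num_inference_steps=30).images[0]
    image.save("bicycle.png")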
For the non-vendor-lock-in AIs (Copilot), casting as wide a net as possible to catch customers as easily as possible should by default mean that they would invest the small amount of money to build Linux integrations into their AI platforms.
Plus, the googs has a pretty deep investment into the linux ecosystem and should have little issue pushing bard or gemini or whatever they'll call it next week before they kill it out into a linux compatible interface, and if they do that then the other big players will follow.
And, don't overlook the next generation of VR headsets. People have gotten silly over the Apple headset, but Valve should be rolling out the Deckard soon, and others will start to compete in that space since Apple raised the price bar, and should soon start rolling out hardware with more features and software to take advantage of it.
It is. All the things are the problem. AMD is behind on both hardware and software, for both gaming and compute workloads, and has been for many years. Their competitor has them beat in pretty much every vertical, and the lock-in from CUDA helps ensure that even if AMD can get their act together on the hardware side, existing compute workloads (there are oceans of existing workloads) won’t run on their hardware, so it won’t matter for professional or datacenter usage.
To compete with Nvidia in those verticals, AMD has to fix all of it. Ideally they’d come out with something better than CUDA, but they have not shown an aptitude for being able to do something like that. That’s why people keep telling them to just make a compatibility layer. It’s a sad place to be, but that’s the sad place where AMD is, and they have to play the hand they’ve been dealt.
If the now very clearly well functioning implementation continues to perform as well as it is, the community may be able to keep it funded and functioning.
And the other side of this is that with renewed AMD interest/support for the rocm/HIP project, it might be just good enough as a stopgap step to push projects towards rocm/HIP adoption. (included below is another blurb from the readme).
> I am a developer writing CUDA code, does this project help me port my code to ROCm/HIP?
> Currently no, this project is strictly for end users. However this project could be used for a much more gradual porting from CUDA to HIP than anything else. You could start with an unmodified application running on ZLUDA, then have ZLUDA expose the underlying HIP objects (streams, modules, etc.), allowing to rewrite GPU kernels one at a time. Or you could have a mixed CUDA-HIP application where only the most performance sensitive GPU kernels are written in the native AMD language.
On the other hand:
> The next major ROCm release (ROCm 6.0) will not be backward compatible [source] with the ROCm 5 series.
Even worse, not even the driver is backwards-compatible:
> There are some known limitations though, like currently only targeting the ROCm 5.x API and not the newly-released ROCm 6.x releases. In turn, having to stick to the ROCm 5.7 series as the latest means that the ROCm DKMS modules don't build against the Linux 6.5 kernel now shipped by Ubuntu 22.04 LTS HWE stacks, for example. Hopefully there will be enough community support to see ZLUDA ported to ROCm 6 so at least it can be maintained with current software releases.
(Yes, that's obvious, but not so obvious when your GPU applications submitted to a cluster start crashing randomly for no apparent reason.)
AMD doesn't seem to understand that affordable entry-level hardware with good software support is key.
"While AMD ships pre-built ROCm/HIP stacks for the major enterprise Linux distributions, if you are using not one of them or just want to be adventurous and compile your own stack for building HIP programs for running on AMD GPUs, one of the AMD Linux developers has written a how-to guide. "(1)
(1) "Building An AMD HIP Stack From Upstream Open-Source Code", written by Michael Larabel in Radeon on 9 February 2024 at 06:45 AM EST.
Who was responsible at AMD for this project, and why is he still not fired? How brain-dead does someone have to be to reject such a major market share?
The sad thing is people can absolutely run ROCm on gaming cards if they build from source. Weirdly GPU programmers seem determined to use proprietary binaries to run "supported" hardware, and thus stick with CUDA.
I don't understand why AMD won't write the names of some graphics cards under "supported", even if they didn't test them as carefully as the MI series, and I don't understand why developers are so opposed to compiling their toolchains from source. For one thing it means you can't debug the toolchain effectively when it falls over, weird limitation to inflict on oneself.
Strange world.
Side point, there's a driver in your linux kernel already that'll probably work. The driver that ships with rocm is a newer version of the same and might be worth building via dkms.
Very strange that the rocm github doesn't have build scripts but whatever, I've been trying to get people to publish those for almost five years now and it just doesn't seem to be feasible.
They're not going with big enough dies at the top end to compete with Nvidia for the halo tier, and they're refusing to undercut at the low end where Nvidia's reputation for absurd pricing is at an all-time high. AMD's GPU division is a clown show, it's impressively bad. Even though the hardware itself is fine, they just can't stop making terrible product launches, awful pricing strategies, or brain-dead software choices like shipping a feature that triggered anti-cheat, getting their customers predictably banned and angering game devs in the process.
And relevant to this discussion Nvidia's refusal to add VRAM to their lower end cards is a prime opportunity for AMD to go after the lower-end compute / AI interested crowd who will become the next generation software devs. What are they doing with this? Well, they're not making ROCm available to basically anyone, that's apparently the winning strategy. ROCm 6.0 only supports the 7900 XTX and the... Radeon VII. The weird one-off Vega 20 refresh. Of all the random cards to support, why the hell would you pick that one???
1) billions of dollar at the stake
2) one of the most successful leadership teams
3) during the hottest period of their business, where they have heard about Nvidia's moat probably thousands of times during the last 18 months...
and you call some decision "crazy", then you probably do not have the same information that they do
or they underperformed, who knows, but I bet on #1 reason.
I'm a maintainer (and CEO) of Invoke.
It's something we're monitoring as well.
ROCm has been challenging to work with - we're actively talking to AMD to stay apprised of ways we can mitigate some of the more troublesome experiences that users have when getting Invoke running on AMD (and hoping to expand official support to Windows AMD).
The problem is that a lot of the solutions proposed involve significant/unsustainable dev effort (i.e., supporting an entirely different inference paradigm), rather than "drop in" for the existing Torch/diffusers pipelines.
While I don't know enough about your setup to offer immediate solutions, if you join the Discord, I'm sure folks would be happy to try walking through some manual troubleshooting/experimentation to get you up and running - discord.gg/invoke-ai
You don't win an overall market by focusing on several-hundred-million-dollar bespoke HPC builds where the platform (frankly) doesn't matter at all. I'm working on a project on an AMD platform on the list (won't say which - for now), and needless to say you build to whatever is there, regardless of what it takes, and the operators/owners and vendor support teams pour in whatever resources are necessary to make it work.
You win a market a generation at a time - supporting low end cards for tinkerers, the educational market, etc. AMD should focus on the low-end because that's where the next generation of AI devs, startups, innovation, etc is coming from and for now that's going to continue to be CUDA/Nvidia.
I worked at a baremetal CDN with 60 PoPs, and a few years ago we had to switch to AMD because of PCIe bandwidth over to our smartNICs, NVMe-oF, and that sort of thing. We'd long hit limits on Intel before the Epyc stuff came out, so we had to have more servers running than we wanted, because we had to limit how much we did with each server to avoid hitting those limits and locking everything up.
And we were excited, not a single apprehension. Epyc crushed the server market; everyone is using them. Well, it's going ARM now, but Epyc will still be around a while.
If AMD could get 90% of the CUDA ML stuff to seamlessly run on AMD hardware, and could provide hardware at a competitive cost-per-performance (which I assume they probably could since NVIDIA must have an insane profit margin on their GPUs), wouldn't that be the opportunity to eat NVIDIA's lunch?
if that's the case you have billion-dollar opportunities waiting for you to prove it!
https://hpc.guix.info/blog/2024/01/hip-and-rocm-come-to-guix...
> AMD has just contributed 100+ Guix packages adding several versions of the whole HIP and ROCm stack
If their primary objective is to break the CUDA monopoly, they should up their game in software, which means going as far as implementing support for their hardware in the most popular user apps themselves, if necessary. But since they don't seem to want to do that, they should really go for option one, especially if a single engineer already got so far.
Let's say AMD sold a lot of cards with CUDA support. Now nvidia tries to cut them off. What will happen next? A lot of people will replace their cards with nvidia ones. But a lot of the rest will try to make their expensive AMD cards work regardless. And if AMD provides a platform for that, they will get that work for free.
"AMD’s client segment, mostly chips for PCs and laptops, rose 62% year over year to $1.46 billion in sales, thanks to recent chip launches.
Sales in AMD’s gaming segment, which includes “semi-custom” processors for Microsoft Xbox and Sony PlayStation consoles, fell 17%. "
* https://www.cnbc.com/2024/01/30/amd-earnings-report-q4-2024....
https://www.phoronix.com/forums/forum/linux-graphics-x-org-d...
And I'm on Linux Mint 21.3, so how do I change the installation script to think that Mint is Ubuntu to maybe get that to work there? There's no how-to for Mint like the one that AMD provides for Ubuntu! And really, that's compiled by AMD for a specific Linux kernel, so no DKMS sort of method there AFAIK! But I'm no Linux expert and just want some one-click install, or for it to ship with the distro already working, so that Blender 3D's iGPU/dGPU-accelerated Cycles rendering is possible on AMD Radeon consumer GPUs.
You've already got an amdgpu driver in your kernel. Possibly an old one but it'll be there. ROCm is userspace.
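Since Mint 21.x is built on Ubuntu 22.04 ("jammy"), the commonly reported workaround (hedged: I can't vouch for Mint 21.3 specifically, and AMD doesn't officially support it) is to install only the user-space ROCm stack from AMD's jammy packages and keep the kernel driver you already have:
  # Hedged sketch for Mint 21.x; check AMD's install docs for the current
  # amdgpu-install .deb before running any of this.
  grep UBUNTU_CODENAME /etc/os-release           # Mint 21.x usually reports jammy here
  sudo dpkg -i amdgpu-install_*.deb              # installer .deb fetched from repo.radeon.com (jammy)
  sudo amdgpu-install --usecase=rocm --no-dkms   # user-space ROCm only; skip the kernel module
  # If the script balks at seeing "linuxmint", reports suggest temporarily
  # setting ID=ubuntu in /etc/os-release and reverting it afterwards.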
Ryzen was a surprise to everyone not because it was good, but because they didn't fuck it up within two generations.
AMD cards have more raw compute than nvidia, they are better than nvidia, yet the software is so bad that I gave up on using it and switched to nvidia. Two weeks of debugging driver errors vs 30 minutes of automated updates.
Unity, Unreal and Godot all support compiling for Linux, either by default or with inexpensive or possibly free add-ons. I'm sure many other game engines do as well, and when it takes a few hours of work at most to add everyone who owns a Steam Deck or a Steam Deck clone to your potential customer base, that is not a tall order.
CUDA PTX is the intermediate-language representation that's meant to be portable for cross-platform usage, but I'm not exactly sure how that is implemented for Blender 3.0 and later.
P.S I have Ryzen 3000/Zen+ series APUs and Vega Integrated Graphics on 2 systems and the laptop has Ryzen 3550H/Vega 8CU iGPU and Polaris Radeon RX560X dGPU while the Mini Desktop PC has Ryzen 3400G/Vega 11CU iGPU only.
I have since gotten Invoke to run and was already able to get some results I'm really quite happy with, so thank you for your time and commitment working on Invoke!
I understand that ROCm is still challenging, but it seems my problems were less related to ROCm or Invoke itself and more to Python dependency management. It really boiled down to getting the correct (ROCm) versions of packages installed. Installing Invoke from PyPi always removed my Torch and installed CUDA-enabled Torch (as well as cuBLAS, cuDNN, ...). Once I had the correct versions of packages, everything just worked.
To me, your pyproject.toml looks perfectly sane, so I wasn't sure how to go about fixing the problem.
What ended up working for me was to use one of AMD's ROCm OCI base images, manually installing all dependencies, foregoing a virtual environment, cloning your repo (, building the frontend), and then installing from there.
The majority of my struggle would have been solved by a recent working Docker image containing a working setup. (The one on Docker Hub is 9 months old.) Trying to build the Dockerfile from your repo, I also ended up with a CUDA-enabled Torch. It did install the correct one first, but in a later step removed the ROCm-enabled Torch to switch it for the CUDA-enabled one.
I hope you'll consider investing some resources into publishing newer, working builds of your Docker image.
So there's a YouTube video from some supercomputing conference where the presenter goes over the support-matrix info for ROCm/HIP, CUDA/CUDA Tools, and oneAPI/Level Zero, and they are similar in scope.
Pretty sure Vulkan is going to work equally well; at the very least there's the open-source DXVK project, which implements D3D11 on top of Vulkan.
Also, nothing is easier on Windows. It's a wonder that anything works there, except for the power of recalcitrance.
Not dogging Windows users, but once your brain heals, it just can't go back.
given how omnipresent she is with her live streaming, it's a bit like South Park's Worldwide Privacy Tour: https://www.youtube.com/watch?v=2N8_5LDkZwY
At least Nvidia, which I fucking hate, will happily hold out their hand for cash even from individuals.
So now we’re in a hilarious situation where people from hobbyists to enterprise devs are hoping for intel to save the day.
We do have Docker packages hosted on GH, but I'll be the first to admit that we haven't prioritized ROCm. Contributors who have AMDs are a scant few, but maybe we'll find some help in wrangling that problem now that we know there's an avenue to do so.
Time will tell if that strategy is going to pan out. Ceding the ML "training" market entirely to Nvidia is certainly a bold move
A better level to target compatibility would be at the framework level such as PyTorch, where the building blocks of neural networks (convolution, multi-head attention, etc, etc) are high level and abstract enough to allow flexibility in mapping them onto AMD hardware without compromising performance.
However, these frameworks are forever changing, and playing continual catch-up there still wouldn't be a great place to be, especially without a large staff dedicated to the effort (writing hand-optimized kernels), which AMD doesn't seem able or willing to muster.
So, finally, perhaps the strategically best place for AMD to invest would be in compilers and software tools to allow kernels to be written in a high level language. Becoming a first class Mojo target wouldn't be a bad place to start, assuming they are not already in partnership.
You can't install the PyTorch that's best for the currently running platform using a pyproject.toml with a setuptools backend, for starters. Invoke would have to author a setup.py that deals with all the issues, in a way that is compatible with build isolation.
> The majority of my struggle would have been solved by a recent working Docker image containing a working setup. (The one on Docker Hub is 9 months old.)
Why? Given the state of the ecosystem, what guarantee is there really that the documentation for Docker Desktop with AMD ROCm device binding is going to actually work for your device? (https://rocm.docs.amd.com/projects/MIVisionX/en/latest/docke...)
There is a lot of ad-hoc reinvention of tooling in this space.
I mean they literally did that, but then dropped it so yea
We already see things like Google abandoning tensorflow support for Windows, because they don't have enough devs using Windows to easily maintain it.
And of course, we have a changing of the guard in terms of a generation of software developers who primarily worked on Windows, because that was the way to do it, starting to retire. Younger devs came up in the Google era where Linux is a first class citizen alongside MacOS.
I think these factors are going to change the face of technology in the coming 15 years, and that's likely to affect how businesses and consumers consume technology, even if they don't understand what's actually running under the hood.
MS has put a colossal amount of money into catching up, at least enough to be able to take advantage of the AI wave, that much is clear. Maybe for consumers this will be enough, but R&D-wise I don't see them ever being the default choice.
And this is potentially a huge problem for them in the long run, because OS choice by industry is driven by the available tooling. If they lose ML, they could potentially lose traditional engineering if fields like robotics start relying on Linux more heavily.
If anything, the Asahi devs are the ones acting out.
If they don't want HN to criticize them, then they should expect to not get the free publicity that HN offers. Seems fair enough.
Also, between accusing HN of "supporting trans genocide" (which is some mix between "impossible" and "false"), and poisoning links with HN referrer URLs, they don't seem like very good people themselves.
I love the direct, "no bullshit" style of writing.
Some gems:
> Anyone familiar with C++ will instantly understand that compiling it is a complicated affair.
> Additionally CUDA allows, to a large degree, mixing CPU code and GPU code. What does all this complexity mean for ZLUDA? Absolutely nothing
> Since an application can dynamically link to either Driver API or Runtime API, it would seem that ZLUDA needs to provide both. In reality very few applications dynamically link to Runtime API. For the vast majority of applications it's sufficient to provide Driver API for dynamic (runtime) linking.
I'm pretty sure Torvalds was giving the finger over the subject of GPU drivers (which run on the CPU), not programming on the Nvidia GPU itself. In particular, he namedropped Bumblebee (and maybe Optimus?), which was more about power management and making Nvidia cooperate with a non-Nvidia integrated GPU than about the Nvidia GPU itself.
And it's really not surprising that people, GPU programmers included, don't want to spend time and money trying out unsupported hardware and software combinations when, again, it's supposed to be a tool to get a job done. If I've got some Phillips-head screws, I'm not reaching for a flat-head screwdriver even though it probably will work, and if that's the only thing I have, I'll buy some Phillips-head ones for the next project.
AMD cannot keep up with arbitrarily changing hardware and software while trying to please developers that want what was just released. They would always be a generation behind at tremendous expense.
Meanwhile, Nvidia was adding C++, Fortran, and PTX, and supporting other programming-language communities trying to target GPUs (Java, .NET, Haskell, ...).
Making it as easy to debug GPUs as modern graphical debuggers for CPUs, building libraries,...
Intel, and AMD together with Khronos did this to themselves.
> Also, nothing is easier on Windows.
As much as I, too, dislike Windows, I still have to disagree. I have encountered (proprietary) software which was much easier to get working on Windows. For example, Cisco AnyConnect with SmartCard authentication has been a nightmare for me on Linux.
Then people act surprised that CUDA won the hearts of the scientific developer community, which would rather spend its time actually doing research work.
FOSS folks make this a bigger issue than it really is; game studios make a pluggable API in their engine, call it a day, and move on to everything else that matters in actually delivering a game.
I see. I do know Python, but my knowledge of setuptools, pip, poetry and whatever else is limited. To get my working setup, I specified an --index-url for my Torch installation. Does that not work with their current setup?
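Concretely, the sort of thing I mean (the exact ROCm suffix is whatever wheels PyTorch currently publishes, so treat the version below as illustrative):
  # Install the ROCm build of PyTorch explicitly instead of the default CUDA one.
  pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7
  python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"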
> Why? Given the state of the ecosystem, what guarantee is there really that the documentation for Docker Desktop with AMD ROCm device binding is going to actually work for your device?
Well, it did work for me. Though I think just passing /dev/{dri,kfd} and setting seccomp=unconfined was sufficient. So for my particular case, getting a working image was the only missing step.
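For reference, the device binding described above amounts to roughly this (hedged: the image and tag are illustrative; any recent ROCm PyTorch image should behave the same way):
  docker run -it \
    --device=/dev/kfd --device=/dev/dri \
    --security-opt seccomp=unconfined \
    --group-add video \
    rocm/pytorch:latest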
From a more general POV: it might not make sense to invest in a ROCm OCI image from a short-term business perspective, but in the long term, and purely on principle, I do think the ecosystem should strive to be less reliant on CUDA and CUDA alone.
The situation in reality is actually quite bad.
Given that I have an M2 Max and no Nvidia cards, I've tried enough PyTorch-based ML libraries that at this point I basically expect them to flat out show an error saying CUDA 10.x+ is required once the dependencies are installed (e.g. the bitsandbytes library; in fairness, there's apparently some effort to port that code to other platforms as well).
As of today, the whole field is moving so fast that it's simply not worth it for a solo dev or even a small team to attempt getting a non-CUDA stack up and running, especially with the other major GPU vendors not hiring (or not able to hire?) people to port the hand-optimized CUDA kernels.
Hopefully the situation will change after these couple years of frenzy, but for the time being I don't see any viable way to avoid a CUDA stack if one is serious about getting ML stuff done.
So if you'd want to ignore CUDA+PyTorch and reimplement all of what you need on top of Vulkan.... well, that becomes worthy of discussion only if you expect to spend a lot on hardware, if you really consider that savings on hardware can recoup many engineer-years of costs - otherwise it's more effective to just go with the flow.
This doesn't make the "play" button any different. People only care if the Proton version is buggy or noticeably less performant, and native ports have no trouble being both of those (see: Rust (game) before the devs dropped Linux support)
It limits Nvidia's profit margin - if Nvidia cards run twice as fast but cost more than twice as much, then people will just buy two AMD cards. Meanwhile, it gives AMD some revenue with which to fund an improved CUDA stack.
>their customers have the option to save money by writing ROCm
CUDA saves money by having a fuckton of pre-written CUDA code and being supported as default basically everywhere.
Lock people in to something that didn’t exist in a way any user could use before it existed? I get people hate CUDAs dominance but no one else was pushing this before CUDA and Apple+AMD completely fumbled OpenCL.
Can’t hate on something good just because it’s successful and I can’t be angry the talent behind the success wanting to profit.
Yeah, most communities have bad actors, but in HN's case, most of the bad comments are either user-flagged or killed directly by dang. The crazy part is that some of these people (e.g. sussmannbaka in the thread you linked) actually think that that means that those comments are somehow endorsed or something, which is completely insane - the comment literally says "dead" or "flagged", that means the community doesn't think it's acceptable.
The behavior in both the Mastodon post and that thread is why I don't want these people on HN - they're not interested in intellectual curiosity, they just want to have a flamewar over nothing.
Crackle would happen so rarely that I KNOW it definitely happened but it wasn't like a 2 day thing it was probably like, once in a year or 6 months, etc.
What's nice about BLAS is that there are optimized implementations for CPUs (Intel MKL) as well as NVIDIA (cuBLAS) and AMD (hipBLAS), so while it's very much limited in what it can do, you can at least write portable code around it.
If you slap a CUDA compatibility layer on top of AMD, then CUDA code optimized for NVIDIA chips would run, but would suffer a performance penalty compared to code that was customized/tuned for AMD, so unless AMD GPUs were sold cheap enough (i.e. with low profit margin) to mitigate this loss of performance you might as well buy NVIDIA in the first place.
You probably already know but just in case you don't: you can set up a Linux VM with VirtualBox on your Windows and then mount the vhdx (read-only) as an additional disk to extract the stuff you need via shared folders.
ROCm/hipDNN wraps cuDNN on Nvidia and MIOpen on AMD, but hasn't been updated in a while: https://github.com/ROCm/hipDNN
https://news.ycombinator.com/item?id=37808036 : conda-forge has various BLAS implementations, including MKL-optimized BLAS, and compatible NumPy and SciPy builds.
BLAS: Basic Linear Algebra Sub programs: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogra...
"Using CuPy on AMD GPU (experimental)" https://docs.cupy.dev/en/v13.0.0/install.html#using-cupy-on-... :
$ sudo apt install hipblas hipsparse rocsparse rocrand rocthrust rocsolver rocfft hipcub rocprim rccl
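and then, roughly per the same page (hedged from memory; the linked docs are authoritative for the exact gfx target and versions):
  export CUPY_INSTALL_USE_HIP=1      # build/install the HIP (ROCm) flavour of CuPy
  export ROCM_HOME=/opt/rocm
  export HCC_AMDGPU_TARGET=gfx906    # example target; use your card's gfx ISA
  pip install cupy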
You were asking if this CUDA compatibility layer might hold any advantage over HIP (e.g. for use by llama.cpp)?
I think the answer is no, since HIP includes pretty full-featured support for many of the higher level CUDA-based APIs (cuDNN, cuBLAS, etc), while per the Phoronix article ZLUDA only (currently) has minimal support for them.
I wouldn't expect ZLUDA to provide any performance benefit over HIP either, since on AMD hardware HIP is just a pass-thru to MIOpen (AMD's equivalent to cuDNN), rocBLAS, etc.
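For llama.cpp specifically there's already a native HIP/hipBLAS build, so a CUDA shim buys you nothing there. A hedged sketch (flag name as of early-2024 trees, model path purely illustrative; check the project README for current builds):
  make LLAMA_HIPBLAS=1                                  # build against ROCm's hipBLAS
  ./main -m models/your-model.gguf -ngl 99 -p "Hello"   # -ngl offloads layers to the GPU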
ROCm docs > "Install ROCm Docker containers" > Base Image: https://rocm.docs.amd.com/projects/install-on-linux/en/lates... links to ROCm/ROCm-docker: https://github.com/ROCm/ROCm-docker which is the source of docker.io/rocm/rocm-terminal: https://hub.docker.com/r/rocm/rocm-terminal :
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal
ROCm docs > "Docker image support matrix":
https://rocm.docs.amd.com/projects/install-on-linux/en/lates...
ROCm/ROCm-docker//dev/Dockerfile-centos-7-complete: https://github.com/ROCm/ROCm-docker/blob/master/dev/Dockerfi...
Bazzite is a ublue (Universal Blue) fork of the Fedora Kinoite (KDE) or Fedora Silverblue (Gnome) rpm-ostree Linux distributions; ublue-os/bazzite//Containerfile : https://github.com/ublue-os/bazzite/blob/main/Containerfile#... has, in addition to fan and power controls, automatic updates on desktop, supergfxctl, system76-scheduler, and an fsync kernel:
rpm-ostree install rocm-hip \
rocm-opencl \
rocm-clinfo
But it's not `rpm-ostree install --apply-live` because it's a Containerfile. To install a ublue-os distro, you install any of the Fedora ostree distros {Silverblue, Kinoite, Sway Atomic, or Budgie Atomic} from e.g. a USB stick and then `rpm-ostree rebase <OCI_host_image_url>`:
rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/bazzite:stable
rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/bazzite-nvidia:stable
rpm-ostree rebase ostree-image-signed:
ublue-os/config//build/ublue-os-just/40-nvidia.just defines the `ujust configure-nvidia` and `ujust toggle-nvk` commands:
https://github.com/ublue-os/config/blob/main/build/ublue-os-...
There's a default `distrobox` with pytorch in ublue-os/config//build/ublue-os-just/etc-distrobox/apps.ini: https://github.com/ublue-os/config/blob/main/build/ublue-os-...
[mlbox]
image=nvcr.io/nvidia/pytorch:23.08-py3
additional_packages="nano git htop"
init_hooks="pip3 install huggingface_hub tokenizers transformers accelerate datasets wandb peft bitsandbytes fastcore fastprogress watermark torchmetrics deepspeed"
pre-init-hooks="/init_script.sh"
nvidia=true
pull=true
root=false
replace=false
docker.io/rocm/pytorch:
https://hub.docker.com/r/rocm/pytorch
pytorch/builder//manywheel/Dockerfile: https://github.com/pytorch/builder/blob/main/manywheel/Docke...
ROCm/pytorch//Dockerfile: https://github.com/ROCm/pytorch/blob/main/Dockerfile
The ublue-os (and so also bazzite) OCI host image Containerfile has Sunshine installed; which is a 4k HDR 120fps remote desktop solution for gaming.
There's a `ujust remove-sunshine` command in system_files/desktop/shared/usr/share/ublue-os/just/80-bazzite.just : https://github.com/ublue-os/bazzite/blob/main/system_files/d... and also kernel args for AMD:
pstate-force-enable:
rpm-ostree kargs --append-if-missing=amd_pstate=active
ublue-os/config//Containerfile:
https://github.com/ublue-os/config/blob/main/Containerfile
LizardByte/Sunshine: https://github.com/LizardByte/Sunshine
moonlight-stream https://github.com/moonlight-stream
Anyways, hopefully this PR fixes the immediate issue: https://github.com/invoke-ai/InvokeAI/pull/5714/files
conda-forge/pytorch-cpu-feedstock > "Add ROCm variant?": https://github.com/conda-forge/pytorch-cpu-feedstock/issues/...
And Fedora supports OCI containers as host images and also podman container images with just systemd to respawn one or a pod of containers.
I'm not sure what you're pointing to with your reference to the Fedora-based images. I'm quite happy with my NixOS install and really don't want to switch to anything else. And as long as I have the correct kernel module, my host OS really shouldn't matter to run any of the images.
And I'm sure it can be made to work with many base images, my point was just that the dependency management around pytorch was in a bad state, where it is extremely easy to break.
> Anyways, hopefully this PR fixes the immediate issue: https://github.com/invoke-ai/InvokeAI/pull/5714/files
It does! At least for me. It is my PR after all ;)
Is there a way to `restorecon --like / /nix/os/root72` to apply SELinux extended filesystem attribute labels just to NixOS prefixes?
Some research is done with RPM-based distros; which have become so advanced with rpm-ostree support.
FWICS Bazzite has NixOS support, too; in addition to distrobox containers.
Bazzite has a lot of other stuff installed that's not necessary when attempting to isolate sources of variance in the interest of reproducible research, but being aimed at gaming it has various optimizations.
InvokeAI might be faster to install and to compute with when using conda-forge builds.
For me the issue on AMD was stability in situations when VRAM was getting tight.
AMD fundamentally viewed/views GPUs as nothing more than a tool to make semicustom deals. Just like "xbox isn't the product, gamepass is the product" - well, for AMD "radeon isn't the product, semicustom is the product". The only thing they really need graphics for is APUs, and they don't need to beat the 4090, they just need to beat Xe-LP. They don't need raytracing, they don't need that "AI" crap (oops), just to run games at 720p/1080p.
They're happy to squeeze whatever they can out of Sony/MS's R&D spend, but they aren't going to invest heavily on their own. And now that there is an obvious money fountain occurring in AI/ML... that is starting to change.
It was always about the money, specifically the lack of it. AMD knew HSA-Library/OpenCL/etc sucked, they didn't care, especially when the money was better spent going after Intel instead of NVIDIA. Intel is dysfunctional and AMD had a chance to crack their marketshare, and that's where every penny they had went. And that's probably not a wrong business decision.
We can see that it’s not magic, the neuron either activates or it doesn’t, so why should I pay attention to some probabilistic steam of gibberish it spewed out? There is nothing meaningful that can be inferred from such systems, right?
But being able to leverage my graphics card for GPGPU was a top priority for me, and like you, I was appalled with the ROCm situation. Not necessarily the tech itself (though I did not enjoy the docker approach), but more the developer situation surrounding it.
* well, that and some vague notions about RTX