
212 points pella | 1 comments
mk_stjames ◴[] No.42749299[source]
So the 300A is an accelerator coupled with a full 24-core EPYC and 128GB of HBM all on a single chip (or, packaged chiplets, whatever).

Why is it I can't buy a single one of these, on a motherboard, in a workstation format case, to use as an insane workstation? Assuming you could program for the accelerator part, there is an entire world of x86-fixed CAD, engineering, and entertainment industry (rendering, etc.) where people want a single desktop machine with 128GB+ of fast RAM to number crunch.

There are Blender artists out there that build dual and quad RTX4090 machines with Threadrippers for $20k+ in components all day, because their render jobs pay for it.

There are engineering companies that would not bat an eye at dropping $30k on a workstation if it meant they could spin around 80 gigabyte CATIA models of cars or aircraft loaded in RAM quicker. I know this at least because I sure as hell did with several HP Z-series machines costing whole-Toyota-Corolla prices over the years...

But these combined APU chips are relegated to these server units. In the end is this a driver problem? Just a software problem? A chicken and egg problem where no one is developing the support because there isn't the hardware on the market, and there isn't the hardware on the market because AMD thinks there is no use case?

Edit: and note the use cases I mentioned don't really rely on latency, the way videogamers need to hit framerates. The cache miss latency mentioned in the article doesn't matter as much for these types of compute applications, where the main problem is just loading and unloading the massive amount of data. Things like offline renders and post-processing CFD simulations. Not necessarily a video output framerate.

replies(4): >>42749843 #>>42752447 #>>42757529 #>>42762774 #
latchkey ◴[] No.42749843[source]
(I run a company that buys MI300x.)

> Why is it I can't buy a single one of these, on a motherboard, in a workstation format case, to use as an insane workstation?

AMD doesn't have the resources to support end users for something like this. They are a public company, look at their spend. They are pouring everything they've got into trying to keep up with the Nvidia release cycle for AI chips.

These chips are cutting edge, they are not perfect. They are still working through the hardware and software issues. It is hard enough to deal with all the public opinion on things as it is. Why would they add another layer of potential abuse?

replies(1): >>42752446 #
AnthonyMouse ◴[] No.42752446[source]
The people who buy stuff like that are professionals. They often know something about the tools they're using and if there are any problems, provide bug reports that actually describe what's happening instead of some non-descriptive mush like "I have your GPU and Windows crashes sometimes". That is extremely helpful if you're trying to get rid of those bugs.

This is the same reason software shops have found it useful to support Linux, even if not many people use it. The people who do will make your product suck less, which in turn makes it easier to sell to the mass market, which will get upset and think unfavorably of you if they hit the same problem but won't be as good at telling you about it.

replies(2): >>42752455 #>>42752538 #
Aurornis ◴[] No.42752455[source]
> provide bug reports that actually describe what's happening

Doesn’t matter if the bug reports are good or bad. Supporting low volume applications is a bad business move when the alternative is 9-figure data center contracts.

The data center business is orders of magnitude larger. Trying to support individual developers would be a huge business mistake when they already can’t keep up with data center demand.

replies(1): >>42752502 #
1. AnthonyMouse ◴[] No.42752502[source]
It's the same hardware running the same software. You want the bug reports so you can fix them and then your data center customers don't encounter them when they're evaluating your product.

What they can keep up with is basically a matter of how much capacity they order from TSMC. If they underestimated demand for some generation, that's the sort of thing you fix with the next contract or you're just throwing money away.