
212 points by pella | 1 comment
neuroelectron No.42749032
>Still, core to core transfers are very rare in practice. I consider core to core latency test results to be just about irrelevant to application performance. I’m only showing test results here to explain the system topology.

How exactly are "applications" developed for this? Or is that all proprietary knowledge? TinyBox has resorted to writing their own drivers for the 7900 XTX.
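For context, the kind of core to core latency test the quoted article describes is roughly a pinned ping-pong over a shared atomic. A minimal Linux-only sketch (the core IDs and iteration count are arbitrary placeholders, not values from the article; compile with g++ -O2 -pthread):

    // Two threads pinned to specific cores bounce a flag through a shared
    // std::atomic; the averaged round trip approximates core-to-core latency.
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <pthread.h>
    #include <sched.h>
    #include <thread>

    static std::atomic<int> flag{0};
    constexpr int kIters = 100000;          // assumed iteration count

    static void pin_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main() {
        std::thread responder([] {
            pin_to_core(1);                 // assumed core ID
            for (int i = 0; i < kIters; ++i) {
                while (flag.load(std::memory_order_acquire) != 1) {}
                flag.store(0, std::memory_order_release);
            }
        });

        pin_to_core(0);                     // assumed core ID
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < kIters; ++i) {
            flag.store(1, std::memory_order_release);
            while (flag.load(std::memory_order_acquire) != 0) {}
        }
        auto t1 = std::chrono::steady_clock::now();
        responder.join();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        std::printf("avg round trip: %.1f ns\n", ns / kIters);
        return 0;
    }

Picking core IDs on the same CCD versus different CCDs (or different sockets) is what produces the topology map the article shows.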

replies(1): >>42749868 #
latchkey No.42749868
ROCm is the stack that people write code against to talk to AMD hardware.
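Concretely, "writing code against ROCm" usually means HIP, the CUDA-style C++ API in the ROCm stack. A minimal sketch of a vector add, assuming a working ROCm install and hipcc:

    // Allocate on the device, copy data over, launch a kernel, copy back.
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void vadd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);
        float *da, *db, *dc;
        hipMalloc((void**)&da, n * sizeof(float));
        hipMalloc((void**)&db, n * sizeof(float));
        hipMalloc((void**)&dc, n * sizeof(float));
        hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

        vadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

        hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
        std::printf("c[0] = %f\n", hc[0]);   // expect 3.0
        hipFree(da); hipFree(db); hipFree(dc);
        return 0;
    }

Higher-level frameworks (PyTorch, JAX, BLAS libraries) sit on top of the same stack, so most application authors never go lower than this.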

George wrote some incomplete, non-performant drivers for a consumer grade product. Certainly not an easy task, but it also isn't something that most people would use. George just makes loud noises to get attention, but few in the HPC industry pay any attention to him.

replies(2): >>42750277 #>>42750974 #
neuroelectron No.42750974
Yes, ROCm is for the GPU, but the MI300A also includes 4 clusters of CPUs connected by Infinity Fabric. Generally this kind of thing is handled by the OS, but there is no OS for this product.
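To illustrate what "handled by the OS" means here: when an OS does manage fabric-connected CPU clusters like these, it typically exposes them as NUMA nodes that applications can query and bind to. A hedged sketch using libnuma (Linux-only, link with -lnuma; node 0 is an arbitrary choice):

    #include <numa.h>
    #include <cstdio>

    int main() {
        if (numa_available() < 0) {
            std::fprintf(stderr, "NUMA not available\n");
            return 1;
        }
        // Each fabric-connected CPU cluster shows up as its own node.
        int nodes = numa_num_configured_nodes();
        std::printf("configured NUMA nodes: %d\n", nodes);

        // Run on node 0 and allocate memory local to it.
        numa_run_on_node(0);
        void* buf = numa_alloc_onnode(1 << 20, 0);
        // ... work on buf with local-node latency ...
        numa_free(buf, 1 << 20);
        return 0;
    }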
replies(3): >>42751202 #>>42752910 #>>42762841 #
alienthrowaway No.42752910
AMD has been doing IF-connected CCDs/chiplets for a while now - since Zen 1, released in 2017. All the x86 OSes work fine on each iteration.

Application authors who care about wringing out the last drop of performance need to be mindful of how they manage processes and cache lines on this hardware - as they would on any other architecture.
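A common example of the cache-line side of that advice is avoiding false sharing: keep each thread's hot data on its own cache line so threads on different CCDs aren't bouncing one line across the fabric. A minimal sketch (the 64-byte line size and thread count are assumptions, not figures from the article; compile with -std=c++17 -pthread):

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    struct alignas(64) PaddedCounter {      // one cache line per counter
        std::atomic<long> value{0};
    };

    int main() {
        const int nthreads = 4;             // assumed thread count
        std::vector<PaddedCounter> counters(nthreads);

        std::vector<std::thread> workers;
        for (int t = 0; t < nthreads; ++t) {
            workers.emplace_back([&counters, t] {
                // Each thread hammers only its own, separately aligned counter.
                for (int i = 0; i < 1'000'000; ++i)
                    counters[t].value.fetch_add(1, std::memory_order_relaxed);
            });
        }
        for (auto& w : workers) w.join();

        long total = 0;
        for (auto& c : counters) total += c.value.load();
        std::printf("total = %ld\n", total);   // expect 4000000
        return 0;
    }

Without the alignas padding, adjacent counters can land on the same line and every increment forces a cross-core (or cross-CCD) cache-line transfer, which is exactly the traffic the latency maps characterize.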