
212 points by pella | 1 comment
neuroelectron No.42749032
>Still, core to core transfers are very rare in practice. I consider core to core latency test results to be just about irrelevant to application performance. I’m only showing test results here to explain the system topology.

How exactly are "applications" developed for this? Or is that all proprietary knowledge? TinyBox has resorted to writing their own drivers for the 7900 XTX.
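For context, the kind of core to core latency test the quoted article describes is roughly a pinned ping-pong over a shared atomic. A minimal Linux-only sketch (the core IDs and iteration count are arbitrary placeholders, not values from the article; compile with g++ -O2 -pthread):

    // Two threads pinned to specific cores bounce a flag through a shared
    // std::atomic; the averaged round trip approximates core-to-core latency.
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <pthread.h>
    #include <sched.h>
    #include <thread>

    static std::atomic<int> flag{0};
    constexpr int kIters = 100000;          // assumed iteration count

    static void pin_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main() {
        std::thread responder([] {
            pin_to_core(1);                 // assumed core ID
            for (int i = 0; i < kIters; ++i) {
                while (flag.load(std::memory_order_acquire) != 1) {}
                flag.store(0, std::memory_order_release);
            }
        });

        pin_to_core(0);                     // assumed core ID
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < kIters; ++i) {
            flag.store(1, std::memory_order_release);
            while (flag.load(std::memory_order_acquire) != 0) {}
        }
        auto t1 = std::chrono::steady_clock::now();
        responder.join();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        std::printf("avg round trip: %.1f ns\n", ns / kIters);
        return 0;
    }

Picking core IDs on the same CCD versus different CCDs (or different sockets) is what produces the topology map the article shows.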

replies(1): >>42749868 #
latchkey No.42749868
ROCm is the stack that people write code against to talk to AMD hardware.
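Concretely, "writing code against ROCm" usually means HIP, the CUDA-style C++ API in the ROCm stack. A minimal sketch of a vector add, assuming a working ROCm install and hipcc:

    // Allocate on the device, copy data over, launch a kernel, copy back.
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void vadd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);
        float *da, *db, *dc;
        hipMalloc((void**)&da, n * sizeof(float));
        hipMalloc((void**)&db, n * sizeof(float));
        hipMalloc((void**)&dc, n * sizeof(float));
        hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

        vadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

        hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
        std::printf("c[0] = %f\n", hc[0]);   // expect 3.0
        hipFree(da); hipFree(db); hipFree(dc);
        return 0;
    }

Higher-level frameworks (PyTorch, JAX, BLAS libraries) sit on top of the same stack, so most application authors never go lower than this.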

George wrote some incomplete, non-performant drivers for a consumer grade product. Certainly not an easy task, but it also isn't something that most people would use. George just makes loud noises to get attention, but few in the HPC industry pay any attention to him.

replies(2): >>42750277 #>>42750974 #
neuroelectron No.42750974
Yes, ROCm is for the GPU, but the MI300A also includes 4 clusters of CPUs connected by Infinity Fabric. Generally this kind of thing is handled by the OS, but there is no OS for this product.
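To illustrate what "handled by the OS" means here: when an OS does manage fabric-connected CPU clusters like these, it typically exposes them as NUMA nodes that applications can query and bind to. A hedged sketch using libnuma (Linux-only, link with -lnuma; node 0 is an arbitrary choice):

    #include <numa.h>
    #include <cstdio>

    int main() {
        if (numa_available() < 0) {
            std::fprintf(stderr, "NUMA not available\n");
            return 1;
        }
        // Each fabric-connected CPU cluster shows up as its own node.
        int nodes = numa_num_configured_nodes();
        std::printf("configured NUMA nodes: %d\n", nodes);

        // Run on node 0 and allocate memory local to it.
        numa_run_on_node(0);
        void* buf = numa_alloc_onnode(1 << 20, 0);
        // ... work on buf with local-node latency ...
        numa_free(buf, 1 << 20);
        return 0;
    }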
replies(3): >>42751202 #>>42752910 #>>42762841 #
alienthrowaway No.42752910
AMD has been doing IF-connected CCDs/chiplets for a while now - since Zen 1, released in 2017. All the x86 OSes work fine on each iteration.

Application authors who care about wringing out the last drop of performance need to be mindful of how they manage processes and cache lines on this hardware - as they would on any other architecture.
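A common example of the cache-line side of that advice is avoiding false sharing: keep each thread's hot data on its own cache line so threads on different CCDs aren't bouncing one line across the fabric. A minimal sketch (the 64-byte line size and thread count are assumptions, not figures from the article; compile with -std=c++17 -pthread):

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    struct alignas(64) PaddedCounter {      // one cache line per counter
        std::atomic<long> value{0};
    };

    int main() {
        const int nthreads = 4;             // assumed thread count
        std::vector<PaddedCounter> counters(nthreads);

        std::vector<std::thread> workers;
        for (int t = 0; t < nthreads; ++t) {
            workers.emplace_back([&counters, t] {
                // Each thread hammers only its own, separately aligned counter.
                for (int i = 0; i < 1'000'000; ++i)
                    counters[t].value.fetch_add(1, std::memory_order_relaxed);
            });
        }
        for (auto& w : workers) w.join();

        long total = 0;
        for (auto& c : counters) total += c.value.load();
        std::printf("total = %ld\n", total);   // expect 4000000
        return 0;
    }

Without the alignas padding, adjacent counters can land on the same line and every increment forces a cross-core (or cross-CCD) cache-line transfer, which is exactly the traffic the latency maps characterize.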