623 points magicalhippo | 2 comments
narrator ◴[] No.42619363[source]
Nvidia releases a Linux desktop supercomputer that's better price/performance wise than anything Wintel is doing and their whole new software stack will only run on WSL2. They aren't porting to Win32. Wow, it may actually be the year of Linux on the Desktop.
replies(7): >>42619399 #>>42619444 #>>42619549 #>>42619598 #>>42619820 #>>42620944 #>>42622537 #
sliken ◴[] No.42619598[source]
Not sure how to judge better price/perf. I wouldn't expect 20 Neoverse N2 cores to do particularly well vs 16 zen5 cores. The GPU side looks promising, but they aren't mentioning memory bandwidth, configuration, spec, or performance.

Did see vague claims of "starting at $3k", max 4TB nvme, and max 128GB ram.

I'd expect AMD Strix Halo (AI Max plus 395) to be reasonably competitive.

replies(2): >>42619674 #>>42619722 #
skavi ◴[] No.42619722[source]
It’s actually “10 Arm Cortex-X925 and 10 Cortex-A725” [0]. These are much newer cores and have a reasonable chance of being competitive.

[0]: https://newsroom.arm.com/blog/arm-nvidia-project-digits-high...

replies(3): >>42619778 #>>42622425 #>>42624856 #
adrian_b ◴[] No.42622425[source]
For programs dominated by iterations over arrays, these 10 Arm Cortex-X925 + 10 Cortex-A725, all 20 together, should have a throughput similar to that of only 10 of the 16 cores of Strix Halo (assuming that Strix Halo has full Zen 5 cores, which has not been confirmed yet).

For programs dominated by irregular integer and pointer operations, like software project compilation, 10 Arm Cortex-X925 + 10 Cortex-A725 should have a throughput similar to that of a 16-core Strix Halo, but which is faster would depend on cooling (i.e. a Strix Halo configured for a high power consumption will be faster).

There is not enough information to compare the performance of the GPUs from this NVIDIA Digits and from Strix Halo. However, it can be assumed that NVIDIA Digits will be better for ML/AI inference. Whether it can also be competitive for training or for graphics remains to be seen.

replies(1): >>42624797 #
1. skavi ◴[] No.42624797[source]
How did you come up with these numbers? There don't seem to be many shipping products with these cores. In fact, the only one I could find was the Dimensity 9400 with a single X925 and older generation A720s. And of course the Dimensity is a mobile SoC, so clocks will be low.

Are you projecting based on Arm's stated improvements from their last gen? In that case, what numbers are you using as your baseline?

replies(1): >>42625579 #
2. adrian_b ◴[] No.42625579[source]
For programs rich in array operations, which can be accelerated by SVE or AVX-512, Cortex-X925 has 6 x 128-bit execution pipelines, Cortex-A725 has 2 pipelines, Snapdragon Oryon has 4 pipelines, while a Zen 5 core has the equivalent of 8 Arm execution pipelines (i.e. 2 x 512-bit pipelines, equivalent to 8 x 128-bit), plus 8 other execution pipelines that can do only a subset of the operations.

That means a total of 80 execution pipelines for NVIDIA Digits (10 x 6 + 10 x 2), 48 execution pipelines for Snapdragon Elite (12 x 4) and 128 equivalent execution pipelines for Strix Halo (16 x 8), counting only the complete execution pipelines. For operations like FP addition, which can be done in any pipeline, there would be 256 equivalent execution pipelines for Strix Halo.
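For anyone who wants to check the totals, here is a minimal Python sketch of the same arithmetic. The per-core pipeline counts are the ones stated in this comment, not measurements. The 12-core figure for Snapdragon Elite is an assumption inferred from 48 / 4, and the dictionary and function names are purely illustrative.

```python
# 128-bit-equivalent SIMD pipelines per core, as stated in the comment above.
PIPES_PER_CORE = {
    "Cortex-X925": 6,
    "Cortex-A725": 2,
    "Oryon": 4,
    "Zen 5 (complete pipelines)": 8,   # 2 x 512-bit = 8 x 128-bit
    "Zen 5 (any pipeline)": 16,        # adds the 8 restricted pipelines
}

# Core counts per chip. The Oryon count of 12 is an assumption (48 / 4).
CHIPS = {
    "NVIDIA Digits": {"Cortex-X925": 10, "Cortex-A725": 10},
    "Snapdragon Elite": {"Oryon": 12},
    "Strix Halo": {"Zen 5 (complete pipelines)": 16},
    "Strix Halo (any-pipeline ops)": {"Zen 5 (any pipeline)": 16},
}


def total_pipelines(core_counts):
    """Sum pipelines over all core types in one chip."""
    return sum(PIPES_PER_CORE[core] * n for core, n in core_counts.items())


for chip, cores in CHIPS.items():
    print(f"{chip}: {total_pipelines(cores)} pipelines")
```

This only counts execution resources. It says nothing about clocks, caches or memory bandwidth, which are discussed separately.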

Because the clock frequencies for multithreaded applications should be similar, if not better for Strix Halo, there is little doubt that the throughput for applications dominated by array operations should be at least 128/80 = 1.6 times higher for Strix Halo than for NVIDIA Digits. It may be much better, because for many instructions even more execution pipelines are available, and Zen 5 also has a higher IPC when executing irregular code, especially vs. the smaller Cortex-A725 cores. Therefore the throughput of NVIDIA Digits is at most equal to that of 10 Strix Halo cores (80 pipelines / 8 per Zen 5 core), and likely lower.
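The "10 cores" figure follows directly from the pipeline totals. A rough sketch, assuming equal clocks and equal pipeline occupancy on both chips (which is the optimistic case for NVIDIA Digits); the variable names are illustrative only:

```python
from fractions import Fraction

# Totals of complete 128-bit-equivalent pipelines, taken from this comment.
DIGITS_PIPES = 80          # 10 x 6 (X925) + 10 x 2 (A725)
STRIX_HALO_PIPES = 128     # 16 Zen 5 cores x 8
ZEN5_PIPES_PER_CORE = 8

# Lower bound on the Strix Halo advantage for array-heavy code,
# assuming the same clock frequency and the same pipeline occupancy.
throughput_ratio = Fraction(STRIX_HALO_PIPES, DIGITS_PIPES)

# Number of Zen 5 cores whose complete pipelines match all of Digits.
equivalent_zen5_cores = Fraction(DIGITS_PIPES, ZEN5_PIPES_PER_CORE)

print(f"Strix Halo / Digits >= {float(throughput_ratio)}")
print(f"Digits ~ {equivalent_zen5_cores} Zen 5 cores (upper bound)")
```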

On the other hand, for integer/pointer processing code, the number of execution units in a Cortex-X925 + a Cortex-A725 is about the same as in 2 Zen 5 cores. Therefore the 20 Arm cores of NVIDIA Digits have about the same number of execution units as 20 Zen 5 cores. Nevertheless, the occupancy of the Zen 5 execution units will be higher for most programs than for the Arm cores, especially because of the bigger and better cache memories, and also because of the lower IPC of Cortex-A725. Therefore the 20 Arm cores must be slower than 20 Zen 5 cores, probably equivalent to only about 15 Zen 5 cores. The exact equivalence is hard to predict, because it depends on the NVIDIA implementation of things like the cache memories and the memory controller.