
195 points | rbanffy | 1 comment
ipsum2 No.42176882
As someone who worked in the ML infra space: Google, Meta, xAI, Oracle, Microsoft, and Amazon have clusters that outperform the top-ranked cluster on the Top500. They don't submit because there's no reason to, and some want to keep the size of their clusters secret. They're all running Nvidia (except Google, which uses both TPUs and Nvidia).

> El Capitan – we don’t yet know how big of a portion yet as we write this – with 43,808 of AMD’s “Antares-A” Instinct MI300A devices

By comparison, xAI announced that they have 100k H100s. The MI300A and the H100 have roughly similar performance. Meta says they're training Llama-4 on more than 100k H100s, and that they have the equivalent of 600k H100s' worth of compute. (Note that compute and networking can be orthogonal.)

Also, Nvidia B200s are rolling out now. They offer 2-3x the performance of H100s.

maratc No.42177611
> Nvidia B200s ... offer 2-3x the performance of H100s

For ML, not for HPC. ML and HPC are two completely different, only loosely related fields.

ML tasks do well with low precision: 16-bit and 8-bit precision is fine, and arguably good results can be achieved even with 4-bit precision [0][1]. That won't do for HPC tasks such as predicting global weather or computational biology; one would need 64- to 128-bit precision for those.
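To make the precision point concrete, here is a toy sketch (not taken from any real HPC code) that adds a small step to a running total many times, rounding to FP16, FP32, or FP64 after every addition. The narrow formats are emulated with the standard library's `struct` module; the names `round_to` and `accumulate` and the workload of 10,000 steps of 0.001 are assumptions made purely for illustration.

```python
import struct

# struct format codes for IEEE 754 widths: 'e' = FP16, 'f' = FP32, 'd' = FP64.
FORMATS = {"FP16": "e", "FP32": "f", "FP64": "d"}


def round_to(value: float, fmt: str) -> float:
    """Round a Python float (FP64) to the given format, then widen it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def accumulate(step: float, count: int, fmt: str) -> float:
    """Add `step` to a running total `count` times, rounding after every add."""
    step = round_to(step, fmt)
    total = 0.0
    for _ in range(count):
        total = round_to(total + step, fmt)
    return total


# Hypothetical toy workload: 10,000 steps of 0.001, mathematically equal to 10.
STEP, COUNT, EXACT = 0.001, 10_000, 10.0

errors = {
    name: abs(accumulate(STEP, COUNT, fmt) - EXACT)
    for name, fmt in FORMATS.items()
}

for name, err in errors.items():
    print(f"{name}: absolute error {err:.3e}")
```

In the narrowest format the running total eventually grows large enough that the step is smaller than half the spacing between representable values, so further additions are rounded away. Long-running simulations accumulate many small updates in a broadly similar way, which is one reason they lean on wide floats.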

Nvidia needs to decide how to divide the billions of transistors on their new silicon. Greatly oversimplifying, they can choose to make one of the following:

  *  Card A with *n* FP64 cores, or
  *  Card B with *2n* FP32 cores, or
  *  Card C with *4n* FP16 cores, or
  *  Card D with *8n* FP8 cores, or (theoretically)
  *  Card E with *16n* FP4 cores (not sure if FP4 is a thing).

Card A would give the HPC guys n usable cores, and it would give the ML guys n usable cores. On the other end, Card E would give the ML guys 16n usable cores (and zero usable cores for the HPC guys). It's no wonder that the HPC crowd wants Nvidia to produce Card A, while the ML crowd wants Nvidia to produce Card E.

Given that all the hype and the money are currently with the ML guys (and $NVDA reflects that), Nvidia will make a combination of different cores that is much, much closer to Card E than it is to Card A.
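The arithmetic behind the Card A to Card E comparison can be sketched as follows. This follows the same deliberately oversimplified model as the comment: a fixed transistor budget, core count inversely proportional to core width, and a core counting as "usable" only if it is at least as wide as the workload's minimum precision. The `Card` class, the 4-bit and 64-bit thresholds, and the value of `n` are illustrative assumptions, not a description of any real GPU.

```python
from dataclasses import dataclass

FP64_BITS = 64  # reference width: a card of FP64 cores holds n of them


@dataclass(frozen=True)
class Card:
    """Hypothetical card built entirely from cores of one width."""
    name: str
    core_bits: int

    def cores(self, n: int) -> int:
        # Halving the core width doubles how many cores fit in the budget.
        return n * FP64_BITS // self.core_bits

    def usable_cores(self, n: int, min_bits: int) -> int:
        # A core helps only if it is at least as wide as the workload needs.
        return self.cores(n) if self.core_bits >= min_bits else 0


CARDS = [Card("A", 64), Card("B", 32), Card("C", 16), Card("D", 8), Card("E", 4)]

HPC_MIN_BITS = 64  # assumption: HPC needs full FP64
ML_MIN_BITS = 4    # assumption: ML tolerates anything down to FP4

n = 1000  # arbitrary budget, measured in FP64 cores

# Map each card name to (usable cores for HPC, usable cores for ML).
table = {
    card.name: (card.usable_cores(n, HPC_MIN_BITS), card.usable_cores(n, ML_MIN_BITS))
    for card in CARDS
}

for name, (hpc, ml) in table.items():
    print(f"Card {name}: HPC usable = {hpc:>6}, ML usable = {ml:>6}")
```

Under these assumptions only Card A is of any use to HPC, while every step toward Card E doubles what ML gets, which is the asymmetry the comment describes.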

Their new offerings are arguably worse than their older offerings for HPC tasks, and the feeling with the HPC crowd is that "Nvidia and AMD are in the process of abandoning this market".

[0] https://papers.nips.cc/paper/2020/file/13b919438259814cd5be8...

[1] https://arxiv.org/abs/2212.09720

1. ipsum2 No.42178713
Yes, that's a great point that I missed. Anecdotally, it seems more people are using supercomputers for ML approaches to work that would traditionally have been done with HPC methods (e.g. training models for weather forecasts).