93 points rbanffy | 8 comments

pama No.42188372
Noting here that 2700 quadrillion operations per second is less than the estimated sustained throughput of productive bfloat16 compute during the training of the large llama3 models, which IIRC was about 45% of 16,000 quadrillion operations per second, i.e., 16k H100s in parallel at about 0.45 MFU (model FLOPs utilization). The compute power of national labs has fallen far behind industry in recent years.
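For concreteness, that arithmetic as a back-of-the-envelope check (the ~1 petaFLOP/s dense bf16 peak per H100 is my approximation):

    h100_bf16_peak = 1.0e15   # ~1 petaFLOP/s dense bf16 per H100 (approximate)
    n_gpus = 16_000
    mfu = 0.45                # model FLOPs utilization, as above

    peak = h100_bf16_peak * n_gpus   # 1.6e19 = 16,000 quadrillion op/s
    sustained = peak * mfu           # ~7.2e18 op/s sustained
    el_capitan = 2.7e18              # 2,700 quadrillion op/s
    print(sustained / el_capitan)    # ~2.7x the El Capitan figure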
replies(3): >>42188382 >>42188389 >>42188415
1. alephnerd No.42188389
Training an LLM (basically Transformers) is a different workflow from nuclear simulations (basically Monte Carlo simulations).

There are a lot of intricacies, but at a high level the two require different compute approaches; the sketch below illustrates the contrast.
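A toy illustration (neither is the real workload): LLM training is dominated by large, regular dense matrix multiplies that tolerate low precision, while Monte Carlo transport is dominated by branchy, data-dependent particle histories in double precision:

    import numpy as np

    rng = np.random.default_rng(0)

    # LLM-style work: one big dense matmul, low precision, perfectly regular.
    a = rng.standard_normal((4096, 4096)).astype(np.float16)
    b = rng.standard_normal((4096, 4096)).astype(np.float16)
    c = a @ b   # the shape of work GPU tensor cores are built for

    # Monte Carlo-style work: independent particle histories with
    # data-dependent branching, in double precision.
    def penetration_depth(rng):
        x = 0.0
        while True:
            x += rng.exponential(1.0)   # random free-flight distance
            if rng.random() < 0.3:      # absorbed on this collision?
                return x

    depths = [penetration_depth(rng) for _ in range(10_000)]
    print(np.mean(depths))   # ~3.3 mean free paths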

replies(3): >>42188413 >>42188417 >>42188497
2. handfuloflight No.42188413
Can you expand on why operations per second is not an apt comparison?
replies(1): >>42188538
3. pama No.42188417
Absolutely. Though the performance of El Capitan is only measured by a LINPACK benchmark, not the actual application.
replies(1): >>42188515
4. Koshkin No.42188497
This is about the raw compute, no matter the workflow.
replies(1): >>42193796
5. pertymcpert No.42188515
I thought modern supercomputers use benchmarks like HPCG instead of LINPACK?
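(For context: HPL/LINPACK factors a dense matrix and mostly measures peak floating-point throughput, while HPCG runs a preconditioned conjugate-gradient solve on a sparse system and mostly measures memory bandwidth and communication. Below is a minimal unpreconditioned CG, my own sketch of the kernel HPCG stresses, not the benchmark itself:)

    import numpy as np
    from scipy.sparse import diags

    # Sparse SPD system: 1-D Poisson stencil, the flavor of matrix HPCG uses.
    n = 100_000
    A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    x = np.zeros(n)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(1000):          # cap iterations; convergence isn't the point
        Ap = A @ p                 # sparse mat-vec: bandwidth-bound, few FLOPs/byte
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < 1e-8:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new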
replies(1): >>42188963
6. pertymcpert No.42188538
When you're doing scientific simulations, you're generally a lot more sensitive to FP precision than in ML training, which is very, very tolerant of reduced precision. So while FP8 might be fine for transformer networks, it would likely be unacceptably inaccurate, if not unusable, for simulations.
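A toy illustration of that sensitivity (my own example): naively accumulating a long sum in half precision stalls once rounding error dominates, while float64 stays accurate, and a simulation compounds this kind of error across every time step:

    import numpy as np

    vals = np.full(10_000, 0.01)

    print(vals.astype(np.float64).sum())   # ~100.0, as expected

    # Naive sequential accumulation in float16: once the running total
    # reaches ~32, adding 0.01 rounds to no change at all.
    acc = np.float16(0.0)
    for v in vals.astype(np.float16):
        acc = np.float16(acc + v)
    print(acc)   # stalls near 32.0, far from 100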
7. fancyfredbot No.42188963
The Top500 list includes both. There is no HPCG result for El Capitan yet:

https://top500.org/lists/hpcg/2024/11/

8. alephnerd No.42193796
It isn't. I recommend reading u/pertymcpert's response.