93 points rbanffy | 1 comments
pama ◴[] No.42188372[source]
Noting here that 2700 quadrillion operations per second is less than the estimated sustained throughput of productive bfloat16 compute during the training of the large llama3 models, which IIRC was about 45% of 16,000 quadrillion operations per second, i.e. 16k H100s in parallel at about 0.45 MFU, or roughly 7,200 quadrillion operations per second. The compute power of national labs has fallen far behind industry in recent years.
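For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this comment; the per-GPU peak of about 1 quadrillion bfloat16 operations per second is an assumption implied by the "16k H100 = 16,000 quadrillion" rounding, and the names are made up for illustration. Note it compares bfloat16 throughput against a figure that may be quoted at a different precision, so treat the ratio as indicative only.

```python
# Back-of-the-envelope check of the figures quoted in the comment.
# All inputs are the comment's own round numbers, not measurements.

H100_COUNT = 16_000            # "16k H100 in parallel"
PEAK_PER_GPU_PFLOPS = 1.0      # assumption: ~1 quadrillion bf16 ops/s per H100
MFU = 0.45                     # model FLOPs utilization, quoted "IIRC"
SUPERCOMPUTER_PFLOPS = 2_700   # "2700 quadrillion operations per second"


def sustained_pflops(gpus: int, peak_per_gpu_pflops: float, mfu: float) -> float:
    """Sustained throughput = GPU count * per-GPU peak * utilization."""
    return gpus * peak_per_gpu_pflops * mfu


llama_sustained = sustained_pflops(H100_COUNT, PEAK_PER_GPU_PFLOPS, MFU)
ratio = llama_sustained / SUPERCOMPUTER_PFLOPS

print(f"Estimated sustained training throughput: {llama_sustained:,.0f} PFLOP/s")
print(f"Ratio to the 2,700 PFLOP/s figure: {ratio:.2f}x")
```

Under these assumptions the training cluster sustains a bit under three times the quoted supercomputer figure.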
replies(3): >>42188382 #>>42188389 #>>42188415 #
bryanlarsen ◴[] No.42188415[source]
A 64-bit float operation is >4x as expensive as a 16-bit float operation.
replies(2): >>42188503 #>>42188504 #
1. Koshkin ◴[] No.42188503[source]
In terms of heat dissipation, maybe, yes. But not necessarily in time.