(martinalderson.com)

507 points martinald | 1 comments | 28 Aug 25 10:15 UTC


ekelsen ◴[28 Aug 25 14:46 UTC] No.45052850[source]▶

The math on the input tokens is definitely wrong. It claims each instance (8 GPUs) can handle 1.44 million tokens/sec of input. Let's check that out.

1.44e6 tokens/sec * 37e9 bytes/token / 3.3e12 bytes/sec/GPU = ~16,000 GPUs

And that's assuming a more likely 1 byte per parameter.

So the article is only off by a factor of at least 1,000. I didn't check any of the rest of the math, but that probably has some impact on their conclusions...
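The arithmetic above can be reproduced with a short sketch. It takes the comment's premise as given (every input token requires reading 37e9 bytes of weights, at 1 byte per parameter, against 3.3e12 bytes/sec of memory bandwidth per GPU) and does not evaluate whether that premise holds. The constant and function names are illustrative, not taken from the article.

```python
# Back-of-the-envelope check of the estimate in the comment above.
# Assumption (the comment's, not verified here): each input token costs a
# full read of 37e9 bytes of weights, i.e. 1 byte per parameter.

TOKENS_PER_SEC = 1.44e6      # input throughput the article claims per instance
BYTES_PER_TOKEN = 37e9       # bytes of weights read per token (comment's figure)
BANDWIDTH_PER_GPU = 3.3e12   # memory bandwidth per GPU, in bytes/sec
GPUS_PER_INSTANCE = 8        # GPUs in one instance, per the comment


def gpus_needed(tokens_per_sec: float, bytes_per_token: float,
                bandwidth_per_gpu: float) -> float:
    """Number of GPUs whose combined bandwidth covers the required byte rate."""
    return tokens_per_sec * bytes_per_token / bandwidth_per_gpu


required = gpus_needed(TOKENS_PER_SEC, BYTES_PER_TOKEN, BANDWIDTH_PER_GPU)
shortfall = required / GPUS_PER_INSTANCE

print(f"GPUs needed under this assumption: {required:,.0f}")
print(f"Ratio to one 8-GPU instance: {shortfall:,.0f}x")
```

Under these inputs the result is about 16,000 GPUs, or roughly 2,000 times the 8 GPUs in one instance, which is consistent with the "at least 1,000" figure quoted above.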

replies(5): >>45052936 #>>45052942 #>>45052964 #>>45053047 #>>45053166 #

1. Lionga ◴[28 Aug 25 14:53 UTC] No.45052936[source]▶

>>45052850 #

Well, he probably asked some AI to do the math for him.


Are OpenAI and Anthropic losing money on inference?