(martinalderson.com)

507 points martinald | 2 comments | 28 Aug 25 10:15 UTC

ekelsen ◴[28 Aug 25 14:46 UTC] No.45052850[source]▶

The math on the input tokens is definitely wrong. It claims each instance (8 GPUs) can handle 1.44 million tokens/sec of input. Let's check that out.

1.44e6 tokens/sec * 37e9 bytes/token / 3.3e12 bytes/sec/GPU = ~16,000 GPUs

And that's assuming a more likely 1 byte per parameter.

So the article is only off by a factor of at least 1,000. I didn't check any of the rest of the math, but that probably has some impact on their conclusions...
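For anyone who wants to rerun the arithmetic above, here is a minimal Python sketch. It takes the comment's premise at face value: every input token requires one full read of the weights (37e9 bytes, i.e. 37e9 parameters at 1 byte each), and each GPU supplies 3.3e12 bytes/sec of memory bandwidth. The variable names are illustrative and not taken from the article.

```python
# Back-of-the-envelope check of the figures quoted in the comment above.
# Assumption (the comment's premise): each input token reads all weights once.
tokens_per_sec = 1.44e6    # claimed input throughput of one 8-GPU instance
bytes_per_token = 37e9     # 37e9 parameters at 1 byte per parameter
gpu_bandwidth = 3.3e12     # memory bandwidth per GPU, in bytes/sec
gpus_per_instance = 8

# GPUs required to stream that many bytes per second
gpus_needed = tokens_per_sec * bytes_per_token / gpu_bandwidth

# How far that is from the 8 GPUs the article assumes
shortfall = gpus_needed / gpus_per_instance

print(f"GPUs needed: {gpus_needed:,.0f}")
print(f"Factor vs. one 8-GPU instance: {shortfall:,.0f}x")
```

Under these assumptions the result is a little over 16,000 GPUs, roughly 2,000 times the 8 GPUs in one instance, which is consistent with the "at least 1,000" figure above.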

replies(5): >>45052936 #>>45052942 #>>45052964 #>>45053047 #>>45053166 #

1. endtime ◴[28 Aug 25 14:54 UTC] No.45052964[source]▶

>>45052850 #

> 37e9 bytes/token

This doesn't quite sound right...isn't a token just a few characters?

replies(1): >>45053116 #

2. ◴[28 Aug 25 15:05 UTC] No.45053116[source]▶

>>45052964 (TP) #

Are OpenAI and Anthropic losing money on inference?