
507 points martinald | 2 comments
chillee ◴[] No.45057409[source]
This article's math is wrong on many fundamental levels. One of the most obvious ones is that prefill is nowhere near bandwidth-bound.

If you work out the MFU implied by the author's numbers, it's 1.44 million input tokens per second * 37 billion active params * 2 (FMA) / 8 [GPUs per instance] ≈ 13 PFLOP/s per GPU. That's approximately 7x the absolute peak FLOPS of the hardware. Obviously, that's impossible.
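
Quick sanity check of that arithmetic (a rough sketch; the ~2e15 FLOP/s peak is my assumed H100-class dense FP8 figure, not something from the article):

    # Numbers from the comment above; peak is an assumed H100-class FP8 figure.
    input_tokens_per_s = 1.44e6   # article's implied prefill throughput
    active_params = 37e9          # active params per token
    flops_per_param = 2           # multiply + accumulate (FMA)
    gpus_per_instance = 8

    per_gpu_flops = input_tokens_per_s * active_params * flops_per_param / gpus_per_instance
    peak_fp8 = 2e15               # assumed dense FP8 peak per GPU

    print(f"implied: {per_gpu_flops:.2e} FLOP/s per GPU")       # ~1.3e16
    print(f"vs assumed peak: {per_gpu_flops / peak_fp8:.1f}x")  # ~6.7x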

There are many other issues with this article, such as assuming only 32 concurrent requests(?), only 8 GPUs per instance as opposed to the more efficient/standard prefill-decode disaggregated setups, and assuming that attention computation is the main thing that makes models compute-bound. It's a bit of an indictment of HN's understanding of LLMs that most people are bringing up issues with the article that aren't any of these fundamental misunderstandings.
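
For intuition on why prefill is compute-bound while small-batch decode isn't, here's a minimal roofline sketch (my own back-of-envelope; the H100-ish peak/bandwidth numbers and FP8 weights are assumptions):

    # Weights get read roughly once per forward pass, so arithmetic intensity
    # scales with tokens processed per pass. FP8 weights ~= 1 byte/param.
    peak_flops = 2e15             # assumed dense FP8 FLOP/s per GPU
    hbm_bw = 3.35e12              # assumed HBM bytes/s per GPU
    ridge = peak_flops / hbm_bw   # ~600 FLOP/byte needed to be compute-bound

    def intensity(tokens_per_pass, bytes_per_param=1):
        # FLOPs ~ 2 * params * tokens; bytes moved ~ params * bytes_per_param
        return 2 * tokens_per_pass / bytes_per_param

    print(intensity(8192), ridge)  # prefill, long prompt: ~16k >> ~600 -> compute-bound
    print(intensity(32), ridge)    # decode, 32 concurrent seqs: 64 << ~600 -> bandwidth-bound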

replies(5): >>45057603 #>>45057767 #>>45057801 #>>45058397 #>>45060353 #
Den_VR ◴[] No.45057603[source]
So, bottom line, do you think it’s probable that either OpenAI or Anthropic is “losing money on inference”?
replies(2): >>45057664 #>>45061050 #
chillee ◴[] No.45057664[source]
No. In some sense, the article comes to the right conclusion haha. But it's probably >100x off on its central premise about output tokens costing more than input.
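
To make that concrete, a toy cost model (all numbers are my own illustrative assumptions: a DeepSeek-V3-style MoE with ~671B total / 37B active params, H100-ish specs, FP8 weights, a decode batch of 256, ignoring KV-cache reads and attention FLOPs). Prefill is compute-bound, decode is bandwidth-bound because every step re-reads the weights, and the gap between those two rates is roughly the real input-vs-output cost ratio:

    # Toy cost model; every number here is an illustrative assumption.
    total_params = 671e9      # full MoE weights, read ~once per decode step at large batch
    active_params = 37e9      # params active per token
    gpus = 8
    prefill_flops = 2e15 * 0.4    # assume ~40% MFU per GPU during prefill
    hbm_bw = 3.35e12              # assumed HBM bytes/s per GPU
    batch = 256                   # concurrent decode sequences (assumed)

    prefill_tok_s = gpus * prefill_flops / (2 * active_params)  # ~86k input tok/s per instance
    decode_step = (total_params / gpus) / hbm_bw                # ~25 ms per step (FP8: 1 byte/param)
    decode_tok_s = batch / decode_step                          # ~10k output tok/s per instance

    print(prefill_tok_s / decode_tok_s)   # ~8x: output tokens pricier than input under these assumptions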
replies(2): >>45057722 #>>45057791 #
1. doctorpangloss ◴[] No.45057722[source]
I’m pretty sure input tokens are cheap because they want to ingest the data for training later, no? They want huge contexts to slice up.
replies(1): >>45062499 #
2. awwaiid ◴[] No.45062499[source]
Afaik all the large providers flipped the default to contractually NOT train on your data. So no, training data context size is not a factor.