
507 points martinald | 2 comments
ankit219 No.45053523
This seems very far off. Per the latest reports, Anthropic has a gross margin of about 60%; that came out in coverage of their latest fundraising round. The Information's report estimated OpenAI's gross margin at roughly 50%, including free users. These are gross margins, so any amortization or model-training cost would likely sit below this line.

On top of that, nearly every lab today uses techniques like speculative decoding and caching, which cut cost and speed things up significantly.
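To illustrate why speculative decoding cuts serving cost: a cheap draft model proposes a few tokens and the expensive target model only verifies them. Below is a minimal toy sketch; draft_model and target_model are made-up stand-ins, not any real API, and a real implementation verifies all k drafts in a single batched forward pass.

    import random

    random.seed(0)
    VOCAB_SIZE = 100

    def target_model(ctx):
        # expensive, authoritative next-token choice (stand-in for the big LLM)
        return (sum(ctx) * 31 + 7) % VOCAB_SIZE

    def draft_model(ctx):
        # cheap guesser that happens to agree with the target ~70% of the time
        if random.random() < 0.7:
            return target_model(ctx)
        return random.randrange(VOCAB_SIZE)

    def speculative_step(ctx, k=4):
        # 1) draft proposes k tokens autoregressively (cheap calls)
        proposal, c = [], list(ctx)
        for _ in range(k):
            t = draft_model(c)
            proposal.append(t)
            c.append(t)
        # 2) target verifies the proposals; we loop here for clarity, but a
        #    real system scores all k positions in ONE forward pass, which is
        #    where the cost saving comes from
        out, c = [], list(ctx)
        for t in proposal:
            want = target_model(c)
            out.append(want)
            c.append(want)
            if t != want:  # first mismatch: discard the remaining drafts
                break
        return out

    ctx = [1, 2, 3]
    for _ in range(5):
        ctx += speculative_step(ctx)
    print(ctx)

With a ~70% acceptance rate and k=4, most tokens come out of one target pass instead of one pass each.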

The input numbers are also far off. The article assumes 37B active parameters, but Sonnet 4 is supposedly a 100B-200B-parameter model and Opus is around 2T parameters. Even if we assume both are MoE, neither will have exactly that number of active parameters. There is also a cost to hosting and activating parameters at inference time (the article effectively assumes a constant 37B active parameters throughout).
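To see why the active-parameter assumption matters, here is a back-of-envelope estimate using the common rule of thumb of ~2 FLOPs per active parameter per generated token. The parameter counts are the speculated figures from this comment, not confirmed numbers:

    # Rough compute per generated token, assuming ~2 FLOPs/active param/token.
    # All parameter counts below are speculation from this thread.
    FLOPS_PER_ACTIVE_PARAM = 2

    models = {
        "article's assumption (37B active)": 37e9,
        "Sonnet 4 if dense (~150B)": 150e9,
        "Opus (~2T total; active share unknown)": 2e12,
    }

    for name, active_params in models.items():
        tflops = FLOPS_PER_ACTIVE_PARAM * active_params / 1e12
        print(f"{name}: ~{tflops:.2f} TFLOPs/token")

If Sonnet 4 really is dense at ~150B, that is roughly 4x the per-token compute the article assumed.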

replies(2): >>45053768 #>>45054031 #
mutkach No.45053768
Gross margins also don't tell the whole story: we don't know how much Azure and Amazon charge for the infrastructure, and we have reason to believe they are selling it at a massive discount (Microsoft definitely does, as follows from its agreement with OpenAI). They get the model; OpenAI gets discounted infra.
replies(1): >>45053892 #
1. ankit219 No.45053892
A discounted Azure H100 will still be more than $2 per hour, and the same goes for AWS. Trainium chips are new and not as effective (not saying they are bad), but they still cost in the same range.

For inference, gross margin is exactly: (what the company charges the user per 1M tokens) minus (the direct cost to produce those 1M tokens, which is GPU cost), taken as a share of the price.
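A worked example of that definition. All numbers are illustrative assumptions, not Anthropic/OpenAI figures; in particular the $15/1M price and the 500k tokens per GPU-hour throughput are made up:

    # Illustrative inference gross-margin arithmetic.
    # ASSUMPTIONS: $15 per 1M output tokens, $2/GPU-hour (discounted H100),
    # 500k tokens served per GPU-hour. None of these are reported figures.
    price_per_1m_tokens = 15.00
    gpu_hour_cost = 2.00
    tokens_per_gpu_hour = 500_000

    cost_per_1m_tokens = gpu_hour_cost * (1_000_000 / tokens_per_gpu_hour)
    margin = (price_per_1m_tokens - cost_per_1m_tokens) / price_per_1m_tokens
    print(f"cost/1M tokens: ${cost_per_1m_tokens:.2f}, gross margin: {margin:.0%}")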

replies(1): >>45054000 #
2. mutkach No.45054000
I am implying that what OpenAI pays per GPU-hour is much less than $2 because of the discount. That's an assumption; it could be $1, or $0.50, no?

It could still be burning money for Microsoft/Amazon.
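To make the sensitivity concrete, here is the same illustrative arithmetic as above, swept over the discounted rates floated in this comment ($2, $1, $0.50 per GPU-hour); the price and throughput remain made-up assumptions:

    # Margin sensitivity to the assumed discounted GPU rate.
    price_per_1m = 15.00           # illustrative list price per 1M tokens
    tokens_per_gpu_hour = 500_000  # assumed serving throughput

    for gpu_hour in (2.00, 1.00, 0.50):
        cost = gpu_hour * 1_000_000 / tokens_per_gpu_hour
        margin = (price_per_1m - cost) / price_per_1m
        print(f"${gpu_hour:.2f}/GPU-hr -> ${cost:.2f}/1M tok, margin {margin:.0%}")

The deeper the discount, the better the lab's reported gross margin looks, even if the cloud provider is eating the difference.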