Are OpenAI and Anthropic losing money on inference?

(martinalderson.com)

507 points martinald | 1 comments | 28 Aug 25 10:15 UTC


ankit219 ◴[28 Aug 25 15:36 UTC] No.45053523[source]▶

This seems very far off. According to the latest reports, Anthropic has a gross margin of 60%; that came out in the coverage of their latest fundraising. A report from The Information estimated OpenAI's gross margin at 50%, including free users. These are gross margins, so any amortization or model training cost would likely come after this.

Also, almost every lab today uses methods like speculative decoding and caching, which reduce cost and speed things up significantly.

The input numbers are far off. The article assumes 37B active parameters. Sonnet 4 is supposedly a 100B-200B param model. Opus is about 2T params. Neither of them (even if we assume MoE) will have exactly that number of active params. Then there is a cost to hosting all the params and activating some of them at inference time (the article more or less assumes a constant 37B params).
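To make the hosting-versus-activating distinction concrete, here is a rough sketch. It assumes a simplified MoE cost model in which all weights must sit in GPU memory while only the active parameters cost compute per token. The function name, the 80 GB GPU size, the 1 byte per parameter, and the 2 FLOPs per active parameter rule of thumb are illustrative assumptions, not figures from the article or from any lab.

```python
import math


def moe_footprint(total_params, active_params,
                  bytes_per_param=1.0, gpu_memory_bytes=80e9):
    """Rough split between hosting cost and per-token compute (hypothetical model).

    total_params  -- all weights, which must be resident in GPU memory
    active_params -- weights actually used for one token (equals total if dense)
    """
    weight_bytes = total_params * bytes_per_param
    # Lower bound on GPUs needed just to hold the weights (ignores KV cache).
    min_gpus = math.ceil(weight_bytes / gpu_memory_bytes)
    # Common rule of thumb: about 2 FLOPs per active parameter per token.
    flops_per_token = 2 * active_params
    return {
        "weight_gb": weight_bytes / 1e9,
        "min_gpus_for_weights": min_gpus,
        "flops_per_token": flops_per_token,
    }


if __name__ == "__main__":
    # Both parameter counts below are guesses used purely for illustration.
    print(moe_footprint(total_params=2e12, active_params=37e9))
    print(moe_footprint(total_params=200e9, active_params=200e9))
```

Under these assumptions, two models with the same active parameter count can have very different hosting costs, which is why total and active sizes have to be discussed separately.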

replies(2): >>45053768 #>>45054031 #

thegeomaster ◴[28 Aug 25 16:21 UTC] No.45054031[source]▶

>>45053523 #

Are you saying that you think Sonnet 4 has 100B-200B _active_ params? And that Opus has 2T active? What data are you basing these outlandish assumptions on?

replies(2): >>45054387 #>>45055219 #

ankit219 ◴[28 Aug 25 16:53 UTC] No.45054387[source]▶

>>45054031 #

Oh, nothing official. There are people who estimate the sizes based on tok/s, cost, benchmarks, etc. The one that most people go by is https://lifearchitect.substack.com/p/the-memo-special-editio.... He estimated Claude 3 Opus to be a 2T param model (given the pricing and speed). Opus 4 is 1.2T params according to him (but then I don't understand why the price remained the same). Sonnet is estimated by various people to be around 100B-200B params.

[1]: https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJ...

replies(2): >>45054508 #>>45058226 #

1. thegeomaster ◴[28 Aug 25 23:34 UTC] No.45058226[source]▶

>>45054387 #

tok/s cannot in any way be used to estimate parameters. It's a tradeoff made at inference time. You can adjust your batch size to serve 1 user at a huge tok/s or many users at a slow tok/s.
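A minimal sketch of that tradeoff, assuming a simple roofline-style model of decoding: each step reads the active weights once plus every sequence's KV cache, and takes the larger of memory time and compute time. All names and numbers here (37B active params, 2 GB of KV cache per sequence, 3 TB/s bandwidth, 1 PFLOP/s) are hypothetical, and the model ignores MoE expert routing, multi-GPU communication, and speculative decoding.

```python
def decode_step_seconds(batch_size, active_params=37e9, bytes_per_param=1.0,
                        kv_bytes_per_seq=2e9, mem_bandwidth=3.0e12,
                        peak_flops=1.0e15):
    """Estimated time for one decode step that emits one token per sequence."""
    # Memory side: weights are read once per step, KV cache once per sequence.
    mem_time = (active_params * bytes_per_param
                + batch_size * kv_bytes_per_seq) / mem_bandwidth
    # Compute side: about 2 FLOPs per active parameter per generated token.
    compute_time = batch_size * 2 * active_params / peak_flops
    return max(mem_time, compute_time)


def throughput(batch_size, **hardware):
    """Return (tokens/s seen by one user, tokens/s across the whole batch)."""
    step = decode_step_seconds(batch_size, **hardware)
    return 1.0 / step, batch_size / step


if __name__ == "__main__":
    for batch in (1, 8, 64, 256):
        per_user, total = throughput(batch)
        print(f"batch={batch:4d}  per-user tok/s={per_user:8.1f}  "
              f"total tok/s={total:10.1f}")
```

In this toy model the parameter count never changes, yet per-user tok/s falls and aggregate tok/s rises as the batch grows, so an observed tok/s figure alone does not pin down model size.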
