507 points martinald | 3 comments
noodletheworld No.45053394
Huh.

I feel oddly skeptical about this article; I can't argue the specific numbers, since I have no basis to, but... there are some decent open-source models; they're not state of the art, but if inference is this cheap, why aren't there multiple API providers offering models at dirt-cheap prices?

The only cheap-ass providers I've seen run tiny models. Where's my cheap DeepSeek-R1?

Surely, if it's this cheap and the margins are as massive as this article claims, I should be able to get cheap access to, or run my own, 600B-param model.

Am I missing something?

It seems that reality (i.e., the absence of people actually doing things this cheaply) is the biggest critic of this set of calculations.

replies(10): >>45053436 >>45053533 >>45053550 >>45053564 >>45053601 >>45053730 >>45053776 >>45053962 >>45055164 >>45055610
dragonwriter No.45053730
> but if inference is this cheap, why aren't there multiple API providers offering models at dirt-cheap prices

There are multiple API providers offering models at dirt-cheap prices; in fact, at least one well-known provider is an aggregator of other API providers and lists lots of models at $0.

> The only cheap-ass providers I've seen run tiny models. Where's my cheap DeepSeek-R1?

https://openrouter.ai/deepseek/deepseek-r1-0528:free
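
For example, a minimal sketch of hitting that free endpoint through OpenRouter's OpenAI-compatible chat-completions API (assumptions: you have a free OpenRouter account and OPENROUTER_API_KEY set in your environment; the prompt is just a placeholder):

    import os
    import requests

    # OpenRouter exposes an OpenAI-compatible /chat/completions endpoint;
    # the ":free" model variant is rate-limited but costs $0.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "deepseek/deepseek-r1-0528:free",
            "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])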

replies(2): >>45053907 >>45054952
1. booi No.45053907
you can also run DeepSeek for free on a modestly sized laptop
replies(2): >>45053980 >>45054238
2. dragonwriter No.45053980
At 4-bit quant, R1 takes 300+ gigs just for the weights. You can certainly run, on a modest laptop, the smaller models into which R1 has been distilled, but I don't see how you could run R1 itself on anything that wouldn't be considered extreme for a laptop in at least one dimension.
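
The arithmetic, as a back-of-envelope sketch (671B total parameters, per the published R1 weights; KV cache, activations, and quantization overhead ignored):

    # Rough weight-only memory estimate for DeepSeek-R1.
    PARAMS = 671e9  # total parameter count of R1

    for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        gb = PARAMS * bytes_per_param / 1e9
        print(f"{name}: ~{gb:,.0f} GB for weights alone")
    # 4-bit: ~336 GB, hence "300+ gigs" before counting anything else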
3. svachalek No.45054238
You're probably thinking of what ollama labels "deepseek", which is not in fact DeepSeek-R1 but a set of other, smaller models with some R1 distilled into them.