507 points martinald | 3 comments
noodletheworld No.45053394
Huh.

I feel oddly skeptical about this article; I can't argue the specific numbers, since I have no basis to, but... there are some decent open-source models; they're not state of the art, but if inference is this cheap, why aren't there multiple API providers offering models at dirt-cheap prices?

The only cheap-ass providers I've seen run tiny models. Where's my cheap DeepSeek-R1?

Surely, if it's this cheap and the margins are as massive as this article claims, I should be able to get cheap access to, or run my own, 600B-param model.

Am I missing something?

It seems that reality (i.e., the absence of people actually doing things this cheaply) is the biggest critic of this set of calculations.

replies(10): >>45053436 >>45053533 >>45053550 >>45053564 >>45053601 >>45053730 >>45053776 >>45053962 >>45055164 >>45055610
dragonwriter No.45053730
> but if inference is this cheap, why aren't there multiple API providers offering models at dirt-cheap prices

There are multiple API providers offering models at dirt-cheap prices; in fact, at least one well-known provider is an aggregator of other API providers and lists lots of models at $0.

> The only cheap-ass providers I've seen run tiny models. Where's my cheap DeepSeek-R1?

https://openrouter.ai/deepseek/deepseek-r1-0528:free
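
For example, a minimal sketch of hitting that free endpoint through OpenRouter's OpenAI-compatible chat-completions API (assumptions: you have a free OpenRouter account and OPENROUTER_API_KEY set in your environment; the prompt is just a placeholder):

    import os
    import requests

    # OpenRouter exposes an OpenAI-compatible /chat/completions endpoint;
    # the ":free" model variant is rate-limited but costs $0.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "deepseek/deepseek-r1-0528:free",
            "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])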

replies(2): >>45053907 >>45054952
1. booi No.45053907
you can also run DeepSeek for free on a modestly sized laptop
replies(2): >>45053980 >>45054238
2. dragonwriter No.45053980
At 4-bit quant, R1 takes 300+ gigs just for the weights. You can certainly run, on a modest laptop, the smaller models into which R1 has been distilled, but I don't see how you could run R1 itself on anything that wouldn't be considered extreme for a laptop in at least one dimension.
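
The arithmetic, as a back-of-envelope sketch (671B total parameters, per the published R1 weights; KV cache, activations, and quantization overhead ignored):

    # Rough weight-only memory estimate for DeepSeek-R1.
    PARAMS = 671e9  # total parameter count of R1

    for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        gb = PARAMS * bytes_per_param / 1e9
        print(f"{name}: ~{gb:,.0f} GB for weights alone")
    # 4-bit: ~336 GB, hence "300+ gigs" before counting anything else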
3. svachalek No.45054238
You're probably thinking of what ollama labels "deepseek", which is not in fact DeepSeek-R1 but a set of other, smaller models with some R1 distilled into them.