
507 points by martinald | 1 comment
noodletheworld No.45053394
Huh.

I feel oddly skeptical about this article. I can't argue with the specific numbers, since I have no idea, but... there are some decent open-source models; they're not state of the art, but if inference is this cheap, then why aren't there multiple API providers offering them at dirt-cheap prices?

The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?

Surely if it's this cheap, and we're talking massive margins according to this, I should be able to get a cheap hosted option / run my own 600B param model.

Am I missing something?

It seems that reality (i.e., the absence of people actually doing things this cheap) is the biggest critic of this set of calculations.

colinsane No.45055610
> I should be able to get a cheap hosted option / run my own 600B param model.

if the margins on hosted inference are 80%, then you need >20% utilization of whatever hardware you build for yourself before self-hosting costs you less (at the margin).
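
a rough back-of-envelope sketch of that break-even point (all numbers here are illustrative assumptions, not from the article):

    # break-even sketch under assumed numbers (not from the article):
    # if the hosted price carries an 80% margin, the provider's underlying
    # compute cost is 20% of what they charge.
    provider_price_per_mtok = 1.00                     # $ per million tokens, assumed
    provider_margin = 0.80
    provider_cost_per_mtok = provider_price_per_mtok * (1 - provider_margin)  # $0.20

    # self-hosting on comparable hardware costs roughly that same compute,
    # but you pay for the box whether or not it's busy, so idle time inflates it.
    my_utilization = 0.10                              # fraction of time my GPU does useful work
    my_cost_per_mtok = provider_cost_per_mtok / my_utilization                # $2.00

    print(my_cost_per_mtok > provider_price_per_mtok)  # True: below ~20% utilization the API wins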

i self-host open-weight models (please: deepseek et al. aren't open _source_) on whatever $300 GPU i bought a few years ago, but if it outputs 2 tokens/sec then i'm waiting 10 minutes for most results. if i want results in 10 seconds instead of 10 minutes, i'll be paying $30,000 instead. if i'm prompting it 100 times during the day, then it's idle 99% of the time.
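
the arithmetic behind that, with illustrative token rates and an assumed answer length (the 1200-token answer is a guess, not a measurement):

    # illustrative numbers only: GPU prices and token rates are assumptions
    cheap_tps = 2             # tokens/sec on the $300 GPU
    fast_tps = 120            # ~60x faster, the ~$30,000 tier
    tokens_per_answer = 1200  # assumed typical answer length

    print(tokens_per_answer / cheap_tps / 60)  # ~10 minutes per answer on the cheap GPU
    print(tokens_per_answer / fast_tps)        # ~10 seconds per answer on the fast one

    prompts_per_day = 100
    busy_seconds = prompts_per_day * (tokens_per_answer / fast_tps)
    print(busy_seconds / 86_400)               # ~0.012 -> the fast GPU sits idle ~99% of the day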

coordinating a group buy for that $30,000 GPU and sharing it across 100 people probably makes more sense than either arrangement in the previous paragraph. for now, that's a big component of what model providers, uh, provide.
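
a sketch of why the shared box pencils out (purely hypothetical figures, ignoring power, hosting, and ops):

    gpu_cost = 30_000    # one fast GPU, shared by the group
    people = 100
    lifetime_years = 3   # assumed useful life of the hardware

    print(gpu_cost / people / lifetime_years)  # $100 per person per year for a share of the fast box
    print(300 / lifetime_years)                # $100 per year for a slow private GPU, same cost but ~60x slower
    print(gpu_cost / lifetime_years)           # $10,000 per year to own the fast GPU alone, idle ~99% of the time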