
507 points martinald | 1 comment
jsnell ◴[] No.45051797
I don't believe the asymmetry between prefill and decode is that large. If it were, it would make no sense for most providers to have separate pricing for prefill with cache hits vs. without.
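
To make that concrete, a back-of-envelope with made-up per-million-token prices (purely illustrative, not any provider's actual rate card):

    # Hypothetical $/M-token prices -- assumptions for illustration only.
    input_miss = 0.50   # input tokens, prefill cache miss (assumed)
    input_hit = 0.05    # input tokens, prefill cache hit (assumed)
    output = 2.00       # output tokens (assumed)

    # A typical long-context request: 50k input tokens, 1k output tokens.
    in_tok, out_tok = 50_000, 1_000
    cost_miss = in_tok / 1e6 * input_miss + out_tok / 1e6 * output
    cost_hit = in_tok / 1e6 * input_hit + out_tok / 1e6 * output
    print(f"miss: ${cost_miss:.4f}, hit: ${cost_hit:.4f}")  # $0.0270 vs $0.0045

With numbers like those, a cache hit cuts the request cost ~6x. A provider wouldn't bother offering that discount if prefill were nearly free for them.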

Given that the analysis is based on R1, DeepSeek's actual in-production numbers seem highly relevant: https://github.com/deepseek-ai/open-infra-index/blob/main/20...

(But yes, they claim 80% margins on the compute in that article.)
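
The margin falls out of the two headline figures in that post (quoting them from memory, so treat as approximate):

    # Figures as I recall them from the linked DeepSeek post (approximate):
    daily_cost = 87_072      # $/day, reported GPU rental cost
    daily_revenue = 562_027  # $/day, theoretical revenue at R1 pricing
    print(f"margin: {1 - daily_cost / daily_revenue:.1%}")  # ~84.5%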

> When established players emphasize massive costs and technical complexity, it discourages competition and investment in alternatives

But it's not the established players emphasizing the costs! They're typically saying that inference is profitable. Instead, the false claims about high costs and unprofitability are part of the anti-AI crowd's standard talking points.

replies(1): >>45051921 #
martinald ◴[] No.45051921
Yes, I was really surprised at this myself (author here). If you have some better numbers, I'm all ears. Even on my lowly 9070XT I get ~20x the tok/s on input (prefill) vs. output (decode), and I'm not doing batching or anything locally.
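
If anyone wants to reproduce that ratio locally, a minimal unbatched timing sketch along these lines works (the model name is a placeholder; the ratio depends heavily on hardware and backend):

    # Rough prefill-vs-decode throughput measurement, batch size 1.
    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "Qwen/Qwen2.5-0.5B"  # placeholder; use whatever fits your GPU
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.float16, device_map="auto"
    )

    ids = tok("word " * 2000, return_tensors="pt").to(model.device)
    n_in = ids["input_ids"].shape[1]

    # Prefill: one forward pass over the whole prompt, all positions in parallel.
    t0 = time.perf_counter()
    with torch.no_grad():
        model(**ids, use_cache=True)
    prefill_s = time.perf_counter() - t0

    # Decode: generate tokens one at a time, reusing the KV cache.
    # (generate() repeats the prefill; close enough for a rough ratio.)
    n_out = 128
    t0 = time.perf_counter()
    with torch.no_grad():
        model.generate(**ids, max_new_tokens=n_out, do_sample=False)
    decode_s = time.perf_counter() - t0

    print(f"prefill: {n_in / prefill_s:,.0f} tok/s")
    print(f"decode:  {n_out / decode_s:,.0f} tok/s")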

I think the cache-hit vs. cache-miss pricing makes sense at >100k tokens, where prefill starts getting compute-bound.
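
Rough intuition for that crossover: per-token MLP cost is flat, but attention FLOPs grow quadratically with context, so past some length prefill stops being cheap. A crude FLOPs model (every number is an assumption, not a measurement of any real model):

    # Crude prefill FLOPs model for a dense transformer (all figures assumed).
    n_params = 30e9   # active parameters
    n_layers = 60
    d_model = 7_168

    def prefill_flops(n: int) -> float:
        mlp = 2 * n_params * n                # ~2 FLOPs per param per token
        attn = 4 * n_layers * n**2 * d_model  # QK^T plus AV score math
        return mlp + attn

    for n in (1_000, 10_000, 100_000):
        attn_share = 1 - (2 * n_params * n) / prefill_flops(n)
        print(f"{n:>7} tokens: attention is {attn_share:.0%} of prefill FLOPs")
    # ~3% at 1k tokens, ~22% at 10k, ~74% at 100k -- so at >100k a cache hit
    # (skipping recompute of the shared prefix) saves serious compute.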

replies(2): >>45052374 #>>45053461 #
Filligree ◴[] No.45053461
Maybe because you aren’t doing batching? It sounds like you’re assuming that would benefit prefill more than decode, but I believe it’s the other way around.
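
That's the usual roofline picture: unbatched decode is one matrix-vector product per token, so it's bound by reading the weights from memory, and batching turns it into a matrix-matrix product that amortizes those reads across requests. Prefill already processes every prompt position in parallel, so it's effectively "pre-batched". A sketch with assumed hardware numbers:

    # Arithmetic intensity of one fp16 weight-matrix multiply at batch size b.
    # Roofline numbers below are assumptions, not measurements of any real GPU.
    def intensity(b: int, d: int = 8_192) -> float:
        flops = 2 * b * d * d       # b rows through a d x d weight matrix
        bytes_moved = 2 * d * d     # weight reads dominate; activations ignored
        return flops / bytes_moved  # simplifies to ~b FLOPs per byte

    peak_flops = 100e12  # assumed accelerator peak, FLOP/s
    peak_bw = 1e12       # assumed memory bandwidth, bytes/s
    ridge = peak_flops / peak_bw  # ~100 FLOPs/byte to escape the bandwidth wall

    for b in (1, 8, 64, 256):
        bound = "compute" if intensity(b) > ridge else "bandwidth"
        print(f"batch {b:>3}: {intensity(b):.0f} FLOPs/byte ({bound}-bound)")
    # Decode at b=1 sits ~100x below the ridge point, so batching buys it a
    # near-linear speedup; a 2k-token prefill is effectively b=2000 already.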