204 points tdchaitanya | 4 comments
pbd ◴[] No.45094941[source]
GPT-4 at $24.70 per million tokens vs Mixtral at $0.24 - that's roughly a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance' - user satisfaction doesn't always correlate with technical metrics.
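
To put rough numbers on the routing claim, here is a minimal sketch. It assumes a deliberately pessimistic setup that is not stated in the comment: every query goes to the cheap model first, and each misrouted query is then re-run on the strong model, so both calls are paid for. The function name and the fallback behavior are illustrative assumptions; the prices are the ones quoted above.

```python
def expected_price_with_routing(cheap_price: float,
                                strong_price: float,
                                misroute_rate: float) -> float:
    """Expected price per million tokens under the assumed fallback scheme.

    Assumption (not from the thread): all traffic hits the cheap model,
    and the misrouted fraction is retried on the strong model.
    """
    return cheap_price + misroute_rate * strong_price


# Prices per million tokens as quoted in the comment above.
MIXTRAL_PRICE = 0.24
GPT4_PRICE = 24.7

blended = expected_price_with_routing(MIXTRAL_PRICE, GPT4_PRICE, misroute_rate=0.20)
savings_factor = GPT4_PRICE / blended

print(f"blended price: ${blended:.2f} per million tokens")
print(f"savings vs. always using the strong model: {savings_factor:.1f}x")
```

Under these assumptions the blended price comes out near $5.18 per million tokens, a bit under 5x cheaper than sending everything to GPT-4. That is far from 100x, but still a clear saving, and it says nothing about whether answer quality holds up.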
replies(5): >>45095081 #>>45095225 #>>45095267 #>>45095811 #>>45095813 #
simpaticoder ◴[] No.45095811[source]
PPT (price-per-token) is insufficient to compute cost. You also need the average tokens-per-interaction (TPI). Multiply the two to get a cost estimate. A 0.01x PPT is wiped out by a 100x TPI.
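
The PPT times TPI arithmetic can be sketched as below. The prices are the ones quoted upthread; the token counts (500 and 50,000) are made-up numbers chosen only to show a 100x TPI gap, not measurements of any model.

```python
def cost_per_interaction(price_per_million_tokens: float,
                         tokens_per_interaction: float) -> float:
    """Cost of one interaction in dollars: PPT * TPI."""
    return price_per_million_tokens * tokens_per_interaction / 1_000_000


# Hypothetical token counts: the cheap model is assumed to use 100x more tokens.
strong_cost = cost_per_interaction(24.7, 500)
cheap_verbose_cost = cost_per_interaction(0.24, 50_000)

print(f"strong model, 500 tokens:    ${strong_cost:.5f}")
print(f"cheap model, 50,000 tokens:  ${cheap_verbose_cost:.5f}")
```

With these assumed token counts the two interactions cost about the same (roughly 1.2 cents each), so the ~100x price gap disappears entirely.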
replies(1): >>45096397 #
1. monsieurbanana ◴[] No.45096397[source]
Are you saying that some models will take 100x more tokens than others (models in the same ballpark) for the same task? Is the 100x a real measured figure or just an arbitrary number to illustrate a point?
replies(2): >>45096631 #>>45102666 #
2. simpaticoder ◴[] No.45096631[source]
With thinking models, yes, 100x is not just possible but probable. You get charged for the intermediate thinking tokens even if you don't see them (which is the case for Grok, for example). And even if you do see them, they won't necessarily add value.
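
A minimal sketch of how hidden thinking tokens inflate the billed total. The `Usage` type and its field names are hypothetical and do not mirror any provider's API; the token counts are invented for illustration, and for simplicity all tokens are assumed to be billed at the same rate.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical per-request token accounting (not a real provider schema)."""
    prompt_tokens: int
    visible_output_tokens: int
    reasoning_tokens: int = 0  # thinking tokens: billed even when not shown


def billed_tokens(usage: Usage) -> int:
    """Total tokens charged, including reasoning tokens the user never sees."""
    return usage.prompt_tokens + usage.visible_output_tokens + usage.reasoning_tokens


def token_multiplier(with_reasoning: Usage, without_reasoning: Usage) -> float:
    """How many times more tokens the reasoning run is billed for."""
    return billed_tokens(with_reasoning) / billed_tokens(without_reasoning)


# Same prompt and same visible answer; only the hidden reasoning differs.
plain = Usage(prompt_tokens=200, visible_output_tokens=300)
thinking = Usage(prompt_tokens=200, visible_output_tokens=300, reasoning_tokens=49_500)

print(f"billed without reasoning: {billed_tokens(plain)}")
print(f"billed with reasoning:    {billed_tokens(thinking)}")
print(f"multiplier:               {token_multiplier(thinking, plain):.0f}x")
```

The visible answer is identical in both cases, so the multiplier cannot be read off the response text; it only shows up in the usage accounting.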
replies(1): >>45113503 #
3. datadrivenangel ◴[] No.45102666[source]
The GPT-5 models use ~10x more tokens, depending on the reasoning settings.
4. monsieurbanana ◴[] No.45113503[source]
> With thinking models, yes 100x is not just possible, but probable

So the answer is no, then, because I don't put reasoning and non-reasoning models in the same ballpark when it comes to token usage. You can just turn off reasoning.