(arxiv.org)

204 points tdchaitanya | 4 comments | 01 Sep 25 16:57 UTC

pbd [01 Sep 25 17:49 UTC] No.45094941

GPT-4 at $24.7 per million tokens vs Mixtral at $0.24: that's a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance'; user satisfaction doesn't always correlate with technical metrics.
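A rough back-of-the-envelope check of the "20% wrong" claim, as a minimal sketch. It assumes a simple cascade (every query tries Mixtral first, and a misrouted 20% is re-run on GPT-4), which is a hypothetical policy and not the paper's routing method. It also assumes both models use the same number of tokens per query and that the failed cheap attempt is still billed. Prices are the per-million-token figures quoted above; `routed_cost` is a hypothetical helper.

```python
# Per-million-token prices quoted in the comment above (USD).
GPT4_PRICE = 24.7
MIXTRAL_PRICE = 0.24


def routed_cost(cheap_price: float, expensive_price: float, miss_rate: float) -> float:
    """Expected cost per million tokens for a cheap-first cascade.

    Hypothetical assumptions: every query is billed on the cheap model, a
    fraction `miss_rate` is then re-run (and billed again) on the expensive
    model, and both models use the same number of tokens per query.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be in [0, 1]")
    return cheap_price + miss_rate * expensive_price


cost = routed_cost(MIXTRAL_PRICE, GPT4_PRICE, miss_rate=0.20)
# Prints: routed: $5.18/M tokens, savings vs GPT-4 only: 4.8x
print(f"routed: ${cost:.2f}/M tokens, savings vs GPT-4 only: {GPT4_PRICE / cost:.1f}x")
```

Under these assumptions the savings shrink from ~100x to roughly 5x, which still "works" but shows how much the fallback cost dominates.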

simpaticoder [01 Sep 25 19:27 UTC] No.45095811

>>45094941

PPT (price per token) alone is insufficient to compute cost. You also need the average tokens per interaction (TPI); multiplying the two gives a cost estimate. A 0.01x PPT is wiped out by a 100x TPI.
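A minimal sketch of the PPT × TPI point. The two models and their numbers are hypothetical, chosen so that a 100x cheaper price per token is exactly cancelled by 100x more tokens per interaction.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelUsage:
    name: str
    price_per_token: float         # PPT, USD per token
    tokens_per_interaction: float  # TPI, average tokens billed per interaction

    def cost_per_interaction(self) -> float:
        # Cost estimate is simply PPT * TPI.
        return self.price_per_token * self.tokens_per_interaction


# Hypothetical models: "cheap" is 100x cheaper per token but uses 100x more tokens.
pricey = ModelUsage("pricey", price_per_token=1e-5, tokens_per_interaction=1_000)
cheap = ModelUsage("cheap", price_per_token=1e-7, tokens_per_interaction=100_000)

print(f"{pricey.cost_per_interaction():.4f} {cheap.cost_per_interaction():.4f}")  # 0.0100 0.0100
print(math.isclose(pricey.cost_per_interaction(), cheap.cost_per_interaction()))  # True
```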

monsieurbanana [01 Sep 25 20:35 UTC] No.45096397

>>45095811

Are you saying that some models will take 100x more tokens than others (models in the same ballpark) for the same task? Is the 100x a real measured figure, or just a number picked to illustrate a point?

simpaticoder [01 Sep 25 21:10 UTC] No.45096631

>>45096397

With thinking models, yes, 100x is not just possible but probable. You get charged for the intermediate thinking tokens even if you don't see them (which is the case for Grok, for example). And even if you do see them, they won't necessarily add value.
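A minimal sketch of why hidden thinking tokens inflate the bill. It assumes reasoning tokens are billed at the output-token rate, which is how OpenAI documents it for its reasoning models; other providers may differ, so check per provider. The token counts, prices, and the `billed_cost` helper are hypothetical.

```python
def billed_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int,
                input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of one request, assuming hidden reasoning tokens are billed
    at the output rate even though they are never returned to the caller."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


# Hypothetical request: a 200-token answer, with and without 20,000 hidden reasoning tokens.
without = billed_cost(500, 200, 0, input_price_per_m=1.0, output_price_per_m=4.0)
with_reasoning = billed_cost(500, 200, 20_000, input_price_per_m=1.0, output_price_per_m=4.0)
print(f"{with_reasoning / without:.0f}x")  # 63x
```

The visible answer is identical in both cases; the ~63x difference comes entirely from tokens the caller never sees.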

datadrivenangel [02 Sep 25 13:10 UTC] No.45102666

>>45096397

The GPT-5 models use ~10x more tokens, depending on the reasoning settings.

monsieurbanana [03 Sep 25 08:37 UTC] No.45113503

>>45096631

> With thinking models, yes 100x is not just possible, but probable

So the answer is no, then, because I don't put reasoning and non-reasoning models in the same ballpark when it comes to token usage. You can just turn off reasoning.

Adaptive LLM routing under budget constraints