(arxiv.org)

204 points tdchaitanya | 1 comments | 01 Sep 25 16:57 UTC


pbd [01 Sep 25 17:49 UTC] No.45094941

GPT-4 at $24.7 per million tokens vs. Mixtral at $0.24: that's a roughly 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure "performance": user satisfaction doesn't always correlate with technical metrics.
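The arithmetic behind "the economics still work" can be checked with a short sketch. It assumes (hypothetically) that every query goes to Mixtral first, that a misrouted query is retried on GPT-4 and so pays both prices, and that both models use the same number of tokens for the same query; `blended_cost` is an illustrative helper, not part of any library.

```python
# Per-million-token prices quoted in the comment (USD).
GPT4_PRICE = 24.7
MIXTRAL_PRICE = 0.24


def blended_cost(cheap: float, expensive: float, miss_rate: float) -> float:
    """Expected cost per million tokens when every query goes to the cheap
    model first and a fraction `miss_rate` is retried on the expensive one.

    Hypothetical assumptions: a misrouted query pays for BOTH models, and
    both models consume the same number of tokens for the same query.
    """
    return cheap + miss_rate * expensive


ratio = GPT4_PRICE / MIXTRAL_PRICE
routed = blended_cost(MIXTRAL_PRICE, GPT4_PRICE, miss_rate=0.20)

print(f"price ratio: {ratio:.0f}x")                 # ~103x
print(f"routed cost: ${routed:.2f} per 1M tokens")  # 0.24 + 0.2 * 24.7
print(f"savings vs. GPT-4 only: {GPT4_PRICE / routed:.1f}x")
```

Under these assumptions the routed setup costs about $5.18 per million tokens, roughly 4.8x cheaper than sending everything to GPT-4. The sketch ignores any difference in tokens per interaction between the two models.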


simpaticoder [01 Sep 25 19:27 UTC] No.45095811

>>45094941

PPT (price per token) alone is insufficient to compute cost. You also need the average tokens per interaction (TPI); the two multiply to give a cost estimate. A 0.01x PPT is wiped out by a 100x TPI.
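A minimal sketch of the formula above, cost per interaction = PPT × TPI. The $10-per-million-token price and 1,000-token interaction are hypothetical placeholders, chosen only to reproduce the comment's 0.01x-PPT-vs-100x-TPI example.

```python
def cost_per_interaction(price_per_token: float, tokens_per_interaction: float) -> float:
    """Cost estimate = price per token (PPT) * average tokens per interaction (TPI)."""
    return price_per_token * tokens_per_interaction


# Hypothetical baseline model: $10 per 1M tokens, 1,000 tokens per interaction.
baseline = cost_per_interaction(10 / 1_000_000, 1_000)

# Hypothetical challenger: 0.01x the price per token, but 100x the tokens.
challenger = cost_per_interaction(0.01 * 10 / 1_000_000, 100 * 1_000)

print(f"baseline:   ${baseline:.4f} per interaction")
print(f"challenger: ${challenger:.4f} per interaction")
```

Both lines print the same cost: the 100x cheaper price per token is exactly cancelled by the 100x larger token count.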


monsieurbanana [01 Sep 25 20:35 UTC] No.45096397

>>45095811

Are you saying that some models will take 100x more tokens than others (models in the same ballpark) for the same task? Is the 100 a real measured metric or just a random number to illustrate a point?


datadrivenangel [02 Sep 25 13:10 UTC] No.45102666

>>45096397

The GPT-5 models use ~10x more tokens, depending on the reasoning settings.


Adaptive LLM routing under budget constraints