(arxiv.org)

204 points tdchaitanya | 1 comment | 01 Sep 25 16:57 UTC

pbd ◴[01 Sep 25 17:49 UTC] No.45094941[source]▶

GPT-4 at $24.7 per million tokens vs Mixtral at $0.24 - that's a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance' - user satisfaction doesn't always correlate with technical metrics.
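
To make the arithmetic concrete, here is a rough sketch using the two prices quoted above. It assumes one particular reading of "gets it wrong 20% of the time": every request goes to Mixtral first, and the 20% that come back unusable are re-run on GPT-4, so those requests pay for both models. That policy and the `blended_cost` helper are illustrative assumptions, not something taken from the paper.

```python
# Prices quoted in the comment above (USD per million tokens).
GPT4_PRICE = 24.7
MIXTRAL_PRICE = 0.24


def blended_cost(retry_rate: float) -> float:
    """Cost per million tokens under a hypothetical policy: every request
    tries Mixtral first, and a fraction `retry_rate` is re-run on GPT-4."""
    return MIXTRAL_PRICE + retry_rate * GPT4_PRICE


price_ratio = GPT4_PRICE / MIXTRAL_PRICE
cost = blended_cost(0.20)
savings = GPT4_PRICE / cost

print(f"price ratio: {price_ratio:.0f}x")                # 103x
print(f"blended cost: ${cost:.2f} per million tokens")   # $5.18
print(f"savings vs. GPT-4 only: {savings:.1f}x")         # 4.8x
```

Under this reading the retries dominate the bill: the saving is closer to 5x than 100x, which is still substantial but depends heavily on how often the cheap model has to be backed up.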

replies(5): >>45095081 #>>45095225 #>>45095267 #>>45095811 #>>45095813 #

FINDarkside ◴[01 Sep 25 18:26 UTC] No.45095267[source]▶

>>45094941 #

It's trivial to get a better score than GPT-4 at 1% of the cost by using my proprietary routing algorithm that routes all requests to Gemini 2.5 Flash. It's called GASP (Gemini Always, Save Pennies).

replies(1): >>45095832 #

nutjob2 ◴[01 Sep 25 19:30 UTC] No.45095832[source]▶

>>45095267 #

Does anyone working in an individual capacity actually end up paying for Gemini (Flash or Pro)? Or does Google boil you like a frog and you end up subscribing?

replies(4): >>45095961 #>>45096427 #>>45099635 #>>45100282 #

ivape ◴[02 Sep 25 08:00 UTC] No.45100282[source]▶

>>45095832 #

You get 1500 prompts on AI Studio across a few Gemini Flash models. I think I saw 250 or 500 for 2.5. It’s basically free and beats the consumer rate limits of the big apps (Claude, ChatGPT, Gemini, Meta). I wonder when they’ll cut this off.

Adaptive LLM routing under budget constraints