(arxiv.org)

204 points tdchaitanya | 1 comments | 01 Sep 25 16:57 UTC

pbd ◴[01 Sep 25 17:49 UTC] No.45094941[source]▶

GPT-4 at $24.7 per million tokens vs. Mixtral at $0.24: that's roughly a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance', since user satisfaction doesn't always correlate with technical metrics.
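
A rough back-of-the-envelope sketch of that claim, using only the two prices quoted above. Everything else is an assumption made for illustration: the 80% share of traffic sent to the cheap model is hypothetical, and a "wrong" route is modeled as a cheap call that has to be retried on GPT-4, so both calls are paid for.

```python
# Prices quoted in the comment, in USD per million tokens.
GPT4_COST = 24.7
MIXTRAL_COST = 0.24


def blended_cost(cheap_fraction: float, misroute_rate: float) -> float:
    """Expected USD per million tokens under a simple routing model.

    Assumptions (not from the thread or the paper):
    - cheap_fraction of traffic is sent to Mixtral, the rest to GPT-4;
    - misroute_rate of the Mixtral traffic is answered badly and retried
      on GPT-4, so those requests pay for both models.
    """
    cheap = cheap_fraction * MIXTRAL_COST
    retries = cheap_fraction * misroute_rate * GPT4_COST
    strong = (1.0 - cheap_fraction) * GPT4_COST
    return cheap + retries + strong


price_ratio = GPT4_COST / MIXTRAL_COST

# Hypothetical split: 80% of traffic routed to Mixtral, 20% of it misrouted.
routed = blended_cost(cheap_fraction=0.8, misroute_rate=0.2)
savings = 1.0 - routed / GPT4_COST

print(f"price ratio: {price_ratio:.0f}x")
print(f"routed cost: ${routed:.2f} per million tokens")
print(f"savings vs. GPT-4 only: {savings:.0%}")
```

Under this model the savings depend far more on how much traffic can go to the cheap model than on the misroute rate, which is why the question of how 'performance' is measured matters so much.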

replies(5): >>45095081 #>>45095225 #>>45095267 #>>45095811 #>>45095813 #

FINDarkside ◴[01 Sep 25 18:26 UTC] No.45095267[source]▶

>>45094941 #

It's trivial to get a better score than GPT-4 at 1% of the cost by using my proprietary routing algorithm, which routes all requests to Gemini 2.5 Flash. It's called GASP (Gemini Always, Save Pennies).

replies(1): >>45095832 #

nutjob2 ◴[01 Sep 25 19:30 UTC] No.45095832[source]▶

>>45095267 #

Does anyone working in an individual capacity actually end up paying for Gemini (Flash or Pro)? Or does Google boil you like a frog and you end up subscribing?

replies(4): >>45095961 #>>45096427 #>>45099635 #>>45100282 #

1. baq ◴[02 Sep 25 06:08 UTC] No.45099635[source]▶

>>45095832 #

If I actually had time to work on my hobby projects, Gemini Pro would be the first thing I'd spend money on. As is, it's amazing how much progress you can squeeze out of those 5 chats every 24h; I can get a couple hours of before-times hacking done in 15 minutes, which is incidentally when free usage gets throttled and my free time runs out.

Adaptive LLM routing under budget constraints