205 points tdchaitanya | 19 comments
    1. pbd ◴[] No.45094941[source]
    GPT-4 at $24.7 per million tokens vs Mixtral at $0.24 - that's a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance' - user satisfaction doesn't always correlate with technical metrics.
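A quick back-of-the-envelope in Python (the assumption that a wrong route means paying for a GPT-4 retry on that request is mine, just to illustrate the point):

    # Blended price per million tokens when a router sends most traffic
    # to the cheap model and falls back to GPT-4 on the requests it gets wrong.
    gpt4_price = 24.70      # $/M tokens, from above
    mixtral_price = 0.24    # $/M tokens, from above
    error_rate = 0.20       # assumed fraction of requests that end up on GPT-4

    blended = (1 - error_rate) * mixtral_price + error_rate * gpt4_price
    print(blended)                 # ~5.13 $/M tokens
    print(gpt4_price / blended)    # still ~4.8x cheaper than GPT-4-only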
    replies(5): >>45095081 #>>45095225 #>>45095267 #>>45095811 #>>45095813 #
    2. Keyframe ◴[] No.45095081[source]
    number of complaints / million tokens?
    3. pqtyw ◴[] No.45095225[source]
    > GPT-4 at $24.7 per million tokens

    While technically true, why would you want to use it when OpenAI itself provides a bunch of models that are many times cheaper and better?

    replies(1): >>45095604 #
    4. FINDarkside ◴[] No.45095267[source]
    It's trivial to get a better score than GPT-4 at 1% of the cost by using my proprietary routing algorithm that routes all requests to Gemini 2.5 Flash. It's called GASP (Gemini Always, Save Pennies)
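    For what it's worth, the whole "algorithm" fits in three lines (a Python sketch, name and all hypothetical, obviously):

        def gasp_route(request):
            # GASP: Gemini Always, Save Pennies. Ignore the request entirely.
            return "gemini-2.5-flash"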
    replies(1): >>45095832 #
    5. KTibow ◴[] No.45095604[source]
    RouterBench is from March 2024.
    6. simpaticoder ◴[] No.45095811[source]
    PPT (price-per-token) is insufficient to compute cost. You will also need to know an average tokens-per-interaction (TPI). They multiply to give you a cost estimate. A 0.01x PPT advantage is wiped out by a 100x TPI.
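    A two-line illustration in Python (the 0.01x/100x figures are the hypothetical from above, not measurements):

        # cost per interaction = price-per-token (PPT) * tokens-per-interaction (TPI)
        cheap_model  = 0.01 * 100   # 0.01x the price, but 100x the tokens
        pricey_model = 1.00 * 1     # baseline
        assert cheap_model == pricey_model  # the PPT advantage is wiped out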
    replies(1): >>45096397 #
    7. mkoubaa ◴[] No.45095813[source]
    > How you measure 'performance'

    I heard the best way is through valuations

    8. nutjob2 ◴[] No.45095832[source]
    Does anyone working in an individual capacity actually end up paying for Gemini (Flash or Pro)? Or does Google boil you like a frog and you end up subscribing?
    replies(4): >>45095961 #>>45096427 #>>45099635 #>>45100282 #
    9. aspect8445 ◴[] No.45095961{3}[source]
    I've used Gemini in a lot of personal projects. At this point I've probably made tens of thousands of requests, sometimes exceeding 1k per week. So far, I haven't had to pay a dime!
    replies(1): >>45097258 #
    10. monsieurbanana ◴[] No.45096397[source]
    Are you saying that some models will take 100x more tokens than others (models in the same ballpark) for the same task? Is the 100x a real measured figure or just a number to illustrate the point?
    replies(2): >>45096631 #>>45102666 #
    11. dcre ◴[] No.45096427{3}[source]
    I've paid a few dollars a month for my API usage for about 6 months.
    12. simpaticoder ◴[] No.45096631{3}[source]
    With thinking models, yes, 100x is not just possible but probable. You get charged for the intermediate thinking tokens even if you don't see them (which is the case for Grok, for example). And even if you do see them, they won't necessarily add value.
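    A rough sketch of what that does to the bill (numbers invented for illustration, not measured):

        # You pay for output tokens whether they're the visible answer or the
        # hidden "thinking" tokens emitted before it.
        price_per_token = 1e-6                       # hypothetical flat output price
        plain   = 200 * price_per_token              # 200 answer tokens
        thinker = (50 + 20_000) * price_per_token    # 50 visible + 20k reasoning tokens
        print(thinker / plain)                       # ~100x the cost for the same answer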
    replies(1): >>45113503 #
    13. worm00111 ◴[] No.45097258{4}[source]
    How come you don't need to pay? Do you get it for free somehow?
    replies(1): >>45097410 #
    14. KETHERCORTEX ◴[] No.45097410{5}[source]
    There's a free tier for the API.
    replies(1): >>45098785 #
    15. drittich ◴[] No.45098785{6}[source]
    "When you use Unpaid Services, including, for example, Google AI Studio and the unpaid quota on Gemini API, Google uses the content you submit to the Services and any generated responses to provide, improve, and develop Google products and services and machine learning technologies, including Google's enterprise features, products, and services, consistent with our Privacy Policy.

    To help with quality and improve our products, human reviewers may read, annotate, and process your API input and output. Google takes steps to protect your privacy as part of this process. This includes disconnecting this data from your Google Account, API key, and Cloud project before reviewers see or annotate it. Do not submit sensitive, confidential, or personal information to the Unpaid Services."

    Reference: https://ai.google.dev/gemini-api/terms

    16. baq ◴[] No.45099635{3}[source]
    If I actually had time to work on my hobby projects, Gemini Pro would be the first thing I'd spend money on. As it is, it's amazing how much progress you can squeeze out of those 5 chats every 24h; I can get a couple hours of before-times hacking done in 15 minutes, which is incidentally when free usage gets throttled and my free time runs out.
    17. ivape ◴[] No.45100282{3}[source]
    You get 1500 prompts on AI Studio across a few Gemini Flash models. I think I saw 250 or 500 for 2.5. It's basically free and beats the consumer rate limits of the big apps (Claude, ChatGPT, Gemini, Meta). I wonder when they'll cut this off.
    18. datadrivenangel ◴[] No.45102666{3}[source]
    The GPT-5 models use ~10x more tokens depending on the reasoning settings.
    19. monsieurbanana ◴[] No.45113503{4}[source]
    > With thinking models, yes 100x is not just possible, but probable

    So the answer is no then, because I don't put reasoning and non-reasoning models in the same ballpark when it comes to token usage. You can just turn off reasoning.
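    For example, with Gemini 2.5 Flash you can zero out the thinking budget (a sketch using the google-genai Python SDK; field names are from memory, so check the current docs):

        from google import genai
        from google.genai import types

        client = genai.Client()  # picks up the API key from the environment
        resp = client.models.generate_content(
            model="gemini-2.5-flash",
            contents="Summarize this ticket in one sentence.",
            config=types.GenerateContentConfig(
                # a budget of 0 disables reasoning tokens entirely
                thinking_config=types.ThinkingConfig(thinking_budget=0),
            ),
        )
        print(resp.text)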