By Google's own reported benchmarks, the Gemini 2.5 Pro 05/06 release was worse than the 3/25 version in 10 of 12 cases. Google also rerouted all API traffic for the 3/25 checkpoint to the 05/06 version.
I'm also unsure who needs all these expanded quotas, since the old Gemini subscription already had higher quotas than I could ever expect to use.
"Google AI Ultra" is a consumer offering, though. There's no API to have quotas for, is there?
For example, what if someone started a company around a fork of LiteLLM? https://litellm.ai/
LiteLLM, out of the box, lets you create a number of virtual API keys. Each key can be assigned to a user or a team, and can be granted access to one or more models (and their associated keys). Models are configured globally, but can have an arbitrary number of "real" and "virtual" keys.
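To make the key model concrete, here is a minimal sketch of the idea in plain Python. It does not use LiteLLM's actual API; the registry, the key strings, and the `resolve_upstream_key` helper are all hypothetical. Each virtual key belongs to a user or team and lists the models it may use, while each globally configured model holds one or more real upstream keys.

```python
from dataclasses import dataclass, field

# Hypothetical global model registry: each model maps to one or more "real" upstream keys.
MODELS: dict[str, list[str]] = {
    "gpt-4o": ["sk-openai-REAL-1", "sk-openai-REAL-2"],
    "claude-sonnet": ["sk-ant-REAL-1"],
}


@dataclass
class VirtualKey:
    key: str
    owner: str  # a user or a team
    allowed_models: set[str] = field(default_factory=set)


# Hypothetical virtual keys handed out to customers.
VIRTUAL_KEYS = {
    "vk-team-a": VirtualKey("vk-team-a", "team-a", {"gpt-4o", "claude-sonnet"}),
    "vk-alice": VirtualKey("vk-alice", "alice", {"gpt-4o"}),
}


def resolve_upstream_key(virtual_key: str, model: str, request_no: int = 0) -> str:
    """Check that the virtual key may use `model`, then pick a real key round-robin."""
    vk = VIRTUAL_KEYS.get(virtual_key)
    if vk is None:
        raise PermissionError("unknown virtual key")
    if model not in vk.allowed_models:
        raise PermissionError(f"{vk.owner} may not use {model}")
    real_keys = MODELS[model]
    # Spread requests across all real keys configured for this model.
    return real_keys[request_no % len(real_keys)]
```

For instance, `resolve_upstream_key("vk-alice", "gpt-4o", 1)` returns the second OpenAI key, while asking for `"claude-sonnet"` with the same virtual key raises `PermissionError`. Keeping customers' keys separate from the real ones is what would let such a company rotate or pool upstream keys without anyone downstream noticing.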
Then you could sell access to a host of primary providers (OpenAI, Google, Anthropic, Groq, Grok, etc.) through a single API endpoint and key. Users could switch between them by changing one line in a config file or picking a model from a dropdown, depending on their interface.
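One way the "change one line in a config file" part might work is a `provider/model` naming convention behind the single endpoint. The sketch below is an assumption, not any product's real routing code: the provider table, the model names, and the `route` helper are hypothetical. It only splits the string and looks up which wire format the upstream speaks.

```python
# Hypothetical mapping from provider prefix to the wire format its upstream API speaks.
PROVIDER_API_STYLE = {
    "openai": "openai",
    "groq": "openai",  # assumption: treated as OpenAI-compatible
    "xai": "openai",  # assumption: treated as OpenAI-compatible
    "anthropic": "anthropic",
    "google": "gemini",
}


def route(model_string: str) -> tuple[str, str, str]:
    """Split a 'provider/model' string into (provider, model, api_style)."""
    provider, sep, model = model_string.partition("/")
    if not sep or not model:
        raise ValueError(f"expected 'provider/model', got {model_string!r}")
    if provider not in PROVIDER_API_STYLE:
        raise ValueError(f"unknown provider {provider!r}")
    return provider, model, PROVIDER_API_STYLE[provider]


# Switching providers is a one-line config change (model names are hypothetical).
config = {"model": "anthropic/claude-sonnet"}
provider, model, api_style = route(config["model"])
```

Because `partition` splits only at the first slash, model IDs that themselves contain slashes stay intact after the provider prefix.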
Assuming you're able to build a reasonable userbase, you could presumably then contract directly with providers for wholesale API usage. Pricing would be tricky, since part of your value prop would be abstracting away marginal costs, but I strongly suspect that very few people actually consume the full API quotas on these $200+ plans. Those who do are likely already working directly with the providers to reduce both cost and latency.
The other value you could offer is consistency. Your engineering team's core mission would be providing a consistent wrapper for all of these models - translating between OpenAI-compatible, Llama-style, and Claude-style APIs on the fly.
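As an illustration of the kind of translation layer that team would maintain, here is a minimal sketch that converts an OpenAI-style chat request into a Claude-style Messages request. It assumes text-only string content and only `system`/`user`/`assistant` roles (no tools or images); the function name and the default `max_tokens` value are hypothetical choices. The main differences it handles are that the Claude-style format takes the system prompt as a top-level field and requires `max_tokens`.

```python
def openai_to_claude(request: dict, default_max_tokens: int = 1024) -> dict:
    """Translate an OpenAI-style chat request into a Claude-style Messages request.

    Assumes string content and only system/user/assistant roles.
    """
    system_parts = []
    messages = []
    for msg in request["messages"]:
        role, content = msg["role"], msg["content"]
        if role == "system":
            # Claude-style APIs take the system prompt as a top-level field.
            system_parts.append(content)
        elif messages and messages[-1]["role"] == role:
            # Merge consecutive same-role turns so user/assistant roles alternate.
            messages[-1]["content"] += "\n\n" + content
        else:
            # Copy into a new dict so the caller's request is not mutated.
            messages.append({"role": role, "content": content})
    out = {
        "model": request["model"],
        # max_tokens is optional in the OpenAI format but required in Claude's.
        "max_tokens": request.get("max_tokens", default_max_tokens),
        "messages": messages,
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    return out
```

A production wrapper would also have to translate tool calls, images, streaming events, and error shapes in both directions, which is where most of the ongoing engineering effort would likely go.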
Is there already a company doing this? If not, do you think this is a good or bad idea?
I'll investigate. Thanks!
The only reason I maintain Claude and OpenAI subscriptions is that I expect Google to pull the rug on what has been their competitive advantage since Gemini 2.5.
Have you also noticed quality degrading over long chat sessions? I've noticed it in NotebookLM specifically, but not in Gemini 2.5. I expect this to become the norm: your chat subtly degrades over time.
Have you tried, say, o1 pro mode? And if you have, do you find it as good as whatever free models you use?
If you haven't, it's kind of weird to do the comparison without actually having tried it.
If you don't really have a problem to solve and you're just chatting, then "good" is just, like, your vibe, man.