Do you foresee these limitations increasing anytime soon?
Quick Edit: Just wanted to also say thank you for all your hard work, Claude has been phenomenal.
I'm sure you're already working on it, but some kind of automatic summarization of facts, to shrink the context and avoid penalizing long threads, would be sweet.
I don't know if your internal users are dogfooding the version of the product that has user limits, so you may not have had this feedback: it makes me irritable and stressed to know that I'm running up close to the limit without having gotten to the bottom of a bug. I don't think a stress response in your users is a desirable thing :).
We currently serve ~10bn tokens per day (across all models). OpenAI compatible API. No rate limits. Built in logging and tracing.
I work with LLMs every day, so I am always on top of adding models. 3.7 is also already available.
https://glama.ai/models/claude-3-7-sonnet-20250219
The gateway is integrated directly into our chat (https://glama.ai/chat). So you can use most of the things that you are used to having with Claude. And if anything is missing, just let me know and I will prioritize it. If you check our Discord, I have a decent track record of being receptive to feedback and quickly turning around features.
Long term, Glama's focus is predominantly on MCPs, but chat, gateway and LLM routing is integral to the greater vision.
If you give it a try, I would love your feedback: frank@glama.ai
You can track costs in a few ways and set spend limits to avoid surprises: https://docs.anthropic.com/en/docs/agents-and-tools/claude-c...
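As a rough illustration of what a client-side spend limit does, here is a minimal Python sketch. `SpendTracker`, `SpendLimitExceeded` and `charge` are hypothetical names, not part of any Anthropic SDK or of Claude Code, and the per-million-token prices are passed in by the caller rather than assumed; check current pricing before relying on any numbers.

```python
from dataclasses import dataclass


class SpendLimitExceeded(Exception):
    """Raised when a request would push spending past the configured cap."""


@dataclass
class SpendTracker:
    """Hypothetical client-side spend cap (not an official API)."""

    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               input_price_per_mtok: float, output_price_per_mtok: float) -> float:
        """Record one request's cost; refuse it if it would exceed the cap."""
        cost = (input_tokens * input_price_per_mtok
                + output_tokens * output_price_per_mtok) / 1_000_000
        if self.spent_usd + cost > self.limit_usd:
            raise SpendLimitExceeded(
                f"request costs ${cost:.2f}, only ${self.limit_usd - self.spent_usd:.2f} left")
        self.spent_usd += cost
        return cost


# Illustrative prices in USD per million tokens (input, output).
tracker = SpendTracker(limit_usd=1.00)
tracker.charge(100_000, 10_000, 3.0, 15.0)   # costs about $0.45
tracker.charge(100_000, 10_000, 3.0, 15.0)   # running total about $0.90
try:
    tracker.charge(100_000, 10_000, 3.0, 15.0)  # would reach about $1.35, so it is refused
except SpendLimitExceeded as exc:
    print(exc)
```

In practice the output token count is only known after a response arrives, so a real cap would either check an estimate before sending or stop issuing requests once the running total crosses the limit.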
The value proposition of Glama is that it combines UI and API.
While everyone focuses on either one or the other, I've been splitting my time equally working on both.
Glama UI would not win against Anthropic if we were to compare them by the number of features. However, the components that I developed were created with craft and love.
You have access to:
* Switch models between OpenAI/Anthropic, etc.
* Side-by-side conversations
* Full-text search of all your conversations
* Integration of LaTeX, Mermaid, rich-text editing
* Vision (uploading images)
* Response personalizations
* MCP
* Every action has a shortcut via cmd+k (ctrl+k)
But I still hit limits. I use Claudemind with JetBrains tools, and there is a maximum on input tokens (I believe). I am ‘tier 2’, but it doesn’t look like I can go past this without an enterprise agreement.
It is provided by DeepSeek and Avian.
I am also midway through enabling a third provider (Nebius).
You can see all models/providers over at https://glama.ai/models
As another commenter in this thread said, we are just a 'frontend wrapper' around other people's services. Therefore, it is not particularly difficult to add models that are already supported by other providers.
The benefit of using our wrapper is that you use a single API key and get one bill for all your AI usage. You don't need to hack together your own logic for routing requests between providers, handling failovers and keeping track of costs, or worry about what happens if a provider goes down.
The market at the moment is hugely fragmented, with many unstable providers, constantly shifting prices, etc. The benefit of a router is that you don't need to worry about those things.
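To make the routing and failover point concrete, here is a minimal Python sketch of the kind of logic a router saves you from writing yourself: try providers in order of preference and fall back to the next one when a call fails. The provider names, `ProviderError` and the stand-in provider functions are hypothetical; this is not Glama's implementation, and a real router would also need timeouts, retries with backoff and cost tracking.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Raised by a provider call that failed (outage, rate limit, 5xx, ...)."""


Provider = Tuple[str, Callable[[str], str]]


def route_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Send `prompt` to each provider in order; return (name, reply) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why it failed, then try the next one
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration: the first is "down", the second answers.
def provider_a(prompt: str) -> str:
    raise ProviderError("503 overloaded")


def provider_b(prompt: str) -> str:
    return f"echo: {prompt}"


name, reply = route_with_failover("hello", [("provider-a", provider_a), ("provider-b", provider_b)])
print(name, reply)  # provider-b echo: hello
```

Ordering the provider list by price or observed reliability is where a router adds value beyond a simple loop like this one.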
A lot of people just want the ability to pay more in order to get more.
I would gladly pay 10x more to get relatively modest increases in performance. That is how important the intelligence is.
i.e. I'd like my chat and API usage to both be included under a flat-rate subscription.
Currently, Pro doesn't give me any API credits to use with coding assistants (Claude Code included?), which is completely disjointed. And do I still need to be a business to use the API?
Honestly, Claude is so good. Just please take my money and make it easy to do the above!
They have a very solid infrastructure.
Scaling infrastructure to handle billions of tokens is no joke.
I believe they are approaching 1 trillion tokens per week.
Glama is way smaller. We only recently crossed 10bn tokens per day.
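For a sense of the gap between the two figures quoted here (roughly 1 trillion tokens per week for the larger router this comment compares against, versus 10bn per day for Glama), a quick unit conversion in Python. It treats the 1 trillion figure as exact, although the comment only says "approaching".

```python
# Figures as quoted in the thread; "approaching 1 trillion" is treated as exactly 1e12 here.
larger_router_tokens_per_week = 1_000_000_000_000
glama_tokens_per_day = 10_000_000_000

larger_router_tokens_per_day = larger_router_tokens_per_week / 7  # weekly volume -> daily
ratio = larger_router_tokens_per_day / glama_tokens_per_day       # how many times larger

print(f"~{larger_router_tokens_per_day / 1e9:.0f}bn tokens/day, ~{ratio:.0f}x Glama's volume")
# ~143bn tokens/day, ~14x Glama's volume
```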
However, I have invested a lot more into the UX/UI of the chat itself: while OpenRouter is entirely focused on being an API gateway (which is working for them), I am going for a hybrid approach.
The market is big enough for both projects to co-exist.
1. Subscribe to Claude Pro for $20/month.
2. Separately, buy $100 worth of API credits.
Now you have a Claude "ultimate" subscription where the credits roll over as an added bonus.
As someone who only uses the APIs, and not the subscription services for AI, I can tell you that $100 is A LOT of usage. Quite frankly, I've never used anywhere close to $20 in a month, which is why I don't subscribe. I mostly just use text, though, so if you do a lot of image generation, that can add up quickly.
As long as capacity is an issue, you can't have both.
Try to delete (close) the panel on the right in a side-by-side view. It took a good second to actually close, and creating one isn't much faster.
This is unbearably slow, to be blunt.
I would pay $50/mo or so to have reasonable use of Claude Code, limited but less limited than the web UI. But all of these coding tools seem to work only with the API and are therefore either too expensive or too limited.
I've used https://github.com/cline/cline to get a similar workflow to their Claude Code demo, and yes it's amazing how quickly the token counts add up. Claude seems to have capacity issues so I'm guessing they decided to charge a premium for what they can serve up.
+1 on the too expensive or too limited sentiment. I subscribed to Claude for quite a while but got frustrated that the few times I used it heavily, I'd get stuck due to the rate limits.
I could stomach a $20-$50 subscription for something like 3.7 that I could use a lot when coding without worrying about hitting limits (or, I suspect, being pushed onto a quantized/smaller model when I use it too much).
It became such an anti-pattern that I stopped paying. Now, when people ask me which one to use, I always say I like Claude more than others, but I don’t recommend using it in a professional setting.
Something like $5+ of it was cache reads ($0.05/token vs $3/token), so without caching it would have cost $300+.
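The arithmetic behind that estimate, as a small Python function: the amount billed for cache reads, divided by the cache-read price, gives the token volume, which is then re-priced at the regular input rate. Both prices are taken as quoted in the comment; only their ratio (60x) matters, so the result holds whatever token unit they are actually quoted in. The function name is illustrative.

```python
def uncached_equivalent(cache_read_cost: float, cache_read_price: float,
                        input_price: float) -> float:
    """Re-price tokens that were billed as cache reads at the regular input price."""
    tokens = cache_read_cost / cache_read_price  # token volume, in whatever unit both prices use
    return tokens * input_price


cost = uncached_equivalent(cache_read_cost=5.00, cache_read_price=0.05, input_price=3.00)
print(f"${cost:.2f}")  # $300.00
```

Since the $5 is a lower bound ("$5+"), the $300 is a lower bound too.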
The entire LOTR trilogy is ~0.55 million tokens (about 1,200 published pages).
If you are sending and receiving the text equivalent of several hundred copies of the LOTR trilogy every week, I don't think you are actually using AI for anything useful, or you are providing far too much context.
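To put that comparison in numbers, a short Python sketch using the comment's ~0.55 million tokens per trilogy; 300 copies is an arbitrary stand-in for "several hundred", not a figure from the thread.

```python
LOTR_TOKENS = 550_000  # ~0.55 million tokens for the whole trilogy, as quoted


def trilogy_equivalents(tokens: int) -> float:
    """How many LOTR trilogies' worth of text a token count represents."""
    return tokens / LOTR_TOKENS


copies_per_week = 300                          # hypothetical stand-in for "several hundred"
weekly_tokens = copies_per_week * LOTR_TOKENS  # 165,000,000 tokens per week
print(f"{weekly_tokens:,} tokens/week, ~{weekly_tokens / 7:,.0f} tokens/day")
```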