Do you foresee these limitations increasing anytime soon?
Quick Edit: Just wanted to also say thank you for all your hard work, Claude has been phenomenal.
Can you tell us more about the trade-offs here?
Also, are you using synthetic data for improving the responses here, or are you purely leveraging data from usage/partner's usage?
So, perfect timing on this release for me! I decided to install Claude Code and it is making short work of this. I love the interface. I love the personality ("Ruminating", "Schlepping", etc).
Just an all around fantastic job!
(This makes me especially bummed that I really messed up my OA awhile back for you guys. I'll try again in a few months!)
Keep on doing great work. Thank you!
With Claude Code, the goal is clearly to take a slice of Cursor and its competitors' market share. I expected this to happen eventually.
The app layer has barely any moat, so any successful app with the potential to generate significant revenue will eventually be absorbed by foundation model companies in their quest for growth and profits.
TLDR: asking Claude to speed up my code once 1.8x'd perf, but putting it in a loop telling it to make it faster for 2 hours led to a 500x speedup!
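For anyone curious what that looks like in practice, here's a minimal sketch of the optimize-in-a-loop idea, assuming the official anthropic Python SDK; hot_loop.py and its benchmark() entry point are hypothetical stand-ins for whatever harness you use:

import time
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_benchmark(source):
    # Exec the candidate program and time its benchmark() entry point.
    # Only do this with code you trust (or inside a sandbox).
    scope = {}
    exec(source, scope)
    start = time.perf_counter()
    scope["benchmark"]()
    return time.perf_counter() - start

code = open("hot_loop.py").read()  # hypothetical file under optimization
best = run_benchmark(code)

for _ in range(20):  # or loop on a wall-clock budget, e.g. 2 hours
    reply = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=4096,
        messages=[{"role": "user", "content":
                   "Make this faster. Return only the complete program:\n\n" + code}],
    )
    candidate = reply.content[0].text
    elapsed = run_benchmark(candidate)
    if elapsed < best:  # keep a candidate only if it actually measures faster
        best, code = elapsed, candidate

The key detail is keeping a candidate only when it measures faster, so a hallucinated "optimization" can never make the kept version slower.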
Could you speak at all about potential IDE integrations? An integration into Jetbrains IDEs would be super useful - I imagine being able to highlight a bit of code and having a plugin check the code graph to see dependencies, tests etc that might be affected by a change.
Copying and pasting code constantly is starting to seem a bit primitive.
Also, is there a way to switch models between 3.5-sonnet and 3.5-sonnet-thinking? Got the initial impression that the thinking model is using an excessive amount of tokens on first use.
Will check out Claude Code soon, but in the meantime one unrelated other feature request: Moving existing chats into a project. I have a number of old-ish but super-useful and valuable chats (that are superficially unrelated) that I would like to bring together in a project.
And I'm also sure that you're working on it, but some kind of auto-summarization of facts to reduce the context in order to avoid penalizing long threads would be sweet.
I don't know if your internal users are dogfooding the product that has user limits, so you may not have had this feedback - it makes me irritable/stressed to know that I'm running up close to the limit without having gotten to the bottom of a bug. I don't think stress response in your users is a desirable thing :).
We currently serve ~10bn tokens per day (across all models). OpenAI-compatible API. No rate limits. Built-in logging and tracing.
I work with LLMs every day, so I am always on top of adding models. 3.7 is also already available.
https://glama.ai/models/claude-3-7-sonnet-20250219
The gateway is integrated directly into our chat (https://glama.ai/chat). So you can use most of the things that you are used to having with Claude. And if anything is missing, just let me know and I will prioritize it. If you check our Discord, I have a decent track record of being receptive to feedback and quickly turning around features.
Long term, Glama's focus is predominantly on MCPs, but chat, gateway and LLM routing is integral to the greater vision.
I would love feedback if you are going to give it a try: frank@glama.ai
$ curl https://api.anthropic.com/v1/models --header "x-api-key: $ANTHROPIC_API_KEY" --header "anthropic-version: 2023-06-01"
{"type":"error","error":{"type":"not_found_error","message":"Not found"}}
Edit: Tried creating a different API key and it works with that one. Weird.
Thinking and not thinking is actually the same model! The model thinks automatically when you ask it to. If you don't explicitly ask it to think, it won't use thinking.
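On the API side this shows up as a per-request switch rather than a separate model; a minimal sketch, assuming the anthropic Python SDK's extended-thinking parameter:

import anthropic

client = anthropic.Anthropic()

# Same model either way; the thinking block enables extended thinking for
# this one request, and budget_tokens caps how much it can think.
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=20000,  # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": "Prove there are infinitely many primes."}],
)

Leave out the thinking block and the same request runs without it, which also explains the token usage you noticed: thinking tokens are billed as output tokens.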
Claude is for Code: https://medium.com/thoughts-on-machine-learning/claude-is-fo...
(2) It's not clear to me that users (or developers) actually behave this way in practice. Engineering is a bit of a cargo cult. Cursor got popular because it was good but it also got popular because it got popular.
Not OP obviously, but I've built so many applications with Claude, here are just a few:
[1] Mockup of Utopian infrastructure support button (this is just a mockup, the buttons don't do anything): https://claude.site/artifacts/435290a1-20c4-4b9b-8731-67f5d8...
[2] Robot body simulation: https://claude.site/artifacts/6ffd3a73-43d6-4bdb-9e08-02901d...
[3] 15-piece slider puzzle: https://claude.site/artifacts/4504269b-69e3-4b76-823f-d55b3e...
[4] Canada joining the U.S., checklist: https://claude.site/artifacts/6e249e38-f891-4aad-bb47-2d0c81...
[5] Secure encryption and decryption with AES-256-GCM and password-based key derivation: https://claude.site/artifacts/cb0ac898-e5ad-42cf-a961-3c4bf8... (a guess at the decryption code appears after this list)
(Try to decrypt this message
kFIxcBVRi2bZVGcIiQ7nnS0qZ+Y+1tlZkEtAD88MuNsfCUZcr6ujaz/mtbEDsLOquP4MZiKcGeTpBbXnwvSLLbA/a2uq4QgM7oJfnNakMmGAAtJ1UX8qzA5qMh7b5gze32S5c8OpsJ8=
with the password "Hello Hacker News!!" (without quotation marks).)
[6] Supply-demand visualizer under tariffs and subsidies: https://claude.site/artifacts/455fe568-27e5-4239-afa4-051652...
[7] Fortune cookie program: https://claude.site/artifacts/d7cfa4ae-6946-47af-b538-e6f992...
[8] Household security training for classified household members (includes self-assessment and certificate): https://claude.site/artifacts/7754dae3-a095-4f02-b4d3-26f1a5...
[9] Public service accountability training program: https://claude.site/artifacts/b89a69fb-1e46-4b5c-9e96-2c29dd...
[10] Nuclear non-proliferation "big brother" agent technical demonstration: https://claude.site/artifacts/555d57ba-6b0e-41a1-ad26-7c90ca...
Dating stuff:
[11] Dating help: Interest Level Assessment Game (is she interested?): https://claude.site/artifacts/523c935c-274e-4efa-8480-1e09e9...
[12] Dating checklist: https://claude.site/artifacts/10bf8bea-36d5-407d-908a-c1e156...
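As promised for [5], here's a guess at what the decryption side could look like, assuming the common salt | nonce | ciphertext layout with PBKDF2-HMAC-SHA256; the artifact's real parameters (iterations, lengths, ordering) may well differ:

import base64
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def decrypt(blob_b64, password):
    blob = base64.b64decode(blob_b64)
    # Assumed layout: 16-byte salt, 12-byte GCM nonce, then ciphertext+tag.
    salt, nonce, ct = blob[:16], blob[16:28], blob[28:]
    key = PBKDF2HMAC(algorithm=SHA256(), length=32, salt=salt,
                     iterations=100_000).derive(password.encode())
    return AESGCM(key).decrypt(nonce, ct, None).decode()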
At work I use it many times daily in development. Its concise mode is a breath of fresh air compared to any other LLM I've tried. It has helped me find bugs in foreign code bases, explained the tech stack to me, and written bash scripts, saving me dozens of hours of work and many nerves. It generally gets me to places I wouldn't reach otherwise due to time constraints and nerves.
The only nitpick is that the service reliability is a bit worse than others', sometimes forcing me to switch. This is probably a hard question to answer, but are there plans to improve that?
You can track costs in a few ways and set spend limits to avoid surprises: https://docs.anthropic.com/en/docs/agents-and-tools/claude-c...
The value proposition of Glama is that it combines UI and API.
While everyone focuses on either one or the other, I've been splitting my time equally working on both.
Glama UI would not win against Anthropic if we were to compare them by the number of features. However, the components that I developed were created with craft and love.
You have access to:
* Switching models between OpenAI, Anthropic, etc.
* Side-by-side conversations
* Full-text search of all your conversations
* Integration of LaTeX, Mermaid, rich-text editing
* Vision (uploading images)
* Response personalizations
* MCP
* Every action has a shortcut via cmd+k (ctrl+k)
Are you finding that extended thinking helps a lot when the whole problem can be posed in the prompt, but that it isn't a major benefit for agentic tasks?
It would be a bit surprising, but it would also mirror my experiences, and the benchmarks which show Claude 3.5 being better at agentic tasks and SWE tasks than all other models, despite not being a reasoning model.
Haven't tried to build a modern JS web app in years — it took the claude tool just a few minutes of prompting to convert and refactor an old clunky tool into a proper project structure, and using svelte and vite and tailwind (which I haven't built with before). Trying to learn how to even scaffold a modern app has felt daunting and this eliminates 99% of that friction.
One funny quirk: I asked it to build a test suite (I know zilch about JS testing frameworks, so it picked vitest for me) for the newly refactored app. I noticed that 3 of the 20 tests failed and so I asked it to run vitest for itself and fix the failing things. 2 minutes later, and now 7 tests were failing...
Which is very funny to me, but also not a big deal. Again, it's such a chore to research test libs and then set things up to their conventions. That the claude tool built a very usable scaffold that I can then edit and iterate on is such a huge benefit by itself, I don't need (nor desire) the AI to be complete turnkey solution.
For anyone interested - you can extend Claude's functionality by allowing it to run commands via a local "MCP server" (e.g. make code commits, create files, retrieve third party library code etc).
Then when you're running Claude it asks for permission to run a specific tool inside your usual Claude UI.
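As a rough sketch of how small one of these can be, assuming the official mcp Python SDK (this is a guess at a minimal setup, and the tool itself is illustrative):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("local-tools")

@mcp.tool()
def read_file(path: str) -> str:
    """Return the contents of a local file so Claude can inspect it."""
    with open(path) as f:
        return f.read()

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio; register it in your client's config

Each tool you expose becomes one of those permission prompts in the UI.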
There are several AIDEs out there, and based on working with Cursor, VS Code, and Windsurf there doesn't seem to be much of a difference (although I like Windsurf best). What moat does Cursor have?
I recently attempted to use the Google Drive integration but didn't follow through with connecting because Claude wanted access to my entire Google Drive. I understand this simplifies the user experience and reduces time to ship, but is there any way the team can add "reduce the access scope of the Google Drive integration" to your backlog? Thank you!
Also, I just caught the new Github integration. Awesome.
Here are steps to reproduce.
Background/environment:
ChatGPT helped me build this complete web browser in Python:
https://taonexus.com/publicfiles/feb2025/71toy-browser-with-...
It looks like this, versus the eventual goal: https://imgur.com/a/j8ZHrt1
in 1055 lines. But eventually ChatGPT couldn't improve on it anymore: it couldn't modify the code at my request so that inline elements would be on the same line.
If you want to run it just download it and rename it to .py, I like Anaconda as an environment, after reading the code you can install the required libraries with:
conda install -c conda-forge requests pillow urllib3
then run the browser from the Anaconda prompt by just writing "python " followed by the name of the file.
2.
I tried to continue to improve the program with Claude, so that in-line elements would be on the same line.
I performed these reproducible steps:
1. copied the code and pasted it into a Claude chat window with ctrl-v. This keeps it in the chat as paste.
2. Gave it the prompt "This complete web browser works but doesn't lay out inline elements inline, it puts them all on a new line, can you fix it so inline elements are inline?"
It spat out code until it hit section 8 of 9, which is 70% of the way through, and gave the error message "Claude hit the max length for a message and has paused its response. You can write Continue to keep the chat going". Screenshot is the first one in the above album.
So I wrote "Continue" and it stopped when it was 90% of the way done.
Again it got stuck at 90% of the way done, second screenshot in the above album.
So I wrote "Continue" again.
It just gave an answer, but it never finished the program. There's no entry point in the program; it completely omitted the rest of the main class itself and the callback to call it, which would be something like:
    def run(self):
        self.root.mainloop()

###############################################################################
# main
###############################################################################

if __name__ == "__main__":
    sys.setrecursionlimit(10**6)
    app = ToyBrowser()
    app.run()
so it only output a half-finished program, while explaining that it was finished. I tried telling it "you didn't finish the program, output the rest of it", but doing so just got it stuck rewriting the program without finishing it. Again it said it ran into the limit, again I said Continue, and again it didn't finish it.
The program itself is only 1055 lines, it should be able to output that much.
edit: note that my team mostly hits rate limits using things like aider and goose. 80k input tokens is not enough when in a flow, and I would love to experiment with a multi-agent workflow using Claude
I'd also love to have it in a language that can be compiled, like golang or rust, but I recognize a rewrite might be more effort than it's worth. (Although maybe less with claude code to help you?)
EDIT: OK, 10 minutes in, and it seems to have major issues doing basic patches to my Golang code; the most recent thing it did was add a line with incorrect indentation, then try three times to update it with the correct indentation, getting "String to replace not found in file" each time. Aider with Claude 3.5 does this really well -- not sure what the confounding issue is here, but it might be worth taking a look at their prompt & patch format to see how they do it.
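For context, the search/replace edit format behind that error is essentially exact string matching; this is a guess at the mechanism rather than Claude Code's actual implementation, but it shows why a single mis-indented line makes every retry fail the same way:

from pathlib import Path

def apply_edit(path, before, after):
    # The model emits an exact "before" block; the tool only applies the
    # edit on an exact match, indentation included, so whitespace drift
    # yields "String to replace not found in file" on every retry.
    src = Path(path).read_text()
    if before not in src:
        return False
    Path(path).write_text(src.replace(before, after, 1))
    return True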
It's one thing to retrofit LLMs into existing tools but I'm more curious how this new space will develop as time goes on. Already stuff like the Warp terminal is pretty useful in day to day use.
Who knows, maybe this time next year we'll see more people programming by voice input instead of typing. Something akin to Talon Voice supercharged by a local LLM hopefully.
But I still hit limits. I use ClaudeMind with JetBrains stuff and there is a max of input tokens (I believe); I am 'tier 2' but it doesn't look like I can go past this without an enterprise agreement
[0] https://github.com/microsoft/semantic-kernel/issues/5690#iss...
At least with Cursor, I can use all "premium" 500 completions and either buy more, or be patient for throttled responses.
Lists, numbers, tabs, etc. are all a little time consuming... minor annoyance but thought I'd share.
It is provided by DeepSeek and Avian.
I am also midway through enabling a third provider (Nebius).
You can see all models/providers over at https://glama.ai/models
As another commenter in this thread said, we are just a 'frontend wrapper' around other people's services. Therefore, it is not particularly difficult to add models that are already supported by other providers.
The benefit of using our wrapper is that you can use a single API key and get one bill for all your AI usage; you don't need to hack together your own logic for routing requests between different providers, handling failovers, keeping track of costs, worrying about what happens if a provider goes down, etc.
The market at the moment is hugely fragmented, with many providers unstable, constantly shifting prices, etc. The benefit of a router is that you don't need to worry about those things.
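Since the gateway is OpenAI-compatible, the stock openai SDK should work against it; the base URL below is illustrative, so check the docs for the real one:

from openai import OpenAI

client = OpenAI(base_url="https://glama.ai/api/gateway/openai/v1",  # illustrative
                api_key="YOUR_GLAMA_API_KEY")

resp = client.chat.completions.create(
    model="claude-3-7-sonnet-20250219",  # routed to Anthropic behind one key
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)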
A lot of people just want the ability to pay more in order to get more.
I would gladly pay 10x more to get relatively modest increases in performance. That is how important the intelligence is.
If this Code preview is only open to subscribers, it means I have to subscribe before I can even see if the binary works for me. Hmm
edit: Oh, there's a link to "joining the preview" which points to: https://docs.anthropic.com/en/docs/agents-and-tools/claude-c...
I'd really like to use Claude Code in some of my projects vs just sharing snippets via the UI, but I'm curious how doing this from our source directory might affect our IP, including NDAs, trade secret protections, prior disclosure rules on (future) patents, open source licensing restrictions re: redistribution, etc.?
Also hi Erik! - Rob
E.g. Claude will refuse to write code to wget a website and parse the HTML if you ask it to scrape your ex-girlfriend's Instagram profile, for ethical and ToS reasons, but if you phrase the request differently, it'll happily go off and generate code that does that exact thing.
Asking it to scrape my ex-girlfriend's Instagram profile is just a stand-in for other times I've hit a problem where I've had to social engineer my way past those guardrails, but does having those guardrails really provide value on a professional level?
Also, curious if you have any intuition as to why the no-parallelism number for AIME with Claude (61.3%) is quite low (e.g., relative to R1 87.3% -- assuming it is an apples to apples comparison)?
i.e I'd like my chat and API usage to be all included under a flat-rate subscription.
Currently Pro doesn't give me any API credits to use with coding assistants (Claude Code included?), which is completely disjointed. And I still need to be a business to use the API?
Honestly, Claude is so good. Just please take my money and make it easy to do the above!
They have a very solid infrastructure.
Scaling infrastructure to handle billions of tokens is no joke.
I believe they are approaching 1 trillion tokens per week.
Glama is way smaller. We only recently crossed 10bn tokens per day.
However, I have invested a lot more into the UX/UI of the chat itself; i.e., while OpenRouter is entirely focused on the API gateway (which is working for them), I am going for a hybrid approach.
The market is big enough for both projects to co-exist.
You still need to know what good code looks like to use these tools. If you go forward in your career trusting the output of LLMs without the skills to evaluate the correctness, style, functionality of that code then you will have problems.
People still write low level machine code today, despite compilers having existed for 70+ (?) years.
We'll always need full-stack humans who understand everything down to the electrons even in the age of insane automation that we're entering.
I would like this to happen easily like hitting a menu or button without having to write an elaborate "prompt" every time.
Is this possible?
1. RAG: A simple model looks at the question, pulls up some associated data into the context and hopes that it helps.
2. Self-RAG: The model "intentionally"/agentically triggers a lookup for some topic. This can be via a traditional RAG or just string search, i.e. grep.
3. Full Context: Just jam everything in the context window. The model uses its attention mechanism to pick out the parts it needs. Best but most expensive of the three, especially with repeated queries.
Aider uses kind of a hybrid of 2 and 3: you specify files that go in the context, but Aider also uses Tree-sitter to get a map of the entire codebase, i.e. function headers, class definitions etc., that is provided in full. On that basis, the model can then request additional files to be added to the context.
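To make option 2 concrete, here's a rough sketch of an agentic lookup loop, assuming the anthropic SDK's tool-use API; the grep tool definition is illustrative:

import subprocess
import anthropic

client = anthropic.Anthropic()
tools = [{
    "name": "grep",
    "description": "Search the codebase for a regex; returns matching lines.",
    "input_schema": {
        "type": "object",
        "properties": {"pattern": {"type": "string"}},
        "required": ["pattern"],
    },
}]

messages = [{"role": "user", "content": "Where is the retry logic implemented?"}]
while True:
    reply = client.messages.create(model="claude-3-7-sonnet-20250219",
                                   max_tokens=2048, tools=tools, messages=messages)
    if reply.stop_reason != "tool_use":
        break  # the model answered in plain text; done
    call = next(b for b in reply.content if b.type == "tool_use")
    out = subprocess.run(["grep", "-rn", call.input["pattern"], "."],
                         capture_output=True, text=True).stdout[:4000]
    messages += [{"role": "assistant", "content": reply.content},
                 {"role": "user", "content": [{"type": "tool_result",
                   "tool_use_id": call.id, "content": out or "no matches"}]}]

The grep results are ground truth, so the failure mode shifts from hallucinated context to the model occasionally searching for the wrong thing.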
1. Subscribe to Claude Pro for $20 month
2. Separately, buy $100 worth of API credits.
Now you have a Claude "ultimate" subscription where the credits roll over as an added bonus.
As someone who only uses the APIs, and not the subscription services for AI, I can tell you that $100 is A LOT of usage. Quite frankly, I've never used anywhere close to $20 in a month which is why I don't subscribe. I mostly just use text though, so if you do a lot of image generation that can add up quickly
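Some back-of-envelope numbers, assuming Claude 3.7 Sonnet's list prices at the time ($3/MTok input, $15/MTok output):

input_cost, output_cost = 3 / 1e6, 15 / 1e6  # dollars per token
# An "average" chat-sized request of ~2k tokens in, ~1k out:
per_request = 2000 * input_cost + 1000 * output_cost  # = $0.021
print(100 / per_request)  # ~4,760 such requests for $100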
But I can’t imagine this tool in the hands of someone who does not have a solid understanding of programming.
You need to understand when to push back and why. It’s like doing mini code reviews all the time. LLMs are very convincing and will happily generate garbage with the utmost authority.
Don’t trust and absolutely verify.
I would argue that this is still RAG. There's a common misconception (or at least I think it's a misconception) that RAG only counts if you used vector search - I like to expand the definition of RAG to include non-vector search (like Ripgrep in this case), or any other technique where you use Retrieval techniques to Augment the Generation phase.
IR (Information Retrieval) had been around for many decades before vector search became fashionable: https://en.wikipedia.org/wiki/Information_retrieval
As long as capacity is an issue, you can't have both
People who know both coding and LLMs will be a whole lot more attractive to hire to build software than people who just know LLMs for many years to come.
It's the only mainstream AI service that requests this information. After a string of security lapses by many of your competitors, I have zero faith in the ability of a "fast moving" AI-focused company to keep my PII data secure.
>Claude Code consumes tokens for each interaction. Typical usage costs range from $5-10 per developer per day, but can exceed $100 per hour during intensive use.
Would love to learn a bit more about how the GitHub integration works. From https://support.anthropic.com/en/articles/10167454-using-the... it seems it’s read only.
Does Claude Code let me take a generated/edited artifact and commit it back as a PR?
Claude Code can run commands including "git" commands, so it can create a branch, commit code to that branch and push that branch to GitHub - at which point you can create a PR.
I'm very much in favour of removing the guardrails but I understand why they're in place. The problem is attribution. You can teach yourself how to engage in all manner of dark deeds with a library or wikipedia or a search engine and some time, but any resulting public outcry is usually diffuse or targeted at the sources rather than the service. When Claude or GPT or Stable Diffusion are used to generate something judged offensive, the outcry becomes an existential threat to the provider.
Try to delete (close) the panel on the right on a side-by-side view. It took a good second to actually close. Creating one isn't much faster.
This is unbearably slow, to be blunt.
There's also been a spate of AI companies rushing to release products and having "oops" moments where they leaked customer chats or whatever.
They're not run like a FAANG, they don't have the same security pedigree, and they generally don't have any real guarantee of privacy.
So yes, my privacy is more valuable.
Conversely: Why is my non-privacy so valuable to Anthropic? Do they plan on selling my data? Maybe not now... but when funding gets a bit tight? Do they plan on selling my information to the likes of Cambridge Analytica? Not just superficial metadata, but also an AI-summarised history of my questions?
The best thing to do would be not to ask. But they are asking.
Why?
Why only them?
It seems very similar. I open-sourced the code to MyCoder here: https://github.com/drivecore/mycoder I'll compare them. Offhand, I think both CodeBuff and Claude Code are missing the web debugging tools I added to MyCoder.
I would pay $50/mo or something to be able to have reasonable use of Claude Code in a limited (but not as limited) way as through the web UI, but all of these coding tools seem to work only with the API and are therefore either too expensive or too limited.
I've used https://github.com/cline/cline to get a similar workflow to their Claude Code demo, and yes it's amazing how quickly the token counts add up. Claude seems to have capacity issues so I'm guessing they decided to charge a premium for what they can serve up.
+1 on the too expensive or too limited sentiment. I subscribed to Claude for quite a while but got frustrated the few times I would use it heavily I'd get stuck due to the rate limits.
I could stomach a $20-$50 subscription for something like 3.7 that I could use a lot when coding, and not worry about hitting limits (or I suspect being pushed on to a quantized/smaller model when used too much).
It became such an anti-pattern that I stopped paying. Now, when people ask me which one to use, I always say I like Claude more than others, but I don’t recommend using it in a professional setting.
2. Instead lets the agent decide what to bring into context by using tools on the codebase. Since the tools used are fast enough, this gives you effectively "verified answers" so long as the agent didn't screw up its inputs to the tool (which will happen, most likely).
Captchas are trivially broken and you can get access to millions of residential IP addresses, but phone numbers (especially if you filter out VOIP providers) still have a cost.
There are similar open source CLI tools that predate Claude Code. It's reasonable to assume Anthropic chose not to contribute to those projects for reasons other than complexity, and charitably, Anthropic likely plans for differentiating features.
> Also, the minified source code is available
The redistribution license - or lack thereof - will be the stumbling block to directly reusing code authored by Anthropic without authorization.
I think the best period for software devs will be gone in a few years. Knowing how to code and fix things will still be important, but it will matter more to also be a jack-of-many-trades who provides broader value: know a little about SEO, have good taste in design and be able to tweak a simple design, have good taste in how to organise code, and bring better soft skills for managing or educating less tech-savvy staff.
Another option is to specialise in a currently difficult subfield (robotics, ML, CUDA, Rust) and try to be that elite dev, with the expectation that you'd have to move to SV or some such tech hub.
The best general recommendation I would give right now (especially to someone not from the US) who is currently studying: use all that time you have now, with few responsibilities, to build a product that can provide semi-passive income on a monthly basis ($5k-$10k) and drag yourself out of the rat race. Even if you don't succeed, or the revenue stream eventually runs out, you will have learned the other skills that will matter later if you want to be employed (SEO, code & design taste, marketing, soft skills).
Most likely this window of opportunity will only last a few years, in the same way that the best window for mobile apps was the first ~2 years after the App Store launched.
Who cares if you used vector search for the retrieval?
The best vector retrieval implementations are already switching to a hybrid between vector and FTS, because it turns out BM25 etc is still a better algorithm for a lot of use-cases.
"Agentic search" makes much less sense to me because the term "agentic" is so incredibly vague.
How do you feel about raking in millions while attempting to make us all unemployed?
How do you feel about stealing open source code and stripping the copyright?
Like $5+ was cache read ($0.05/token vs $3/token) so it would have cost $300+
WHY is a huge % of my UX filled with nothing? I would appreciate metrics, token graphs, etc.
https://i.imgur.com/VlxLCwI.png
Why so much wasted space?
The entire LOTR trilogy is ~0.55 million tokens (1,200 pages, published).
If you are sending and receiving the text equivalent of several hundred copies of the LOTR trilogy every week, I don't think you are actually using AI for anything useful, or you are providing far too much context.
> The selling price and the unit price must be indicated in an unambiguous, easily identifiable and clearly legible manner for all products offered by traders to consumers (i.e. the final price should include value added tax and all other taxes).
I wanted to see what the annual plan would cost as it was just displaying €170+VAT, and when I clicked the upgrade button to find out (I checked everywhere on the page) then I was automatically subscribed without any confirmation and without ever seeing the final price before the transaction was completed.
You think it's acceptable that a company says the price is €170+VAT and then, after the transaction is complete, informs you that the actual price was €206.50?