
Claude for Chrome

(www.anthropic.com)
795 points by davidbarker | 25 comments
1. aliljet ◴[] No.45030980[source]
Having played a LOT with browser use, playwright, and puppeteer (all via MCP integrations and pythonic test cases), it's incredibly clear how quickly Claude (in particular) loses the thread as it starts to interact with the browser. A TON of visual and contextual information just vanishes as you begin to do anything particularly complex. In my experience, repeatedly forcing new context windows between screenshots has dramatically improved Claude's ability to perform complex interactions in the browser, but it's all been pretty weak.

When Claude can operate in the browser and effectively understand 5 radio buttons in a row, I think we'll have made real progress. So far, I've not seen that eval.
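A minimal sketch of the "fresh context window between screenshots" pattern the comment describes, with the browser and model calls stubbed out (`take_screenshot`, `ask_model`, and `perform` are hypothetical placeholders, not a real Playwright or Claude API):

```python
def take_screenshot(step: int) -> bytes:
    """Stand-in for e.g. page.screenshot() in Playwright."""
    return f"screenshot-{step}".encode()

def ask_model(messages: list) -> str:
    """Stand-in for a model call; this stub stops after three steps."""
    shot = messages[-1]["content"]
    return "done" if shot == b"screenshot-3" else "click_next"

def perform(action: str) -> None:
    """Stand-in for executing the chosen action in the browser."""
    pass

def run_task(objective: str, max_steps: int = 10) -> list:
    actions = []
    for step in range(max_steps):
        # Key idea: build a NEW message list on every iteration, carrying
        # only the objective and a short running summary of prior actions,
        # never the full screenshot history.
        summary = f"Actions so far: {actions[-3:]}"
        messages = [
            {"role": "system", "content": objective},
            {"role": "user", "content": summary},
            {"role": "user", "content": take_screenshot(step)},
        ]
        action = ask_model(messages)
        if action == "done":
            break
        perform(action)
        actions.append(action)
    return actions
```

Each turn the model sees one screenshot plus a compact summary, so context never accumulates across iterations.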

replies(7): >>45031153 #>>45031164 #>>45031750 #>>45032251 #>>45033961 #>>45034552 #>>45036980 #
2. tripplyons ◴[] No.45031153[source]
Definitely a good idea to wait for real evidence of it working. Hopefully they aren't just using the same model that wasn't really trained for browser use.
3. MattSayar ◴[] No.45031164[source]
Same. When I try to get it to do a simple loop (e.g. take screenshot, click next, repeat) it'll work for about five iterations (out of a hundred or so desired) and then say, "All done, boss!"

I'm hoping Anthropic's browser extension is able to do some of the same "tricks" that Claude Code uses to gloss over these kinds of limitations.

replies(4): >>45031408 #>>45031587 #>>45031820 #>>45033874 #
4. tripplyons ◴[] No.45031408[source]
Hopefully one of those "tricks" involves training a model on examples of browser use.
5. robots0only ◴[] No.45031587[source]
Claude is extremely poor at vision compared to Gemini and ChatGPT. I think Anthropic severely overfit their evals to coding/text use cases. Maybe naively adding browser use would work, but I am a bit skeptical.
replies(2): >>45031690 #>>45032320 #
6. bdangubic ◴[] No.45031690{3}[source]
I have a completely different experience. Pasting a screenshot into CC is my de-facto go-to that more often than not leads to CC understanding what needs to be done etc…
replies(1): >>45032230 #
7. philip1209 ◴[] No.45031750[source]
Context rot: https://news.ycombinator.com/item?id=44564248
8. CSMastermind ◴[] No.45031820[source]
This has been exactly my experience with all the browser-based tools I've tried.

ChatGPT's agents get the furthest, but even then they only make it like 10 iterations or something.

replies(1): >>45032034 #
9. rzzzt ◴[] No.45032034{3}[source]
I have better success asking for a short script that does the million iterations than asking the thing to make the changes itself (edit: in IDEs, not in the browser).
replies(1): >>45038063 #
11. jascha_eng ◴[] No.45032251[source]
I have built a custom "deep research" tool internally that uses puppeteer to find business information, tech stack, and other details about a company for our sales team.

My experience was that giving the LLM a very limited set of tools and no screenshots worked pretty damn well. Tbf, for my use case I don't need more interactivity than navigate_to_url and click_link. Each tool returns a text version of the page and the clickable options as an array.

It is very capable of answering our basic questions, although it is powered by gpt-5 rather than Claude now.
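A rough sketch of what such a tool pair might look like, assuming stdlib `html.parser` for extraction and an in-memory `FAKE_PAGES` dict standing in for real Puppeteer navigation (all names here are hypothetical, not the commenter's actual code):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageExtractor(HTMLParser):
    """Collects visible text plus (text, href) pairs for every link."""
    def __init__(self):
        super().__init__()
        self.text_parts, self.links = [], []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.text_parts.append(data)
            if self._href:
                self.links.append({"text": data, "href": self._href})

# Stand-in for real HTTP/Puppeteer fetches.
FAKE_PAGES = {
    "https://example.com": "<h1>Acme</h1><a href='/careers'>Careers</a>",
    "https://example.com/careers": "<p>Hiring: Postgres and AWS engineers</p>",
}

def navigate_to_url(url: str) -> dict:
    """Tool 1: load a page, return its text and the clickable options."""
    parser = PageExtractor()
    parser.feed(FAKE_PAGES[url])
    return {"url": url, "text": " ".join(parser.text_parts),
            "links": parser.links}

def click_link(current_url: str, href: str) -> dict:
    """Tool 2: follow a link from the currently open page."""
    return navigate_to_url(urljoin(current_url, href))
```

For example, `navigate_to_url("https://example.com")` returns the page text "Acme Careers" plus a links array the model can pick from on its next turn.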

replies(3): >>45032764 #>>45033355 #>>45033832 #
12. user453 ◴[] No.45032320{3}[source]
Is it overfitting if it makes them the best at those tasks?
13. panarky ◴[] No.45032764[source]
Just shoving everything into one context fails after just a few turns.

I've had more success with a hierarchy of agents.

A supervisor agent stays focused on the main objective, and it has a plan to reach that objective that's revised after every turn.

The supervisor agent invokes a sub-agent to search and select promising sites, and a separate sub-sub-agent for each site in the search results.

When navigating a site that has many pages or steps, a sub-sub-sub-agent for each page or step can be useful.

The sub-sub-sub-agent has all the context for that page or step, and it returns a very short summary of the content of that page, or the action it took on that step and the result to the sub-sub-agent.

The sub-sub-agents return just the relevant details to their parent, the sub-agent.

That way the supervisor agent can continue for many turns at the top level without exhausting the context window or losing the thread and pursuing its own objective.
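The hierarchy described above can be sketched like this, with the actual LLM calls replaced by trivial stubs (the function names and the one-line summarization are illustrative only):

```python
def page_agent(page: str) -> str:
    """Deepest level: holds the full page content in context and returns
    a short summary. Stub for a real model call."""
    return page[:40]  # stand-in: a real agent would summarize with a model

def site_agent(site: dict) -> str:
    """Per-site agent: visits each page via a page agent, keeps only the
    short summaries, and returns just the relevant details upward."""
    summaries = [page_agent(p) for p in site["pages"]]
    return f"{site['name']}: " + " | ".join(summaries)

def supervisor(objective: str, sites: list) -> list:
    """Top level: only ever sees short summaries, so its context stays
    small enough to pursue the objective across many turns."""
    findings = []
    for site in sites:
        findings.append(site_agent(site))
        # revise the plan after every turn (elided in this sketch)
    return findings
```

The full page contents live and die inside the leaf agents; only condensed findings climb the tree.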

replies(1): >>45033137 #
14. jascha_eng ◴[] No.45033137{3}[source]
Hmm, my browser agents each have about 50-100 turns (roughly 3-5 minutes each) and one focused objective. I use structured output to group all the info found into a standardized format at the end.

I have 4 of those "research agents" with different prompts running one after another, and then I format the results into a nice Slack message and summarize and evaluate the results in one final call (with just the result JSONs as input).

This works really well. We use it to score leads by how promising they are for us to reach out to.

15. asdff ◴[] No.45033355[source]
Seems navigate_to_url and click_link would be solved with just a script running puppeteer, vs. having an LLM craft a puppeteer script to hopefully do this simple action reliably? What is the great advantage of the LLM tooling in this case?
replies(1): >>45033448 #
16. jascha_eng ◴[] No.45033448{3}[source]
Oh, the tools are hand-coded (or rather built with Claude Code), but the agent can call them to control the browser.

Imagine a prompt like this:

You are a research agent. Your goal is to figure out this company's tech stack:
- Company Name

Your available tools are:
- navigate_to_url: use this to load a page, e.g. use Google or Bing to search for the company site. It will return the page content as well as a list of available links.
- click_link: use this to click on a specific link on the currently open page. It will also return the current page content and any available links.

A good strategy is usually to go to the company's careers page and search for technical roles.

This is a short form of what is actually written there. We use this to score leads: we are built on Postgres and AWS, and if a company is using those, they are very interesting relevancy signals for us.
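For a function-calling API, the two tools named in the prompt might be declared roughly like this (OpenAI-style tool schema; the descriptions are paraphrased from the comment, not the actual definitions):

```python
# Hypothetical declarations for the two tools the prompt exposes.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "navigate_to_url",
            "description": "Load a page; returns page text and available links.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "click_link",
            "description": "Click a link on the current page; returns the new page text and links.",
            "parameters": {
                "type": "object",
                "properties": {"link_index": {"type": "integer"}},
                "required": ["link_index"],
            },
        },
    },
]
```

Keeping the parameter surface this small is what makes the agent's action space tractable without screenshots.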

replies(1): >>45033547 #
17. asdff ◴[] No.45033547{4}[source]
I still don't understand what the LLM does. One could do this with a few lines of curl and a list of tools to query against.
replies(1): >>45033829 #
18. jascha_eng ◴[] No.45033829{5}[source]
The LLM understands arbitrary web pages and finds the correct links to click. Not for one specific page, but for ANY company name that you give it.

It will always come back with a list of technologies used, if available on the company's page, regardless of how that page is structured. That level of generic understanding is simply not solvable with just some regex and curls.

replies(1): >>45033959 #
19. felarof ◴[] No.45033832[source]
This is super cool!

If a "deep research" like agent is available directly in your browser, would that be useful?

We are building this at BrowserOS!

20. felarof ◴[] No.45033874[source]
I'm wondering if they are using vanilla Claude or a version of Claude fine-tuned specifically for browser use.

RL fine-tuning LLMs can have pretty amazing results. We did GRPO training of Qwen3:4B to do the task of a small action model at BrowserOS (https://www.browseros.com/) and it was much better than running vanilla Claude or GPT.

21. asdff ◴[] No.45033959{6}[source]
Sure it is. You can use recursive methods to go through all the links in a webpage and identify your terms within them. wget or curl would probably work with a few piped commands for this. I'd have to go through the man pages again to come up with a working example, but people have done just this for a long time now.

One might ask how you verify your LLM works as intended without a method like this already built.
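A Python stand-in for the recursive approach described above (the PAGES dict replaces real fetches; a shell version would pipe `wget --recursive` output into grep). Everything here is illustrative:

```python
import re

# In-memory stand-in for fetched pages; a real run would use wget or curl.
PAGES = {
    "/": '<a href="/careers">Careers</a> Welcome to Acme',
    "/careers": '<a href="/">Home</a> Hiring: Postgres and AWS engineers',
}

def crawl(start: str, terms: list, seen=None) -> dict:
    """Recursively follow in-site links, recording which terms appear where."""
    if seen is None:
        seen = set()
    if start in seen or start not in PAGES:
        return {}
    seen.add(start)
    body = PAGES[start]
    # Case-insensitive term matching on the raw page body.
    hits = {start: [t for t in terms if t.lower() in body.lower()]}
    # Follow every href on the page, skipping anything already visited.
    for href in re.findall(r'href="([^"]+)"', body):
        hits.update(crawl(href, terms, seen))
    return hits
```

This checks terms against raw HTML with no understanding of page structure, which is the trade-off the parent comments are debating.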

22. rukuu001 ◴[] No.45033961[source]
Maybe this will be the impetus for the ‘semantic web’ and accessibility to be taken seriously.
23. suchintan ◴[] No.45034552[source]
Have you ever given Skyvern (https://github.com/Skyvern-AI/skyvern) a try? I'd love to hear your opinion
24. lopis ◴[] No.45036980[source]
After all this time, we might be entering the age of proper web accessibility, because this will help AI understand pages better.
25. seunosewa ◴[] No.45038063{4}[source]
If you need precision, that's the way to go, and it's usually cheaper and faster too.