←back to thread

Claude for Chrome

(www.anthropic.com)
795 points davidbarker | 1 comments | | HN request time: 0s | source
Show context
aliljet ◴[] No.45030980[source]
Having played a LOT with browser use, playwright, and puppeteer (all via MCP integrations and pythonic test cases), it's incredibly clear how quickly Claude (in particular) loses the thread as it starts to interact with the browser. There's a TON of visual and contextual information that just vanishes as you begin to do anything particularly complex. In my experience, repeatedly forcing new context windows between screenshots has dramatically improved the ability for claude to perform complex intearctions in the browser, but it's all been pretty weak.

When Claude can operate in the browser and effectively understand 5 radio buttons in a row, I think we'll have made real progress. So far, I've not seen that eval.

replies(7): >>45031153 #>>45031164 #>>45031750 #>>45032251 #>>45033961 #>>45034552 #>>45036980 #
jascha_eng ◴[] No.45032251[source]
I have built a custom "deep research" internally that uses puppeteer to find business information, tech stack and other information about a company for our sales team.

My experience was that giving the LLM a very limited set of tools and no screenshots worked pretty damn well. Tbf for my use case I don't need more interactivity than navigate_to_url and click_link. Each tool returning a text version of the page and the clickable options as an array.

It is very capable of answering our basic questions. Although it is powered by gpt-5 not claude now.

replies(3): >>45032764 #>>45033355 #>>45033832 #
panarky ◴[] No.45032764[source]
Just shoving everything into one context fails after just a few turns.

I've had more success with a hierarchy of agents.

A supervisor agent stays focused on the main objective, and it has a plan to reach that objective that's revised after every turn.

The supervisor agent invokes a sub-agent to search and select promising sites, and a separate sub-sub-agent for each site in the search results.

When navigating a site that has many pages or steps, a sub-sub-sub-agent for each page or step can be useful.

The sub-sub-sub-agent has all the context for that page or step, and it returns a very short summary of the content of that page, or the action it took on that step and the result to the sub-sub-agent.

The sub-sub-agents return just the relevant details to their parent, the sub-agent.

That way the supervisor agent can continue for many turns at the top level without exhausting the context window or losing the thread and pursuing its own objective.

replies(1): >>45033137 #
1. jascha_eng ◴[] No.45033137[source]
Hmm my browser agents each have about 50-100 turns (takes roughly 3-5 minutes for each one) and one focused objective I make use of structured output to group all the info it found into a standardized format at the end.

I have 4 of those "research agents" with different prompts running after another and then I format the results into a nice slack message + Summarize and evaluate the results in one final call (with just the result jsons as input).

This works really well. We use it to score leads as for how promising they are to reach out to for us.