Claude for Chrome

(www.anthropic.com)

aliljet [26 Aug 25 19:19 UTC] No.45030980

Having played a LOT with browser use, Playwright, and Puppeteer (all via MCP integrations and pythonic test cases), it's incredibly clear how quickly Claude (in particular) loses the thread as it starts to interact with the browser. There's a TON of visual and contextual information that just vanishes as soon as you do anything particularly complex. In my experience, repeatedly forcing new context windows between screenshots has dramatically improved Claude's ability to perform complex interactions in the browser, but it's all still been pretty weak.
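A minimal sketch of the "fresh context between screenshots" idea, assuming a hypothetical agent loop: instead of resending every past screenshot, each step sends only the task, a short text log of recent actions, and the latest screenshot. `FreshContextStep`, `record`, and `build_request` are illustrative names, and the returned dict is a placeholder shape, not any particular model API.

```python
from dataclasses import dataclass, field


@dataclass
class FreshContextStep:
    """Hypothetical agent-loop state that keeps text notes instead of old screenshots."""

    task: str
    max_log: int = 10  # assumed cap on how many past actions are summarized
    log: list[str] = field(default_factory=list)

    def record(self, action: str, outcome: str) -> None:
        # Store a one-line text summary of what happened, not the image itself.
        self.log.append(f"{action} -> {outcome}")
        # Drop older entries so the prompt size stays bounded.
        self.log = self.log[-self.max_log:]

    def build_request(self, latest_screenshot: bytes) -> dict:
        # Each call starts a new context: task + recent notes + ONE screenshot.
        notes = "\n".join(f"- {line}" for line in self.log) or "- (no actions yet)"
        prompt = (
            f"Task: {self.task}\n"
            f"Recent actions:\n{notes}\n"
            "Look at the screenshot and reply with the single next action."
        )
        # Placeholder request shape; adapt to whatever client you actually use.
        return {"text": prompt, "images": [latest_screenshot]}
```

The design choice is that the model never sees more than one image per call; continuity lives in the text log, which is cheap to carry forward.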

When Claude can operate in the browser and effectively understand 5 radio buttons in a row, I think we'll have made real progress. So far, I've not seen that eval.

1. MattSayar [26 Aug 25 19:32 UTC] No.45031164

>>45030980

Same. When I try to get it to do a simple loop (e.g., take a screenshot, click next, repeat), it'll work for about five iterations (out of the hundred or so I want) and then say, "All done, boss!"
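For a fixed loop like "take a screenshot, click next, repeat", a plain script sidesteps the model deciding it is done early. A minimal sketch, assuming Playwright's sync API (`page.screenshot(path=...)`, `page.locator(...)`, `Locator.is_disabled()`, `Locator.click()`) and a hypothetical `next_selector` for the site's Next button; `capture_all_pages` is an illustrative name, and `FakePage` is an offline stand-in so the loop can be tried without a browser.

```python
from pathlib import Path
from typing import Protocol


class Locator(Protocol):
    """The subset of Playwright's Locator used here."""

    def is_disabled(self) -> bool: ...
    def click(self) -> None: ...


class Page(Protocol):
    """The subset of Playwright's Page used here."""

    def screenshot(self, *, path: str) -> bytes: ...
    def locator(self, selector: str) -> Locator: ...


def capture_all_pages(page: Page, next_selector: str, out_dir: Path, max_pages: int = 100) -> list[Path]:
    """Screenshot, click Next, repeat until Next is disabled or max_pages is reached."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    for i in range(max_pages):
        shot = out_dir / f"page_{i:03d}.png"
        page.screenshot(path=str(shot))
        saved.append(shot)
        next_button = page.locator(next_selector)
        if next_button.is_disabled():  # last page reached
            break
        next_button.click()
    return saved


class FakePage:
    """Offline stand-in for an n-page site; records screenshot paths instead of writing images."""

    def __init__(self, n_pages: int) -> None:
        self.n_pages = n_pages
        self.current = 0
        self.shots: list[str] = []

    def screenshot(self, *, path: str) -> bytes:
        self.shots.append(path)
        return b""

    def locator(self, selector: str) -> "FakeNextButton":
        return FakeNextButton(self)


class FakeNextButton:
    """Next button that becomes disabled on the fake site's last page."""

    def __init__(self, page: FakePage) -> None:
        self.page = page

    def is_disabled(self) -> bool:
        return self.page.current >= self.page.n_pages - 1

    def click(self) -> None:
        self.page.current += 1
```

With Playwright, you would pass a real `Page` obtained inside `with sync_playwright() as p:` after `page.goto(...)`. Real sites may also need a wait after `click()` (for example `page.wait_for_load_state()`), and sites that remove the Next button instead of disabling it would need a different stop check.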

I'm hoping Anthropic's browser extension is able to do some of the same "tricks" that Claude Code uses to gloss over these kinds of limitations.

2. tripplyons [26 Aug 25 19:46 UTC] No.45031408

>>45031164 (TP)

Hopefully one of those "tricks" involves training a model on examples of browser use.

3. robots0only [26 Aug 25 19:58 UTC] No.45031587

>>45031164 (TP)

Claude is extremely poor at vision compared to Gemini and ChatGPT. I think Anthropic severely overfit their evals to coding/text use cases. Maybe naively adding browser use would work, but I am a bit skeptical.

4. bdangubic [26 Aug 25 20:07 UTC] No.45031690

>>45031587

I have had a completely different experience. Pasting a screenshot into CC (Claude Code) is my de facto go-to, and more often than not it leads to CC understanding what needs to be done.

5. CSMastermind [26 Aug 25 20:19 UTC] No.45031820

>>45031164 (TP)

This has been exactly my experience using all the browser based tools I've tried.

ChatGPT's agents get the furthest but even then they only make it like 10 iterations or something.

6. rzzzt [26 Aug 25 20:39 UTC] No.45032034

>>45031820

I have better success asking for a short script that does the million iterations than asking the thing to make the changes itself (edit: in IDEs, not in the browser).
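A sketch of the kind of "short script" meant here, assuming the repetitive change is renaming an identifier across a source tree: the model writes the script once, and the script applies the edit everywhere deterministically. `rename_identifier` and `rename_in_tree` are hypothetical names; a regex is not a parser, so matches inside strings or comments get renamed too.

```python
import re
from pathlib import Path


def rename_identifier(source: str, old: str, new: str) -> tuple[str, int]:
    """Replace whole-word occurrences of `old` with `new`; return (new_text, count)."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    # A callable replacement keeps `new` literal (no backslash-escape processing).
    return pattern.subn(lambda _match: new, source)


def rename_in_tree(root: Path, old: str, new: str, glob: str = "*.py") -> int:
    """Apply rename_identifier to every matching file under root; return total replacements."""
    total = 0
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        updated, count = rename_identifier(text, old, new)
        if count:
            path.write_text(updated, encoding="utf-8")
            total += count
    return total
```

Word boundaries mean `prefetch` and `fetch_all` are left alone when renaming `fetch`, which is the kind of precision that is hard to get from an agent editing files one by one.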

7. [26 Aug 25 20:56 UTC] No.45032230

>>45031690

8. user453 [26 Aug 25 21:05 UTC] No.45032320

>>45031587

Is it overfitting if it makes them the best at those tasks?

9. felarof [27 Aug 25 00:05 UTC] No.45033874

>>45031164 (TP)

I'm wondering if they are using vanilla Claude or a version of Claude fine-tuned specifically for browser use.

RL fine-tuning of LLMs can have pretty amazing results. We did GRPO training of Qwen3:4B to act as a small action model at BrowserOS (https://www.browseros.com/), and it was much better than running vanilla Claude or GPT.
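For readers unfamiliar with GRPO (Group Relative Policy Optimization): for each prompt it samples a group of completions, scores each with a reward function, and uses each reward standardized within its group as that completion's advantage, which removes the need for a separate value model. A minimal sketch of that advantage step with a hypothetical action-matching reward; this is not BrowserOS's actual training code or reward, and using the population standard deviation plus a small `eps` is one common choice among several.

```python
from statistics import fmean, pstdev


def action_match_reward(predicted: dict, target: dict) -> float:
    """Hypothetical reward: full credit for the exact action, partial for the right action type."""
    if predicted == target:
        return 1.0
    if predicted.get("type") == target.get("type"):
        return 0.5
    return 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize rewards within one group of sampled completions (GRPO-style advantages)."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; if all rewards tie, every advantage is 0
    return [(r - mean) / (std + eps) for r in rewards]


# One group of four sampled actions for the same page state.
target = {"type": "click", "element": "#submit"}
group = [
    {"type": "click", "element": "#submit"},
    {"type": "click", "element": "#cancel"},
    {"type": "type", "element": "#email"},
    {"type": "click", "element": "#submit"},
]
rewards = [action_match_reward(p, target) for p in group]  # [1.0, 0.5, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get positive advantages and are reinforced; the rest are pushed down, so the signal is relative to the model's own samples rather than an absolute score.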

10. seunosewa [27 Aug 25 11:17 UTC] No.45038063

>>45032034

If you need precision, that's the way to go, and it's usually cheaper and faster too.