←back to thread

Claude for Chrome

(www.anthropic.com)
795 points davidbarker | 2 comments | | HN request time: 0.001s | source
Show context
parsabg ◴[] No.45031888[source]
I built a very similar extension [1] a couple of months ago that supports a wide range of models, including Claude, and enables them to take control of a user's browser using tools for mouse and keyboard actions, observation, etc. It's a fun little project to look at to understand how this type of thing works.

It's clear to me that the tech just isn't there yet. The information density of a web page with standard representations (DOM, screenshot, etc) is an order of magnitude lower than that of, say, a document or piece of code, which is where LLMs shine. So we either need much better web page representations, or much more capable models, for this to work robustly. Having LLMs book flights by interacting with the DOM is sort of like having them code a web app using assembly. Dia, Comet, Browser Use, Gemini, etc are all attacking this and have big incentives to crack it, so we should expect decent progress here.

A funny observation was that some models have been clearly fine tuned for web browsing tasks, as they have memorized specific selectors (e.g. "the selector for the search input in google search is `.gLFyf`").

[1] https://github.com/parsaghaffari/browserbee

replies(11): >>45032377 #>>45032556 #>>45032983 #>>45033328 #>>45033344 #>>45033797 #>>45033828 #>>45035580 #>>45036238 #>>45037152 #>>45040560 #
miguelspizza ◴[] No.45032983[source]
> It's clear to me that the tech just isn't there yet.

Totally agree. This was the thesis behind MCP-B (now WebMCP https://github.com/MiguelsPizza/WebMCP)

HN Post: https://news.ycombinator.com/item?id=44515403

DOM and visual parsing are dead ends for browser automation. Not saying models are bad; they are great. The web is just not designed for them at all. It's designed for humans, and humans, dare I say, are pretty impressive creatures.

Providing an API contract between extensions and websites via MCP allows an AI to interact with a website as a first-class citizen. It just requires buy-in from website owners.

It's being proposed as a web standard: > https://github.com/webmachinelearning/webmcp

replies(2): >>45033055 #>>45033221 #
chatmasta ◴[] No.45033221[source]
I suspect this kind of framework will be adopted by websites with income streams that are not dependent on human attention (i.e. advertising revenue, mostly). They have no reason to resist LLM browser agents. But if they’re in the business of selling ads to human eyeballs, expect resistance.

Maybe the AI companies will find a way to resell the user’s attention to the website, e.g. “you let us browse your site with an LLM, and we’ll show your ad to the user.”

replies(2): >>45033387 #>>45033539 #
1. onesociety2022 ◴[] No.45033387[source]
Even the websites whose primary source of revenue is not ad impressions might be resistant to let the agents be the primary interface through which users interact with their service.

Instacart currently seems to be very happy to let ChatGPT Operator use its website to place an order (https://www.instacart.com/company/updates/ordering-groceries...) [1]. But what happens when the primary interface for shopping with Instacart is no longer their website or their mobile app? OpenAI could demand a huge take rate for orders placed via ChatGPT agents, and if they don't agree to it, ChatGPT can strike a deal with a rival company and push traffic to that service instead. I think Amazon is never going to agree to let other agents use its website for shopping for the same reason (they will restrict it to just Alexa).

[1] - the funny part is the Instacart CEO quit shortly after this and joined OpenAI as CEO of Applications :)

replies(1): >>45033512 #
2. miguelspizza ◴[] No.45033512[source]
The side-panel browser agent is a good middle ground to this issue. The user is still there looking at the website via their own browser session, the AI just has access to the specific functionality which the website wants to expose to it. The human can take over or stop the AI if things are going south.