
Claude for Chrome

(www.anthropic.com)
795 points | davidbarker
parsabg No.45031888
I built a very similar extension [1] a couple of months ago that supports a wide range of models, including Claude, and enables them to take control of a user's browser using tools for mouse and keyboard actions, observation, etc. It's a fun little project to look at to understand how this type of thing works.
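As a rough illustration of how this kind of tool-driven control can work, here is a minimal sketch. The `ToolCall` shape, the `FakePage` class, and `dispatch` are all hypothetical names made up for this example and are not taken from the extension's code; the page is an in-memory stand-in so the sketch runs without a real browser.

```typescript
// Hypothetical tool-call shape a model might emit; not the extension's actual API.
type ToolCall =
  | { tool: "click"; x: number; y: number }
  | { tool: "type"; text: string }
  | { tool: "observe" };

// In-memory stand-in for a browser page (assumption: a real agent would drive a live tab).
class FakePage {
  log: string[] = [];
  typed = "";

  click(x: number, y: number): void {
    this.log.push(`click(${x},${y})`);
  }

  type(text: string): void {
    this.typed += text;
    this.log.push(`type(${text})`);
  }

  // "Observation" here is just a text summary of the page state.
  snapshot(): string {
    return `input value: "${this.typed}"`;
  }
}

// Route one tool call to the page and return the result that is fed back to the model.
function dispatch(page: FakePage, call: ToolCall): string {
  switch (call.tool) {
    case "click":
      page.click(call.x, call.y);
      return "ok";
    case "type":
      page.type(call.text);
      return "ok";
    case "observe":
      return page.snapshot();
  }
}

// A model's plan expressed as a sequence of tool calls.
const page = new FakePage();
const calls: ToolCall[] = [
  { tool: "click", x: 120, y: 40 },
  { tool: "type", text: "flights to Lisbon" },
  { tool: "observe" },
];
const results = calls.map((c) => dispatch(page, c));
console.log(results);
```

In a real agent loop the model would pick each next call after seeing the previous result, rather than following a fixed list.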

It's clear to me that the tech just isn't there yet. The information density of a web page in its standard representations (DOM, screenshot, etc.) is an order of magnitude lower than that of, say, a document or a piece of code, which is where LLMs shine. So we need either much better web page representations or much more capable models for this to work robustly. Having LLMs book flights by interacting with the DOM is sort of like having them code a web app in assembly. Dia, Comet, Browser Use, Gemini, etc. are all attacking this and have big incentives to crack it, so we should expect decent progress here.

A funny observation was that some models have clearly been fine-tuned for web browsing tasks, as they have memorized specific selectors (e.g. "the selector for the search input in google search is `.gLFyf`").

[1] https://github.com/parsaghaffari/browserbee

Exoristos ◴[] No.45035580[source]
Do we regret, yet, letting the Semantic Web wither on the vine?
mike_hearn No.45037155
It didn't really wither on the vine; it just moved to JSON REST APIs, with React as the layer that maps the model to the view. What was missing was API discovery, which MCP now provides.
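To make the discovery point concrete, here is a minimal sketch of an MCP-style `tools/list` exchange. The message shapes follow MCP's JSON-RPC 2.0 framing; the `search_flights` tool, its fields, and the `describeTools` helper are assumptions invented for illustration, and the server reply is hard-coded rather than fetched over a real transport.

```typescript
// One discoverable tool: a name, a human-readable description, and a JSON Schema for its input.
interface Tool {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required?: string[];
  };
}

interface ListToolsResponse {
  jsonrpc: "2.0";
  id: number;
  result: { tools: Tool[] };
}

// The discovery request a client sends to ask what the server offers.
const request = { jsonrpc: "2.0", id: 1, method: "tools/list" } as const;

// Hypothetical server reply (assumption: a made-up flight search tool).
const response: ListToolsResponse = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    tools: [
      {
        name: "search_flights",
        description: "Search for flights between two airports",
        inputSchema: {
          type: "object",
          properties: {
            origin: { type: "string" },
            destination: { type: "string" },
            date: { type: "string" },
          },
          required: ["origin", "destination"],
        },
      },
    ],
  },
};

// Summarize each tool as "name(required args)" so a client or model can see what it may call.
function describeTools(res: ListToolsResponse): string[] {
  return res.result.tools.map(
    (t) => `${t.name}(${(t.inputSchema.required ?? []).join(", ")})`,
  );
}

const summary = describeTools(response);
console.log(JSON.stringify(request), summary);
```

The point of the sketch is that the client learns the available operations and their argument schemas at runtime, instead of being coded against documentation ahead of time.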

The problem with the concept is not really the tech. The problem is the incentives. In most cases, companies don't have much incentive to offer APIs. It just risks adding a middleman who will try to cut them out. Not many businesses want to be reduced to being just an API provider; it's a dead-end business and thus a dead-end career/lifestyle for the founders or executives. The telcos went through this in the early 2000s, when their CEOs were all railing against a future of becoming "dumb pipes". They weren't able to stop it in the end, despite trying hard. But in many other cases companies did successfully avoid that fate.

MCP+API might be different, or it might not. It eliminates some of the downsides of classical API work, like needing to guarantee stability and commit to a feature set. But it still poses the risk of losing control of your own brand and user experience. The obvious move is for OpenAI to come along and demand a rev share if too many customers are interacting with your service via ChatGPT, just as Google effectively demands a rev share for sending traffic to your website because so many customers interact with the internet via web search.