Doesn't this seem like a remarkably small set of tests? It also strikes me as strange that it took this testing to realize that prompt injection and handing the reins to an AI agent are dangerous. That should have been anticipated while building the tool in the first place, before it even went to their red team.
Move fast and break things, I guess. Only this is the world's largest browser, and breaking things here risks financial ruin and/or the end of the internet as we know it as a human-to-human communication tool.
> We view browser-using AI as inevitable: so much work happens in browsers that giving Claude the ability to see what you're looking at, click buttons, and fill forms will make it substantially more useful.
A lot of this can be done by building a bunch of custom environments at training time, but only a limited number of use cases can be handled that way. They don't need all the data, but they do still need the kinds of tasks real-world users would ask them to do.
Hence the press release, which pretty much says that they think it's unsafe, they don't have any clue how to make it safe without trying it out, and they only want a limited number of people to try it. Given their stature, it's good that they do this publicly, instead of how Google does it with trusted testers or OpenAI does it with select customers.
I think this is more akin to, say, a hypothetical browser not implementing HTTPS properly, so people's credentials/sessions can be stolen with MITM attacks or something. Clearly the bad behavior is in the toolchain and not the user here, and I'm not sure how much you can wave away by claiming "We told you it's not fully safe." You can't sell tomatoes that have a 10% chance of giving you food poisoning, even if you declare that chance on the label, you know?
I don't know how these people sleep at night knowing they are actively ruining society.
To misquote the IRA - "[Scammers] only need to be lucky once, you need to be lucky every time." Even a 1% chance of getting pwned every time you get sent a malicious email is way too high. Plus the scammers aren't gonna rest on their laurels - they'll be iterating too.
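To put rough numbers on the "lucky once" point, here is a minimal sketch of how a small per-attempt risk compounds. It assumes each malicious email is an independent attempt with the same success chance, which is a simplification; the 1% figure is just the hypothetical from above, and the function name is made up for illustration.

```python
def cumulative_compromise_probability(per_attempt: float, attempts: int) -> float:
    """Chance of at least one successful attack across independent attempts.

    Assumes every attempt succeeds with the same probability and that
    attempts do not influence each other (a simplifying assumption).
    """
    # P(at least one success) = 1 - P(every attempt fails)
    return 1.0 - (1.0 - per_attempt) ** attempts


if __name__ == "__main__":
    # Hypothetical 1% success chance per malicious email.
    for n in (1, 10, 100, 500):
        p = cumulative_compromise_probability(0.01, n)
        print(f"{n:>4} attempts: {p:.1%} chance of at least one compromise")
```

Under those assumptions, 100 attempts at 1% each already gives the attacker better-than-even odds (roughly 63%), and that is before they start iterating on the attack.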