←back to thread

Claude for Chrome

(www.anthropic.com)
795 points davidbarker | 4 comments | | HN request time: 1.044s | source
1. cube2222 ◴[] No.45030866[source]
> We’re launching with 1,000 Max users and expanding gradually based on what we learn. This measured approach helps us validate safeguards before broader deployment.

Somewhat comforting they’re not yolo-ing it too much, but I frankly don’t see how the prompt injection issues with browser agents that act on your behalf can be surmounted - maybe other than the company guaranteeing “we’ll reimburse you for any unintentional financial losses incurred by the agent”.

Cause it seems to me like any straightforward methods are really just an arms race between prompt injection and heuristic safeguards.

replies(1): >>45031054 #
2. hombre_fatal ◴[] No.45031054[source]
Since the LLM has to inherently make tool/API calls to do anything, can't you gate those behind a confirmation box that describes what it wants to do?

And you could whitelist APIs like "Fill form textarea with {content}" vs more destructive ones like "Submit form" or "Make request to {url} with {body}".

Edit: It seems to already do this.

Granted, you'd still have to be eternally vigilant.

replies(1): >>45031529 #
3. cube2222 ◴[] No.45031529[source]
When every operation needs to be approved (every button click, every form entry, etc.) does it even make sense to use an agent?

And it’s not like you can easily “always allow” let’s say, certain actions on certain websites, because the issue is less with the action, and more with the data passed to it.

replies(1): >>45031632 #
4. hombre_fatal ◴[] No.45031632{3}[source]
Sure, just look at the examples in TFA like finding emails that demand a response or doing custom queries on Zillow.

You probably are just going to grant it read access.

That said, having thought about it, the most successful or scarier injections probably aren't going to involve things like crafting noisy destructive actions but rather silently changing what the LLM does during trusted/casual flows like reading your emails.

So I can imagine a dichotomy between pretty low risk things (Zillow/Airbnb queries) and things that demand scrutiny like doing anything in your email inbox where the LLM needs to read emails, and I can imagine the latter requiring such vigilance that you might be right.

It'll be very interesting and probably quite humbling to see this whole new genre of attacks pop up in the wild.