←back to thread

645 points helloplanets | 2 comments | | HN request time: 0.472s | source
Show context
ec109685 ◴[] No.45005397[source]
It’s obviously fundamentally unsafe when Google, OpenAI and Anthropic haven’t released the same feature and instead use a locked down VM with no cookies to browse the web.

LLM within a browser that can view data across tabs is the ultimate “lethal trifecta”.

Earlier discussion: https://news.ycombinator.com/item?id=44847933

It’s interesting that in Brave’s post describing this exploit, they didn’t reach the fundamental conclusion this is a bad idea: https://brave.com/blog/comet-prompt-injection/

Instead they believe model alignment, trying to understand when a user is doing a dangerous task, etc. will be enough. The only good mitigation they mention is that the agent should drop privileges, but it’s just as easy to hit an attacker controlled image url to leak data as it is to send an email.

replies(7): >>45005444 #>>45005853 #>>45006130 #>>45006210 #>>45006263 #>>45006384 #>>45006571 #
skaul ◴[] No.45006130[source]
(I lead privacy at Brave and am one of the authors)

> Instead they believe model alignment, trying to understand when a user is doing a dangerous task, etc. will be enough.

No, we never claimed or believe that those will be enough. Those are just easy things that browser vendors should be doing, and would have prevented this simple attack. These are necessary, not sufficient.

replies(4): >>45006255 #>>45006329 #>>45006467 #>>45006601 #
cowboylowrez ◴[] No.45006255[source]
what you're saying is that the described step, "model alignment" is necessary even though it will fail a percentage of the time. whenever I see something that is "necessary" but doesn't have like a dozen 9's for reliability against failure or something well lets make that not necessary then. whadya say?
replies(1): >>45006309 #
skaul ◴[] No.45006309[source]
That's not how defense-in-depth works. If a security mitigation catches 90% of the "easy" attacks, that's worth doing, especially when trying to give users an extremely powerful capability. It just shouldn't be the only security measure you're taking.
replies(2): >>45006450 #>>45006710 #
cowboylowrez ◴[] No.45006450[source]
sure sure, except llms. I mean its valid and all bringing up tried and true maxims that we all should know regarding software, but whens the last time the ssl guys were happy with a fix that "has a chance of working, but a chance of not working."

defense in depth is to prevent one layer failure from getting to the next, you know, exploit chains etc. Failure in a layer is a failure, not statistically expected behavior. we fix bugs. what we need to do is treat llms as COMPLETELY UNTRUSTED user input as has been pointed out here and elsewhere time and again.

you reply to me like I need to be lectured, so consider me a dumb student in your security class. what am I missing here?

replies(3): >>45006676 #>>45006878 #>>45008574 #
skaul ◴[] No.45008574[source]
> you reply to me like I need to be lectured

That's not my intention! Just stating how we're thinking about this.

> defense in depth is to prevent one layer failure from getting to the next

We think a separate model can help with one layer of this: checking if the planner model's actions are aligned with the user's request. But we also need guarantees at other layers, like distinguishing web contents from user instructions, or locking down what tools the model has access to in what context. Fundamentally, though, like we said in the blog post:

"The attack we developed shows that traditional Web security assumptions don’t hold for agentic AI, and that we need new security and privacy architectures for agentic browsing."

replies(1): >>45009549 #
simonw ◴[] No.45009549[source]
"But we also need guarantees at other layers, like distinguishing web contents from user instructions"

How do you intend to do that?

In the three years I've spent researching and writing about prompt injection attacks I haven't seen a single credible technique from anyone that can distinguish content from instructions.

If you can solve that you'll have solved the entire class of prompt injection attacks!

replies(2): >>45013216 #>>45015240 #
skaul ◴[] No.45015240[source]
> I haven't seen a single credible technique from anyone that can distinguish content from instructions

You specifically mean that it's ~impossible to distinguish between content and instructions ONCE it is fed to the model, right? I agree with that. I was talking about a prior step, at the browser level. At the point that the query is sent to the backend, the browser would be able to distinguish between web contents and user prompt. This is useful for checking user-alignment of the output of the reasoning model (keeping in mind that the moment you feed in untrusted text into a model all bets are off).

We're actively thinking and working on this, so will have more to announce soon, but this discussion is useful!

replies(1): >>45017179 #
simonw ◴[] No.45017179[source]
Even if you know the source of the text before you feed it to the model you still need to solve the problem of how to send untrusted text from a user through a model without that untrusted text being able to trigger additional tool calls or actions.

The most credible pattern I've seen for that comes from the DeepMind CaMeL paper - I would love to see a browser agent that robustly implemented those ideas: https://simonwillison.net/2025/Apr/11/camel/

replies(1): >>45017568 #
1. skaul ◴[] No.45017568[source]
> Even if you know the source of the text before you feed it to the model you still need to solve the problem of how to send untrusted text from a user through a model without that untrusted text being able to trigger additional tool calls or actions.

We're exploring taking the action plan that a reasoning model (which sees both trusted and untrusted text) comes up with and passing it to a second model, which doesn't see the untrusted text and which then evaluates it.

> The most credible pattern I've seen for that comes from the DeepMind CaMeL paper

Yeah we're aware of the CaMeL paper and are looking into it, but it's definitely challenging from an implementation pov.

Also, I see that we said "The browser should clearly separate the user’s instructions from the website’s contents when sending them as context to the model" in the blog post. That should have been "backend", not "model". Agreed that once you feed both trusted and untrusted tokens into the LLM the output must be considered unsafe.

replies(1): >>45019316 #
2. jrflowers ◴[] No.45019316[source]
>We're exploring taking the action plan that a reasoning model (which sees both trusted and untrusted text) comes up with and passing it to a second model, which doesn't see the untrusted text and which then evaluates it.

How is this different from the Dual-LLM pattern that’s described in the link that was posted? It immediately describes how that setup is still susceptible to prompt injection.

>With the Dual LLM pattern the P-LLM delegates the task of finding Bob’s email address to the Q-LLM—but the Q-LLM is still exposed to potentially malicious instructions.