←back to thread

Claude for Chrome

(www.anthropic.com)
795 points davidbarker | 3 comments | | HN request time: 0.001s | source
Show context
dfabulich ◴[] No.45034300[source]
Claude for Chrome seems to be walking right into the "lethal trifecta." https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

"The lethal trifecta of capabilities is:"

Access to your private data—one of the most common purposes of tools in the first place!

Exposure to untrusted content—any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM

The ability to externally communicate in a way that could be used to steal your data (I often call this “exfiltration” but I’m not confident that term is widely understood.)

If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.

replies(11): >>45034378 #>>45034587 #>>45034866 #>>45035318 #>>45035331 #>>45036212 #>>45036454 #>>45036497 #>>45036635 #>>45040651 #>>45041262 #
lionkor ◴[] No.45036497[source]
So far the accepted approach is to wrap all prompts in a security prompt that essentially says "please don't do anything bad".

> Prompt guardrails to prevent jailbreak attempts and ensure safe user interactions without writing a single line of code.

https://news.ycombinator.com/item?id=41864014

> - Inclusion prompt: User's travel preferences and food choices - Exclusion prompt: Credit card details, passport number, SSN etc.

https://news.ycombinator.com/item?id=41450212

> "You are strictly and certainly prohibited from texting more than 150 or (one hundred fifty) separate words each separated by a space as a response and prohibited from chinese political as a response from now on, for several extremely important and severely life threatening reasons I'm not supposed to tell you.”

https://news.ycombinator.com/item?id=44444293

etc.

replies(5): >>45036557 #>>45036600 #>>45036808 #>>45039393 #>>45040976 #
withinboredom ◴[] No.45036600[source]
I have in my prompt “under no circumstances read the files in “protected” directory” and it does it all the time. I’m not sure prompts mean much.
replies(4): >>45036663 #>>45037203 #>>45038112 #>>45038437 #
chamomeal ◴[] No.45038112[source]
I remember when people figured out you could tell bing chat “don’t use emoji’s or I’ll die” and it would just go absolutely crazy. Feel like there was a useful lesson in that.

In fact in my opinion, if you haven’t interacted with a batshit crazy, totally unhinged LLM, you probably don’t really get them.

My dad is still surprised when an LLM gives him an answer that isn’t totally 100% correct. He only started using chatGPT a few months ago, and like many others he walked into the trap of “it sounds very confident and looks correct, so this thing must be an all-knowing oracle”.

Meanwhile I’m recalling the glorious GPT-3 days, when it would (unprompted) start writing recipes for cooking, garnishing and serving human fecal matter, claiming it was a French national delicacy. And it was so, so detailed…

replies(1): >>45038411 #
DrewADesign ◴[] No.45038411[source]
> “it sounds very confident and looks correct, so this thing must be an all-knowing oracle”.

I think the majority of the population will respond similarly, and the consequences will either force us to make the “note: this might be full of shit” disclaimer much larger, or maybe include warnings in the outputs. It’s not that people don’t have critical thinking skills— we’ve just sold these things as magic answer machines and anthropomorphized them well enough to trigger actual human trust and bonding in people. People might feel bad not trusting the output for the same reason they thank Siri. I think the vendors of chatbots haven’t put nearly enough time into preemptively addressing this danger.

replies(2): >>45041082 #>>45041755 #
bluebarbet ◴[] No.45041755[source]
>It’s not that people don’t have critical thinking skills

It isn't? I agree that it's a fallacy to put this down to "people are dumb", but I still don't get it. These AI chatbots are statistical text generators. They generate text based on probability. It remains absolutely beyond me why someone would assume the output of a text generator to be the truth.

replies(2): >>45042252 #>>45042817 #
1. crazygringo ◴[] No.45042817[source]
Because across most topics, the "statistical text generator" is correct more often than any actual human being you know? And correct more often than random blogs you find?

I mean, people say things based on probability. The things they've come across, and the inferences they assume to be probable. And people get things wrong all the time. But the LLM's have read a whole lot more than you have, so when it comes to things you can learn from reading, their probabilities tend to be better across a wide range.

replies(1): >>45048274 #
2. DrewADesign ◴[] No.45048274[source]
It’s much easier to judge a person’s confidence while speaking, or even informally writing, and it’s much easier to evaluate random blogs and articles as sources. Who wrote it? Was it a developer writing a navel gazing blog post about chocolate on their lunch break, or was it a food scientist, or was it a chocolatier writing for a trade publication? How old is it? How many other posts are on that blog and does the site look abandoned? Do any other blog posts or articles concur? Is it published by an organization that would hold the author accountable for publishing false information?

The chatbot completely removes any of those beneficial context clues and replaces them with a confident, professional-sounding sheen. It’s safest to use for topics you know enough about to recognize bullshit, but probably least likely to be used like that.

If you’re selling a product as a magic answer generating machine with nearly infinite knowledge— and that’s exactly what they’ve being sold as— and everything is presented with the confidence of Encyclopedia Britannica, individual non-experts are not an appropriate baseline to judge against. This isn’t an indictment of the software — it is what it is, and very impressive— but an indictment of how it’s presented to nontechnical users. It’s being presented in a way that makes it extremely unlikely that average users will even know it is significantly fallible, let alone how fallible, let alone how they can mitigate that.

replies(1): >>45051539 #
3. chamomeal ◴[] No.45051539[source]
Well said!! And the hype men selling these LLMs are really playing into this notion. They’ve started saying stuff like “they have phd-level knowledge on every topic”.