(www.anthropic.com)

795 points davidbarker | 1 comments | 26 Aug 25 19:01 UTC

biggestfan ◴[26 Aug 25 19:11 UTC] No.45030868[source]▶

According to their own blog post, even after mitigations, the model still has an 11% attack success rate. There's still no way I would feel comfortable giving this access to my main browser. I'm glad they're sticking to a very limited rollout for now. (Sidenote, why is this page so broken? Almost everything is hidden.)

replies(5): >>45030924 #>>45031456 #>>45031949 #>>45033353 #>>45034111 #

mark242 ◴[26 Aug 25 22:56 UTC] No.45033353[source]▶

>>45030868 #

An 11% success rate for what is effectively a spear-phishing attempt isn't that terrible, and tbh it'll be easier to train Claude not to get tricked than it is to train eg my parents.

replies(4): >>45033380 #>>45033454 #>>45033795 #>>45039212 #

1. asdff ◴[26 Aug 25 22:59 UTC] No.45033380[source]▶

>>45033353 #

>Claude not to get tricked than it is to train eg my parents.

One would think so, but apparently, from this blog post, it is still susceptible to the same old prompt injections that have always been around. So I'm thinking it is not very easy to train Claude like this at all. Meanwhile, with parents you could probably eliminate an entire security vector outright if you merely told them "bank at the local branch," or "call the number on the card for the bank; don't try to look it up."

Claude for Chrome