(www.anthropic.com)

747 points porridgeraisin | 1 comments | 29 Aug 25 11:29 UTC | HN request time: 0.245s | source

1. roVinchi ◴[30 Aug 25 00:28 UTC] No.45070879[source]▶

Can users poison the future training dataset by ending every chat with a dissatisfied feedback even though the chat helped them? Or be even more malicious and steer the chat into destructive behavior and then give very positive feedback?

↑

Updates to Consumer Terms and Privacy Policy