
770 points by ta988 | 1 comment
uludag ◴[] No.42551713[source]
I'm always curious how poisoning attacks could work. Like, suppose you were able to get enough human users to produce poisoned content. This content would be human-written, not just garbage, and would contain flawed reasoning, misjudgments, unrealistic premises, etc.

Like, I've asked ChatGPT certain questions where I know the online sources are limited, and it seems it can come up with a coherent answer from just a few data points. Imagine attacks where people publish code that misuses libraries. For certain libraries you could easily outnumber the real data with poisoned data.

replies(4): >>42552062 #>>42552110 #>>42552129 #>>42557901 #
m3047 ◴[] No.42552129[source]
(I was going to post "run a bot motel" as a topline, but I get tired of sounding like a broken record.)

To generate garbage data I've had good success using Markov chains in the past. These days I think I'd try an LLM and turn up the "heat" (the sampling temperature).
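
For reference, a word-level Markov chain generator is only a few lines. This is a minimal Python sketch, not the poster's actual tooling; the chain order, whitespace tokenization, and output length are arbitrary choices, and the LLM variant would just be sampling from any local model with the temperature cranked up:

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        # Map each run of `order` consecutive words to the words seen after it
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, length=200):
        # Start from a random key and walk the chain until it dead-ends
        key = random.choice(list(chain.keys()))
        out = list(key)
        for _ in range(length):
            followers = chain.get(tuple(out[-len(key):]))
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)

    # seed_text would be any plausible-looking prose you already have
    # garbage = generate(build_chain(seed_text))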

replies(1): >>42555366 #
Terr_ ◴[] No.42555366[source]
Wouldn't your own LLM be overkill? Ideally one would generate decoy junk much more efficiently than these abusive/hostile attackers can steal it.
replies(2): >>42557449 #>>42561534 #
m3047 ◴[] No.42561534[source]
I didn't run the text generator in real time (that would defeat the point of shifting cost to the adversary, wouldn't it?). I created and cached a corpus, and then selectively made small edits (primarily URL rewriting) on the way out.
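
That pattern might look roughly like this; a sketch of the shape rather than the setup described above (pre-generate pages offline, cache them, and rewrite links on the way out so crawlers keep walking the decoy site). The corpus directory, the /trap/ URL scheme, and the regex are all assumptions:

    import pathlib
    import random
    import re

    # Pages generated offline and cached to disk ahead of time
    CACHE = list(pathlib.Path("corpus").glob("*.html"))

    def serve_page():
        page = random.choice(CACHE).read_text()
        # Small edit on the way out: point every link back into the decoy site
        return re.sub(
            r'href="[^"]*"',
            lambda m: 'href="/trap/%06d.html"' % random.randrange(10**6),
            page,
        )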