AI companies cause most of traffic on forums

(pod.geraspora.de)

770 points ta988 | 1 comments | 30 Dec 24 14:37 UTC | HN request time: 0s | source

Show context

uludag ◴[30 Dec 24 17:56 UTC] No.42551713[source]▶

I'm always curious how poisoning attacks could work. Like, suppose that you were able to get enough human users to produce poisoned content. This poisoned content would be human written and not just garbage, and would contain flawed reasoning, misjudgments, lapses of reasoning, unrealistic premises, etc.

Like, I've asked ChatGPT certain questions where I know the online sources are limited and it would seem that from a few datapoints it can come up with a coherent answer. Imagine attacks where people would publish code misusing libraries. With certain libraries you could easily outnumber real data with poisoned data.

replies(4): >>42552062 #>>42552110 #>>42552129 #>>42557901 #

m3047 ◴[30 Dec 24 18:33 UTC] No.42552129[source]▶

>>42551713 #

(I was going to post "run a bot motel" as a topline, but I get tired of sounding like broken record.)

To generate garbage data I've had good success using Markov Chains in the past. These days I think I'd try an LLM and turning up the "heat".

replies(1): >>42555366 #

Terr_ ◴[31 Dec 24 01:06 UTC] No.42555366[source]▶

>>42552129 #

Wouldn't your own LLM be overkill? Ideally one would generate decoy junk more much efficiently than these abusive/hostile attackers can steal it.

replies(2): >>42557449 #>>42561534 #

1. uludag ◴[31 Dec 24 08:52 UTC] No.42557449[source]▶

>>42555366 #

I still think this could worthwhile though for these reasons.

- One "quality" poisoned document may be able to do more damage - Many crawlers will be getting this poison, so this multiplies the effect by a lot - The cost of generation seems to be much below market value at the moment

↑