walterbell ◴[] No.42551009[source]
OpenAI publishes IP ranges for their bots, https://github.com/greyhat-academy/lists.d/blob/main/scraper...
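
A minimal sketch of consuming such a list, in Python with only the standard library; the feed URL and JSON schema below are illustrative assumptions, not OpenAI's documented format:

    import ipaddress
    import json
    from urllib.request import urlopen

    # Hypothetical feed location and shape -- substitute the list you
    # actually trust, e.g. the one linked above.
    FEED_URL = "https://example.com/openai-bot-ranges.json"

    def load_networks(url: str = FEED_URL):
        with urlopen(url) as resp:
            data = json.load(resp)
        # Assumed shape: {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, ...]}
        return [ipaddress.ip_network(p["ipv4Prefix"])
                for p in data.get("prefixes", []) if "ipv4Prefix" in p]

    def is_listed_bot(remote_ip: str, networks) -> bool:
        addr = ipaddress.ip_address(remote_ip)
        return any(addr in net for net in networks)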

For antisocial scrapers, there's a WordPress plugin, https://kevinfreitas.net/tools-experiments/

> The words you write and publish on your website are yours. Instead of blocking AI/LLM scraper bots from stealing your stuff, why not poison them with garbage content instead? This plugin scrambles the words in the content of blog posts and pages on your site when one of these bots slithers by.
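
The idea, sketched in Python rather than the plugin's actual PHP (the user-agent list and the scrambling details here are assumptions, not the plugin's real logic):

    import random
    import re

    # Assumed list of crawler user-agent substrings; the plugin's real
    # detection logic is not shown here.
    SCRAPER_UAS = ("GPTBot", "CCBot", "anthropic-ai", "Google-Extended")

    def is_scraper(user_agent: str) -> bool:
        return any(ua.lower() in user_agent.lower() for ua in SCRAPER_UAS)

    def scramble(html_text: str) -> str:
        # Shuffle the words of each text run between tags, leaving the
        # markup intact, so the page parses fine but the prose is garbage.
        def shuffle_run(match):
            words = match.group(0).split()
            random.shuffle(words)
            return " ".join(words)
        return re.sub(r"[^<>]+(?=<|$)", shuffle_run, html_text)

    def render(content: str, user_agent: str) -> str:
        return scramble(content) if is_scraper(user_agent) else content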

replies(6): >>42551078 #>>42551167 #>>42551217 #>>42551446 #>>42551777 #>>42564313 #
brookst ◴[] No.42551078[source]
The latter is clever but unlikely to do any harm. These companies spend a fortune on pre-training and doubtless have filters to remove garbage text; with so many SEO spam pages that just list nonsense words, they would have to.
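
For a sense of what such a filter looks like, a crude sketch assuming a dictionary-ratio heuristic (real pipelines lean on stronger signals: language-model perplexity, deduplication, learned quality classifiers):

    import re

    def dictionary_ratio(text: str, vocab: set[str]) -> float:
        # Fraction of tokens that are recognizable words.
        words = re.findall(r"[a-z']+", text.lower())
        if not words:
            return 0.0
        return sum(w in vocab for w in words) / len(words)

    def keep_page(text: str, vocab: set[str], threshold: float = 0.7) -> bool:
        # Catches nonsense-word SEO spam. Note that *scrambled* real words
        # would still pass this test, which is why filters also look at
        # word order, e.g. via language-model perplexity.
        return dictionary_ratio(text, vocab) >= threshold
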
replies(5): >>42551122 #>>42551337 #>>42551547 #>>42552581 #>>42562028 #
1. walterbell ◴[] No.42551122[source]
Obfuscators can evolve alongside other LLM arms races.
replies(1): >>42551385 #
2. ben_w ◴[] No.42551385[source]
Yes, but the attacker has the advantage here, because defeating obfuscation directly improves their own product even without this specific motivation: any Completely Automated Public Turing test to tell Computers and Humans Apart can be used to improve an AI's output by requiring the AI to pass that test.

And indeed, this was part of the training process for at least some of OpenAI's models before most people had heard of them.
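
In the abstract, the loop is just rejection sampling against the test; a minimal sketch, where generate and looks_human are hypothetical stand-ins for a model and for whatever computers-vs-humans test is available:

    def curate_training_batch(prompts, generate, looks_human, k=4):
        # Keep only outputs the detector cannot distinguish from human
        # text; these become preferred examples for the next round of
        # fine-tuning, so the test itself drives the model to pass it.
        batch = []
        for prompt in prompts:
            candidates = [generate(prompt) for _ in range(k)]
            batch.extend(c for c in candidates if looks_human(c))
        return batch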