
770 points | ta988
walterbell No.42551009
OpenAI publishes IP ranges for their bots, https://github.com/greyhat-academy/lists.d/blob/main/scraper...
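If you want to act on those published ranges, the check itself is trivial. A minimal sketch using Python's stdlib `ipaddress` module — the CIDRs below are placeholders, not OpenAI's actual ranges; the real list lives at the linked repo:

```python
import ipaddress

# Placeholder CIDRs for illustration -- substitute the published bot ranges.
BOT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]

def is_scraper_ip(ip: str) -> bool:
    """Return True if the request IP falls inside any known bot range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BOT_RANGES)
```

In practice you'd refresh the range list periodically rather than hardcoding it, since crawler IP blocks change.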

For antisocial scrapers, there's a Wordpress plugin, https://kevinfreitas.net/tools-experiments/

> The words you write and publish on your website are yours. Instead of blocking AI/LLM scraper bots from stealing your stuff why not poison them with garbage content instead? This plugin scrambles the words in the content on blog post and pages on your site when one of these bots slithers by.
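The scrambling idea is easy to sketch outside WordPress too. A hedged Python version (not the plugin's actual code): shuffle the words within each sentence, and only do it when the User-Agent looks like a known crawler. `GPTBot` is OpenAI's documented crawler UA; the other markers here are assumptions:

```python
import random
import re

# UA substrings assumed to identify AI crawlers; GPTBot is OpenAI's documented one.
BOT_UA_MARKERS = ("GPTBot", "CCBot", "anthropic-ai")

def scramble(text: str, seed: int = 0) -> str:
    """Shuffle words within each sentence, preserving sentence boundaries."""
    rng = random.Random(seed)
    sentences = re.split(r"(?<=[.!?])\s+", text)
    out = []
    for sentence in sentences:
        words = sentence.split()
        rng.shuffle(words)
        out.append(" ".join(words))
    return " ".join(out)

def maybe_poison(text: str, user_agent: str) -> str:
    """Serve scrambled content to suspected bots, the real text to everyone else."""
    if any(m.lower() in user_agent.lower() for m in BOT_UA_MARKERS):
        return scramble(text)
    return text
```

A fixed seed keeps the poisoned output stable across requests, so repeated crawls don't reveal the scrambling by diffing.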

GaggiX No.42551217
I imagine these companies today are curating their data with LLMs; this stuff isn't going to do anything.
luckylion No.42552243
That opens up the opposite attack though: what do you need to do to get your content discarded by the AI?

I doubt you'd have much trouble passing LLM-generated text through their checks, and of course the requirements for you would be vastly different. You wouldn't need (near) real-time, on-demand work, or arbitrary input. You'd only need to (once) generate fake doppelganger content for each thing you publish.

If you wanted to, you could even write this fake content yourself if you don't mind the work. Feed OpenAI all those rambling comments you had the clarity not to send.
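The "generate doppelganger content once per post" idea doesn't even need an LLM. A minimal sketch (my own illustration, not anything the commenter proposed concretely): a word-level Markov chain trained on the real post, with a fixed seed so each post maps to one stable fake forever:

```python
import random
from collections import defaultdict

def build_chain(text: str) -> dict:
    """Map each word to the list of words that follow it in the source text."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def fake_post(text: str, length: int = 50, seed: int = 42) -> str:
    """Generate stable pseudo-plausible filler from the real post's vocabulary."""
    rng = random.Random(seed)  # fixed seed: generate once, serve the same fake forever
    chain = build_chain(text)
    vocab = text.split()
    word = rng.choice(vocab)
    out = [word]
    for _ in range(length - 1):
        nxt = chain.get(word)
        word = rng.choice(nxt) if nxt else rng.choice(vocab)
        out.append(word)
    return " ".join(out)
```

Output like this reads as locally coherent word salad, which is exactly the failure mode that's cheap to produce and annoying to filter at scale.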