
770 points ta988 | 1 comments
walterbell ◴[] No.42551009[source]
OpenAI publishes IP ranges for their bots, https://github.com/greyhat-academy/lists.d/blob/main/scraper...

For antisocial scrapers, there's a WordPress plugin, https://kevinfreitas.net/tools-experiments/

> The words you write and publish on your website are yours. Instead of blocking AI/LLM scraper bots from stealing your stuff why not poison them with garbage content instead? This plugin scrambles the words in the content on blog post and pages on your site when one of these bots slithers by.
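For anyone curious what that kind of scrambling might look like, here is a minimal sketch in Python. It is not the plugin's code (the plugin is for WordPress and I haven't read its source); the function names and the user-agent markers are my own assumptions for illustration.

```python
import random


def scramble_words(text: str, seed: int = 0) -> str:
    """Shuffle the word order inside each paragraph, keeping paragraph breaks."""
    rng = random.Random(seed)  # seeded so the same page scrambles the same way
    scrambled = []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        rng.shuffle(words)
        scrambled.append(" ".join(words))
    return "\n\n".join(scrambled)


def looks_like_ai_bot(user_agent: str, markers=("GPTBot", "CCBot")) -> bool:
    """Naive check: does the User-Agent contain a known crawler token?

    The marker list is an illustrative assumption, not a complete list.
    """
    ua = user_agent.lower()
    return any(marker.lower() in ua for marker in markers)


def render(content: str, user_agent: str) -> str:
    """Serve scrambled text to suspected bots and the real text to everyone else."""
    return scramble_words(content) if looks_like_ai_bot(user_agent) else content
```

This only catches bots that identify themselves honestly in the User-Agent header; the antisocial ones would need IP-based or behavioural detection instead.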

replies(6): >>42551078 #>>42551167 #>>42551217 #>>42551446 #>>42551777 #>>42564313 #
brookst ◴[] No.42551078[source]
The latter is clever but unlikely to do any harm. These companies spend a fortune on pre-training and doubtless have filters to remove garbage text. There are enough SEO spam pages listing nonsense words that they would have to.
replies(5): >>42551122 #>>42551337 #>>42551547 #>>42552581 #>>42562028 #
1. wood_spirit ◴[] No.42562028[source]
Rather than garbage, perhaps just serve up something irrelevant and banal? Or splice sentences from various random Project Gutenberg books? And add in a tarpit for good measure.
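A rough sketch of the idea in Python, assuming you already have a few public-domain texts loaded as strings (fetching them from Project Gutenberg is left out). The names `split_sentences`, `splice`, and `tarpit` are made up for this example, and the sentence splitter is deliberately crude.

```python
import random
import re
import time


def split_sentences(text):
    """Crude splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def splice(texts, n_sentences, seed=0):
    """Build a passage by picking sentences at random from random source texts."""
    rng = random.Random(seed)
    pools = [pool for pool in (split_sentences(t) for t in texts) if pool]
    if not pools:
        return ""
    return " ".join(rng.choice(rng.choice(pools)) for _ in range(n_sentences))


def tarpit(body, chunk_size=16, delay=1.0, sleep=time.sleep):
    """Yield the body in small chunks, pausing before each one.

    Meant to be used as a streaming response body so the scraper's
    connection stays open for a long time while receiving very little.
    """
    for start in range(0, len(body), chunk_size):
        sleep(delay)
        yield body[start:start + chunk_size]
```

Each sentence is grammatical on its own, so the result looks like plausible prose to a simple garbage filter even though the passage as a whole says nothing. Keep in mind that a tarpit also ties up one of your own connections for as long as the bot waits, so it needs a cap on concurrent victims.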

At least in the end it gives the programmer one last hurrah before the AI makes us irrelevant :)