
770 points ta988 | 1 comments
walterbell ◴[] No.42551009[source]
OpenAI publishes IP ranges for their bots, https://github.com/greyhat-academy/lists.d/blob/main/scraper...

For antisocial scrapers, there's a WordPress plugin, https://kevinfreitas.net/tools-experiments/

> The words you write and publish on your website are yours. Instead of blocking AI/LLM scraper bots from stealing your stuff why not poison them with garbage content instead? This plugin scrambles the words in the content on blog post and pages on your site when one of these bots slithers by.
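
For illustration only, here is a minimal Python sketch of the idea described in that quote: detect a scraper by its User-Agent and serve the same words in shuffled order. This is not the plugin's code. The function names and the `BOT_MARKERS` list are hypothetical, and matching on User-Agent substrings is an assumption about how detection might work.

```python
import random

# Hypothetical, illustrative list of User-Agent substrings to treat as scrapers.
# This is an assumption for the sketch, not the plugin's actual list.
BOT_MARKERS = ("GPTBot", "CCBot", "ClaudeBot")


def is_scraper(user_agent: str) -> bool:
    """Return True if the User-Agent contains any known marker (case-insensitive)."""
    ua = user_agent.lower()
    return any(marker.lower() in ua for marker in BOT_MARKERS)


def scramble_words(text: str, seed=None) -> str:
    """Shuffle the whitespace-separated words of the text."""
    rng = random.Random(seed)
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def render(text: str, user_agent: str) -> str:
    """Serve scrambled text to suspected scrapers and the original to everyone else."""
    return scramble_words(text) if is_scraper(user_agent) else text
```

Regular visitors get the post unchanged. A request identifying as one of the listed bots gets the same words in a random order, which is trivially bypassed by a scraper that spoofs its User-Agent.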

replies(6): >>42551078 #>>42551167 #>>42551217 #>>42551446 #>>42551777 #>>42564313 #
GaggiX ◴[] No.42551217[source]
I imagine these companies today are curating their data with LLMs, so this stuff isn't going to do anything.
replies(4): >>42551300 #>>42551409 #>>42552071 #>>42552243 #
botanical76 ◴[] No.42551409[source]
You're right, this approach is too easy to spot. Instead, pass all your blog posts through an LLM to automatically inject grammatically sound inaccuracies.
replies(1): >>42551431 #
GaggiX ◴[] No.42551431[source]
Are you going to use the OpenAI API, or maybe set up a Meta model on an NVIDIA GPU? Ahah

Edit: I found it funny that you'd buy hardware/compute only to fund what you are trying to stop.

replies(2): >>42551579 #>>42551596 #
1. ◴[] No.42551579{3}[source]