
770 points ta988 | 7 comments
walterbell ◴[] No.42551009[source]
OpenAI publishes IP ranges for their bots, https://github.com/greyhat-academy/lists.d/blob/main/scraper...
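Checking a visitor's address against a published CIDR list is a few lines of stdlib Python. A minimal sketch, assuming example CIDR values (real ranges come from the vendors' published lists, like the one linked above):

```python
import ipaddress

# Example ranges only -- substitute the vendor's actual published CIDRs.
BOT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("20.42.10.176/28", "52.230.152.0/24")
]

def is_scraper_ip(addr: str) -> bool:
    """Return True if addr falls inside any known bot CIDR range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BOT_RANGES)
```

A server would call `is_scraper_ip()` on each request's remote address and branch on the result.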

For antisocial scrapers, there's a Wordpress plugin, https://kevinfreitas.net/tools-experiments/

> The words you write and publish on your website are yours. Instead of blocking AI/LLM scraper bots from stealing your stuff why not poison them with garbage content instead? This plugin scrambles the words in the content on blog post and pages on your site when one of these bots slithers by.

replies(6): >>42551078 #>>42551167 #>>42551217 #>>42551446 #>>42551777 #>>42564313 #
GaggiX ◴[] No.42551217[source]
I imagine these companies today are curating their data with LLMs; this stuff isn't going to do anything.
replies(4): >>42551300 #>>42551409 #>>42552071 #>>42552243 #
1. walterbell ◴[] No.42551300[source]
Attackers don't have a monopoly on LLM expertise, defenders can also use LLMs for obfuscation.

Technology arms races are well understood.

replies(1): >>42551405 #
2. GaggiX ◴[] No.42551405[source]
I hate LLM companies, so I guess I'm going to use the OpenAI API to "obfuscate" the content, or maybe I'll buy an NVIDIA GPU to run a Llama model, hmm, maybe on a GPU cloud.
replies(1): >>42551571 #
3. walterbell ◴[] No.42551571[source]
With tiny amounts of forum text, obfuscation can be done locally with open models and local inference hardware (NPU on Arm SoC). Zero dollars sent to OpenAI, NVIDIA, AMD or GPU clouds.
replies(2): >>42551604 #>>42551715 #
4. GaggiX ◴[] No.42551604{3}[source]
>local inference hardware (NPU on Arm SoC).

Okay, then the battle is lost from the start.

replies(1): >>42551759 #
5. pogue ◴[] No.42551715{3}[source]
What specifically are you suggesting? Is this a project that already exists or a theory of yours?
replies(1): >>42552093 #
6. walterbell ◴[] No.42551759{4}[source]
There are alternatives to NVIDIAmaxing with brute force. See the Chinese paper on DeepSeek V3, comparable to recent GPT and Claude models but trained with roughly 90% fewer resources. Research on efficient inference continues.

https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSee...

7. sangnoir ◴[] No.42552093{4}[source]
Markov chains are ancient in AI-years, and don't need a GPU.