
770 points ta988 | 7 comments
walterbell ◴[] No.42551009[source]
OpenAI publishes IP ranges for their bots, https://github.com/greyhat-academy/lists.d/blob/main/scraper...
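Checking a visitor's address against a published CIDR list is a few lines of stdlib Python. A minimal sketch, assuming example CIDR values (real ranges come from the vendors' published lists, like the one linked above):

```python
import ipaddress

# Example ranges only -- substitute the vendor's actual published CIDRs.
BOT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("20.42.10.176/28", "52.230.152.0/24")
]

def is_scraper_ip(addr: str) -> bool:
    """Return True if addr falls inside any known bot CIDR range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BOT_RANGES)
```

A server would call `is_scraper_ip()` on each request's remote address and branch on the result.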

For antisocial scrapers, there's a Wordpress plugin, https://kevinfreitas.net/tools-experiments/

> The words you write and publish on your website are yours. Instead of blocking AI/LLM scraper bots from stealing your stuff why not poison them with garbage content instead? This plugin scrambles the words in the content on blog post and pages on your site when one of these bots slithers by.

replies(6): >>42551078 #>>42551167 #>>42551217 #>>42551446 #>>42551777 #>>42564313 #
GaggiX ◴[] No.42551217[source]
I imagine these companies today are curating their data with LLMs; this stuff isn't going to do anything.
replies(4): >>42551300 #>>42551409 #>>42552071 #>>42552243 #
1. walterbell ◴[] No.42551300[source]
Attackers don't have a monopoly on LLM expertise, defenders can also use LLMs for obfuscation.

Technology arms races are well understood.

replies(1): >>42551405 #
2. GaggiX ◴[] No.42551405[source]
I hate LLM companies, so I guess I'm going to use the OpenAI API to "obfuscate" the content, or maybe I'll buy an NVIDIA GPU to run a Llama model, hmm, maybe on a GPU cloud.
replies(1): >>42551571 #
3. walterbell ◴[] No.42551571[source]
With tiny amounts of forum text, obfuscation can be done locally with open models and local inference hardware (NPU on Arm SoC). Zero dollars sent to OpenAI, NVIDIA, AMD or GPU clouds.
replies(2): >>42551604 #>>42551715 #
4. GaggiX ◴[] No.42551604{3}[source]
>local inference hardware (NPU on Arm SoC).

Okay, then the battle is lost from the start.

replies(1): >>42551759 #
5. pogue ◴[] No.42551715{3}[source]
What specifically are you suggesting? Is this a project that already exists or a theory of yours?
replies(1): >>42552093 #
6. walterbell ◴[] No.42551759{4}[source]
There are alternatives to NVIDIAmaxing with brute force. See the Chinese paper on DeepSeek V3, comparable to recent GPT and Claude models but trained with roughly 90% fewer resources. Research on efficient inference continues.

https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSee...

7. sangnoir ◴[] No.42552093{4}[source]
Markov chains are ancient in AI-years, and don't need a GPU.