
454 points positiveblue | 5 comments
matt-p ◴[] No.45066473[source]
I have zero issue with AI agents, if there's a real user behind there somewhere. I DO have a major issue with my sites being crawled extremely aggressively by offenders including Meta, Perplexity and OpenAI - it's really annoying realising that we're tying up several CPU cores on AI crawling. Less than on real users and Google et al.
replies(6): >>45066494 #>>45066689 #>>45066754 #>>45067321 #>>45067530 #>>45068488 #
1. swed420 ◴[] No.45067530[source]
> I DO have a major issue with my sites being crawled extremely aggressively by offenders including Meta, Perplexity and OpenAI

Gee, if only we had, like, one central archive of the internet. We could even call it the internet archive.

Then, all these AI companies could interface directly with that single entity on terms that are agreeable.

replies(2): >>45067816 #>>45074266 #
2. teitoklien ◴[] No.45067816[source]
You think they care about that? They'd still crawl like this just in case, which is why they don't rate limit atm.
replies(1): >>45078306 #
3. gck1 ◴[] No.45074266[source]
Internet Archive is missing enormous chunks of the internet though. And I don't mean weird parts of the internet, just regional stuff.

Not even news articles from top 10 news websites from my country are usually indexed there.

replies(1): >>45078310 #
4. swed420 ◴[] No.45078306[source]
It would of course need to be legally enforced somehow, with penalties high enough to hurt even the big players.
5. swed420 ◴[] No.45078310[source]
So then make a better one. I was only referencing it as a general concept that can be improved upon as desired.