454 points positiveblue | 2 comments
matt-p ◴[] No.45066473[source]
I have zero issue with AI agents, if there's a real user behind there somewhere. I DO have a major issue with my sites being crawled extremely aggressively by offenders including Meta, Perplexity and OpenAI - it's really annoying realising that we're tying up several CPU cores on AI crawling. Less than on real users and Google et al.
replies(6): >>45066494 #>>45066689 #>>45066754 #>>45067321 #>>45067530 #>>45068488 #
Operyl ◴[] No.45066494[source]
They're getting to the point of 200-300 RPS for some of my smaller marketing sites, hallucinating URLs like crazy. It's fucking insane.
replies(2): >>45066518 #>>45066583 #
palmfacehn ◴[] No.45066583[source]
You'd think they would have an interest in developing reasonable crawling infrastructure, like Google, Bing or Yandex. Instead they go all in on hosts with no metering. All of the search majors reduce their crawl rate as request times increase.

On one hand, these companies announce themselves as sophisticated, futuristic and highly valued; on the other, we see rampant incompetence, to the point that webmasters everywhere are debating the best course of action.
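
For anyone wondering what "reduce the crawl rate as request times increase" could look like, here is a minimal sketch. It is not how Google, Bing or Yandex actually do it; the function name `next_delay`, the 0.5 s target and the delay bounds are all made-up values for illustration.

```python
def next_delay(current_delay, response_time,
               target=0.5, min_delay=1.0, max_delay=60.0):
    """Return the pause (seconds) before the next request to the same host.

    All thresholds here are hypothetical defaults, not values used by
    any real crawler.
    """
    if response_time > target:
        # The server is answering slowly: back off multiplicatively.
        new_delay = current_delay * 2
    else:
        # The server looks healthy: recover gradually.
        new_delay = current_delay * 0.9
    # Keep the delay inside the configured bounds.
    return max(min_delay, min(max_delay, new_delay))
```

A crawler would call this after every response and sleep for the returned value, so a struggling host sees the request rate halve each time it answers slowly.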

replies(3): >>45066630 #>>45071958 #>>45081606 #
matt-p ◴[] No.45066630[source]
Honestly, it's just a tragedy of the commons. Why put the effort in when you don't have to identify yourself? Just crawl, and if you get blocked, move the job to another server.
replies(1): >>45066686 #
1. palmfacehn ◴[] No.45066686[source]
At this point I'm blocking several ASNs. Most are cloud provider related, but there are also some repurposed consumer ASNs coming out of the PRC. Long term, this devalues the offerings of those cloud providers, as prospective customers will not be able to use them for crawling.
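
A rough sketch of the matching side of this, assuming you already have the prefixes announced by the ASNs you want to block (from whatever routing data source you trust). The prefixes below are documentation ranges used as placeholders, and `BLOCKED_PREFIXES` and `is_blocked` are hypothetical names, not part of any particular firewall or server.

```python
import ipaddress

# Placeholder prefixes (RFC 5737 / RFC 3849 documentation ranges).
# In practice this list would hold the prefixes announced by the blocked ASNs.
BLOCKED_PREFIXES = [
    ipaddress.ip_network(p)
    for p in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked prefix."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_PREFIXES
    )
```

A linear scan like this is fine for a few hundred prefixes; with full ASN prefix lists you would more likely push the list into the firewall or use a prefix tree.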
replies(1): >>45091584 #
2. account42 ◴[] No.45091584[source]
This is the correct solution and is how network abuse was dealt with before the latest fad. Network operators can either police their own users or be blocked/throttled wholesale. Nothing more is needed except the willingness to apply measures to networks that are "too big to fail".