
211 points by CrankyBear | 1 comment
RebeccaTheDev No.45107678
I'll add my voice to others here: this is a huge problem, especially for small hobbyist websites.

I help administer a somewhat popular railroading forum. We've had some of these AI crawlers hammering the site to the point that it became unusable to actual human beings. You design your architecture around certain assumptions, and one of those was definitely not "traffic quintuples."

We've ended up blocking lots of them, but it's a never-ending game of whack-a-mole.

replies(1): >>45110502
1. benjiro No.45110502
> one of those was definitely not "traffic quintuples."

Oh, it was... People warned about the mass adoption of WordPress because of its performance issues.

Internet usage kept growing even without LLM scraping at scale. Everybody wants more and more up-to-date info, recent price checks, and so many other features. This trend has been going on for more than 10 years.

It's just that now, bot scraping for LLMs has pushed some sites over the edge.

> We've ended up blocking lots of them, but it's a neverending game of whack-a-mole.

And unless you block every IP, you cannot stop them. It's really easy to hide scrapers, especially if you use a slow scrape rate.
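
The polite crawlers at least announce themselves, so the first layer is plain user-agent filtering at the edge. A minimal sketch as WSGI middleware; the bot names are just illustrative examples of published AI crawler user agents, and anything that spoofs a browser UA or crawls slowly from residential IPs walks right past it:

```python
# Minimal sketch: block requests whose User-Agent matches known AI crawlers.
# The list is illustrative, not exhaustive, and trivially evaded by spoofing.
BLOCKED_UA_SUBSTRINGS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")

class BlockAICrawlers:
    """WSGI middleware that returns 403 for matching user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(bot in ua for bot in BLOCKED_UA_SUBSTRINGS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.app(environ, start_response)

# Usage with any WSGI app (e.g. Flask): app.wsgi_app = BlockAICrawlers(app.wsgi_app)
```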

The issue comes when you have a setup like one of the posters here, where a DB call takes up to 1s for some product pages that are not in cache. Those sites were already living on borrowed time.
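
To make the borrowed-time point concrete: a 1s uncached DB call means one synchronous worker serves roughly one request per second, so even a slow crawl over a big catalogue saturates it. A minimal sketch of the usual fix, a TTL cache in front of the slow call (the function and numbers are hypothetical, not that poster's actual setup):

```python
import time

# Hypothetical stand-in for the ~1s uncached product-page query.
def fetch_product_from_db(product_id: int) -> dict:
    time.sleep(1.0)  # simulate the expensive DB call
    return {"id": product_id, "name": f"product-{product_id}"}

_CACHE: dict[int, tuple[float, dict]] = {}
TTL_SECONDS = 300  # serve cached copies for 5 minutes

def get_product(product_id: int) -> dict:
    now = time.monotonic()
    hit = _CACHE.get(product_id)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]  # cache hit: microseconds instead of ~1 second
    product = fetch_product_from_db(product_id)
    _CACHE[product_id] = (now, product)
    return product
```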

Ironically, better software on their site (like not using WP) would let them easily handle 1000x the volume with the same resources. And don't get me started on how badly configured a lot of sites' backends are.

People are kind of blaming the wrong issue. Our need for up-to-date data has been growing for the last 10+ years. It's just that people considered a website that takes 400ms to generate a page to be OK (when in reality it is wasting tons of resources or is limited in the backend).
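
Back-of-envelope on that 400ms figure (my numbers, just to show the headroom argument; the 1000x claim above also assumes better caching and configuration on top of this):

```python
# Rough capacity of one synchronous worker at different page render times.
for render_ms in (400, 40, 4):
    req_per_sec = 1000 / render_ms
    print(f"{render_ms:>4} ms/page -> ~{req_per_sec:g} req/s per worker")

# 400 ms/page -> ~2.5 req/s, 4 ms/page -> ~250 req/s: a 100x gap from
# render time alone, before caching, tuning, or hardware enter the picture.
```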