136 points xnx | 3 comments
toomuchtodo No.43708221
This sounds like a good way to take the ML/AI consumption load off Wikimedia infra?
replies(1): >>43708252 #
immibis No.43708252
The consumption load isn't the problem. You can download a complete dump of Wikipedia, and even if every AI company downloaded the newest dump every time it came out, the server load would be manageable - probably double-digit terabytes per month, which is easily handled these days. And if that were a problem, Wikimedia could charge a reasonable amount to ship it on a stack of BD-R discs, or heck, these companies can easily afford a leased line to Wikimedia HQ.
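
(For illustration, a rough sketch of the dump-based approach, assuming the usual dumps.wikimedia.org layout and nothing beyond Python's standard library:)

  import urllib.request

  # Standard location of the latest full English Wikipedia article dump.
  DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
              "enwiki-latest-pages-articles.xml.bz2")

  def download_dump(dest="enwiki-latest-pages-articles.xml.bz2"):
      # Stream the multi-gigabyte file to disk in 1 MiB chunks to keep memory flat.
      with urllib.request.urlopen(DUMP_URL) as resp, open(dest, "wb") as out:
          while chunk := resp.read(1 << 20):
              out.write(chunk)

  if __name__ == "__main__":
      download_dump()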

The problem is the non-consumptive load where they just flat-out DDoS the site for no actual reason. They should be criminally charged for that.

Late edit: Individual page loads to answer specific questions aren't a problem either. DDoS is the problem.

replies(2): >>43708581 #>>43710312 #
1. squigz No.43710312
I'm confused. Are you suggesting that the AI companies actively participate in malicious DDoS campaigns against Wikimedia, for no constructive reason?

Is there a source on this?

replies(1): >>43713522 #
2. kbelder No.43713522
Not maliciousness. Incompetence.

Bot traffic is notoriously stupid: reloading the same pages over and over, surging one hour and gone the next, getting stuck in loops, not understanding HTTP response codes... It's only gotten worse with all the AI scrapers. Somehow, they seem even more poorly written than the search-engine bots.
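
For contrast, a hedged sketch of what "understanding response codes" looks like on the client side: cache what you've already fetched, back off on 429/503, and honor Retry-After. The requests library and the specific delays here are illustrative choices, not anything the scrapers actually run.

  import time
  import requests

  seen = {}  # url -> body, so the same page is never refetched in one run

  def polite_get(url, max_retries=5):
      if url in seen:
          return seen[url]
      for attempt in range(max_retries):
          resp = requests.get(url, headers={"User-Agent": "example-bot/0.1"})
          if resp.status_code in (429, 503):
              # Server asked us to slow down: honor Retry-After if it is a
              # number of seconds, otherwise back off exponentially.
              retry_after = resp.headers.get("Retry-After", "")
              time.sleep(int(retry_after) if retry_after.isdigit() else 2 ** attempt)
              continue
          resp.raise_for_status()
          seen[url] = resp.text
          return resp.text
      raise RuntimeError("gave up on " + url)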

replies(1): >>43714226 #
3. immibis No.43714226
Mine disappeared after about a week of serving them all the same dummy page on every request. They were fetching the images on the dummy page once for each time they fetched the page...
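
(A minimal sketch of that kind of dummy-page trap, for the curious. The User-Agent list is a hypothetical stand-in for whatever detection was actually used, and Flask is just one convenient way to wire it up:)

  from flask import Flask, request

  app = Flask(__name__)

  # Hypothetical list of scraper user-agent markers; real detection would differ.
  BOT_MARKERS = ("GPTBot", "CCBot", "Bytespider")

  # One identical page for every URL; badly written crawlers will also
  # re-fetch the decoy images on every single request.
  DUMMY_PAGE = """<html><body>
  <p>Nothing to see here.</p>
  <img src="/static/decoy1.png"><img src="/static/decoy2.png">
  </body></html>"""

  @app.route("/", defaults={"path": ""})
  @app.route("/<path:path>")
  def catch_all(path):
      ua = request.headers.get("User-Agent", "")
      if any(marker in ua for marker in BOT_MARKERS):
          return DUMMY_PAGE
      return "normal site content would be served here"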