This sounds like a good way to take the ML/AI consumption load off Wikimedia infra?
replies(1):
The problem is the non-consumptive load where they just flat-out DDoS the site for no actual reason. They should be criminally charged for that.
Late edit: Individual page loads to answer specific questions aren't a problem either. DDoS is the problem.
I was at an interview for a tier-one AI lab and the PM I was talking to refused to believe that the torrent dumps from Wikipedia were fresh and usable for training.
When you spend all your time fighting bot detection measures, it's hard to imagine someone willingly putting their data out there for free.