Ban me at the IP level if you don't like me

i think there is an opportunity to train an neural network on browser user agent s(they are catalogued but vary and change a lot). then u can block everything not matching.

it will work better than regex. a lot of these companies rely on 'but we are clearly recognizable' via fornexample these user agents, as excuse to put burden on sysadmins to maintains blocklists instead of otherway round (keep list of scrapables..)

maybe someone mathy can unburden them ?

you could also look who ask for nonexisting resources, and block anyone who asks for more than X (large enough not to let config issue or so kill regular clients). block might be just a minute so u dont have too many risk when an FP occurs. it will be enough likely to make the scraper turn away.

there are many things to do depending on context, app complexity, load etc. , problem is there's no really easy way to do these things.

ML should be able to help a lot in such a space??