
556 points campuscodi | 5 comments
1. prmoustache ◴[] No.41867591[source]
I believe this also poses issues for people running ad blockers. I get tons of repetitive captchas on some websites.

Other companies offering similar services, like Imperva, seem to ban my IP outright after one visit to a website with uBlock Origin: first I get a captcha, then a page saying I am not allowed, and whatever I do, even using an extensionless Chrome browser with a fresh profile, I can't visit the site anymore because my IP is banned.

replies(1): >>41868970 #
2. acdha ◴[] No.41868970[source]
One thing to keep in mind is that the modern web sees a lot of spam and scraping, and ad revenue has been sliding for years. If you make your activity look like a bot, most operators will assume you're not generating revenue and block you. It sucks, but thank a spammer for the situation.
replies(1): >>41877093 #
3. immibis ◴[] No.41877093[source]
They should provide an API if they don't like scraping. Besides, any sane scraper isn't really a problem, unless you're trying to enshittify your site by forcing people to use your app. I've heard some AI scrapers are insane; those should be blocked individually.
replies(2): >>41877366 #>>41878903 #
4. Klonoar ◴[] No.41877366{3}[source]
Some AI scrapers have been shown not to report themselves as AI scrapers, mimicking real users instead.

This is part of what's leading to the blunt blocking approach you see. They are not an individual thing that can be blocked.

5. acdha ◴[] No.41878903{3}[source]
“Sane scraper” doesn’t have a definition or anyone to enforce it. Similarly, APIs aren’t magic: if you make things publicly available, people will harvest them, whether that’s with a 90s-style bot making individual requests or a headless browser running the same JavaScript you use to make API calls.

The other thing to think about is the lack of enforcement: you can’t complain to the bot police when some dude in China decides to harvest your data, and if you try blocking by user-agent or IP you’ll play whack-a-mole trying to stay ahead of the bot operators who will spoof the former and churn the latter. After developing an appreciation for why security people talk about validating correctness rather than trying to enumerate badness, you’ll end up with a combination of rate-limiting and broader blocking for the same reasons. Yes, it’s no fun but the problem isn’t the sites but the people abusing the free services we’ve been given.
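The rate-limiting half of that combination is often done with a per-client token bucket: each client's bucket refills at a steady rate up to a cap, and requests that find an empty bucket are rejected. A minimal sketch (the class name, rates, and client-ID keying are illustrative assumptions, not any particular vendor's implementation):

```python
import time
from collections import defaultdict


class TokenBucket:
    """Per-client token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate=5.0, capacity=10.0):
        self.rate = rate              # tokens added per second
        self.capacity = capacity     # maximum burst size
        self.tokens = defaultdict(lambda: capacity)   # start each client full
        self.last = defaultdict(time.monotonic)       # last-seen timestamp

    def allow(self, client_id):
        """Return True if this request is within the client's budget."""
        now = time.monotonic()
        elapsed = now - self.last[client_id]
        self.last[client_id] = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[client_id] = min(
            self.capacity, self.tokens[client_id] + elapsed * self.rate
        )
        if self.tokens[client_id] >= 1.0:
            self.tokens[client_id] -= 1.0
            return True
        return False
```

Keying the bucket on something like IP address (or a subnet, given how easily bot operators churn individual IPs) lets a burst of legitimate clicks through while throttling sustained harvesting, without trying to enumerate every bad user-agent.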