211 points by CrankyBear
k310 (No.45105673)
> Cloud services company Fastly agrees. It reports that 80% of all AI bot traffic comes from AI data fetcher bots.

No kidding. An increasing number of sites are putting up CAPTCHAs.

Problem? CAPTCHAs are annoying, they're a 50-times-a-day eye exam, and

> Google's reCAPTCHA is not only useless, it's also basically spyware [0]

> reCAPTCHA v3's checkbox test doesn't stop bots and tracks user data

[0] https://www.techspot.com/news/106717-google-recaptcha-not-on...

benjiro (No.45110554)
The ironic part: LLMs are very good at solving CAPTCHAs. So the only people bothered by those CAPTCHAs are the actual site visitors.

What sites need to do is temporarily block repeated requests from the same IPs. Sure, some agents use tens of thousands of IPs, but if they are really as aggressive as people say, you're going to see the same IPs far more often than you see normal users.
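A minimal sketch of the per-IP temporary block described above, in Python. The class name `IpTempBlocker` and the thresholds (requests per window, window length, block duration) are hypothetical, picked for illustration rather than tuned; a real deployment would also need to evict idle IPs and share state across servers.

```python
import time
from collections import defaultdict, deque


class IpTempBlocker:
    """Sliding-window request counter per IP with a temporary block.

    max_requests, window_s and block_s are illustrative defaults (hypothetical).
    """

    def __init__(self, max_requests=60, window_s=60.0, block_s=600.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.block_s = block_s
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked_until = {}         # ip -> time at which its block expires

    def allow(self, ip, now=None):
        """Return True if the request from `ip` should be served."""
        now = time.monotonic() if now is None else now
        until = self.blocked_until.get(ip)
        if until is not None and now < until:
            return False  # still inside an active block; reject without counting
        q = self.hits[ip]
        # Drop timestamps that have fallen out of the sliding window.
        while q and q[0] <= now - self.window_s:
            q.popleft()
        q.append(now)
        if len(q) > self.max_requests:
            # Too many requests in the window: block this IP for block_s seconds.
            self.blocked_until[ip] = now + self.block_s
            q.clear()
            return False
        return True
```

A per-IP limit like this can also catch legitimate users who share an exit address, such as VPN or relay users.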

That will kick out the overly aggressive ones. I have done web scraping and limited it to around 1 request per second. You never run into any blocking or detection that way, because you hardly show up. The problem is the *** who send thousands of parallel requests at a website because they never figured out query builders for large page hits, and don't know how to build checks against last-updated pages.
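On the scraper side, the roughly 1 request/second pacing mentioned above can be enforced with a simple minimum-interval throttle. This is a sketch assuming a single sequential worker; `Throttle` is a hypothetical helper, and the injectable `clock`/`sleep` parameters exist only so the timing can be tested without real delays.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests (client side)."""

    def __init__(self, min_interval_s=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous permitted request

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self._last + self.min_interval_s - now
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()
```

Call `throttle.wait()` immediately before each fetch; with one worker and a 1.0 s interval, the scraper stays at or below one request per second.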

One of the main issues I see is that some people simply write the most basic of basic scrapers: see link, follow, spawn process, scrape, see 100 more links... Updates? Just rescrape the whole website, repeat, repeat... It takes time to build a scrape template for each website that knows where to check for updates, so some never bother.
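One place a per-site "where to check for updates" template can look is the site's XML sitemap, if it publishes one with `<lastmod>` dates (sitemaps.org protocol). The sketch below assumes such a sitemap exists and that a hypothetical `last_crawled` record maps each URL to a timezone-aware UTC datetime of the previous fetch; timezone-less `<lastmod>` values are assumed to be UTC. HTTP conditional requests (`If-Modified-Since`, `ETag`) are another common option, not shown here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def _parse_lastmod(value):
    # 'Z' is rewritten because datetime.fromisoformat only accepts it from Python 3.11.
    dt = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Assumption: values without a timezone (e.g. date-only) are UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def urls_to_rescrape(sitemap_xml, last_crawled):
    """Return sitemap URLs that changed since our last crawl or were never crawled.

    last_crawled: hypothetical record mapping URL -> aware UTC datetime of last fetch.
    """
    root = ET.fromstring(sitemap_xml)
    stale = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", namespaces=SITEMAP_NS)
        if loc is None:
            continue  # malformed entry without a <loc>
        loc = loc.strip()
        lastmod = url_el.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        seen = last_crawled.get(loc)
        if seen is None or lastmod is None:
            stale.append(loc)  # never crawled, or no change date published: fetch it
        elif _parse_lastmod(lastmod) > seen:
            stale.append(loc)  # page changed after our last fetch
    return stale
```

Only the returned URLs need to be fetched again; everything else can be skipped until its `<lastmod>` moves.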

k310 (No.45111648)
I often use a VPN or iCloud Private Relay. Some sites gripe “too many accesses (downloads) from your IP address today.”

The devil’s in the details. I (a non-bot) sometimes resort to VPN-flipping.

I suppose that some bots try this, just a wild guess.