770 points by ta988 | 16 comments
markerz No.42551173
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive for some reason and doesn't back off under load the way I would expect Googlebot to. It just kept requesting more and more until my server crashed, then it would back off for a minute and start requesting more again.
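For reference, the kind of back-off described here can be sketched as a crawler that doubles its wait between requests whenever the server signals overload and eases off again once responses are healthy. This is a minimal illustration built on assumptions of my own: treating HTTP 429 and any 5xx as "slow down", the `base`/`cap` values, and the helper names `next_delay` and `delay_schedule` are all hypothetical. It is not how Googlebot or Meta-ExternalAgent actually schedules requests.

```python
"""Sketch of the adaptive back-off a polite crawler might apply.

Assumptions (not from the thread): the crawler doubles its per-request
delay whenever the server signals overload (HTTP 429 or any 5xx) and
halves it again, never below `base`, once requests succeed.
"""

from typing import Iterable, List, Optional


def next_delay(current: float, status: int, base: float = 1.0,
               cap: float = 300.0, retry_after: Optional[float] = None) -> float:
    """Return the delay (seconds) to wait before the next request."""
    if status == 429 or status >= 500:
        delay = min(cap, current * 2)        # server is struggling: slow down
        if retry_after is not None:
            delay = max(delay, retry_after)  # honor an explicit Retry-After hint
        return delay
    return max(base, current / 2)            # healthy response: speed back up gradually


def delay_schedule(statuses: Iterable[int], base: float = 1.0,
                   cap: float = 300.0) -> List[float]:
    """Replay a sequence of response codes and record the delay after each."""
    delay = base
    schedule = []
    for status in statuses:
        delay = next_delay(delay, status, base, cap)
        schedule.append(delay)
    return schedule


if __name__ == "__main__":
    # A burst of 503s makes the delay grow; successes bring it back down.
    print(delay_schedule([200, 503, 503, 503, 200, 200]))  # [1.0, 2.0, 4.0, 8.0, 4.0, 2.0]
```

Halving on success instead of resetting straight to `base` avoids immediately hammering a server that has only just recovered, which is roughly the opposite of the crash-wait-repeat pattern described above.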

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt, but those are just suggestions and some bots seem to ignore them.
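As a rough illustration of why robots.txt is only a suggestion: a compliant crawler parses the file and asks it before each request, which is what Python's standard `urllib.robotparser` does, but nothing enforces the answer. The rules below are an assumed example (a blanket `Disallow` for `Meta-ExternalAgent` plus a `Crawl-delay` for everyone else; `Crawl-delay` is a non-standard extension that some crawlers ignore), `build_parser` is a hypothetical helper, and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: block Meta's crawler entirely, ask everyone else to slow down.
ROBOTS_TXT = """\
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: *
Crawl-delay: 10
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # A well-behaved crawler would make these checks before every request.
    print(rp.can_fetch("Meta-ExternalAgent", "https://example.com/blog/post"))  # False
    print(rp.can_fetch("Mozilla/5.0", "https://example.com/blog/post"))         # True
    print(rp.crawl_delay("Mozilla/5.0"))                                         # 10
```

Every one of these answers is advisory: the bot has to fetch the file and choose to obey it, which is exactly the part a site owner cannot enforce.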

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.
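Outside Cloudflare, the User-Agent rule mentioned above can be approximated at the origin with a small filter. This is a sketch assuming a Python WSGI app; the token list is illustrative only (`Meta-ExternalAgent` is the bot named in this comment, `GPTBot` and `CCBot` are other crawlers' published tokens), and `BlockAgentsMiddleware` and `is_blocked` are hypothetical names. It only catches bots that identify themselves honestly, which is presumably why Cloudflare also offers blocking of suspected AI bots.

```python
from typing import Callable, Iterable, Tuple

# Illustrative, not exhaustive: substrings matched case-insensitively against User-Agent.
BLOCKED_AGENT_TOKENS: Tuple[str, ...] = ("meta-externalagent", "gptbot", "ccbot")


def is_blocked(user_agent: str, tokens: Iterable[str] = BLOCKED_AGENT_TOKENS) -> bool:
    """Return True if any blocked token appears in the User-Agent header."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in tokens)


class BlockAgentsMiddleware:
    """WSGI middleware that answers 403 to self-identified crawlers before the app runs."""

    def __init__(self, app: Callable, tokens: Iterable[str] = BLOCKED_AGENT_TOKENS):
        self.app = app
        self.tokens = tuple(t.lower() for t in tokens)

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_blocked(environ.get("HTTP_USER_AGENT", ""), self.tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else reaches the wrapped application untouched.
        return self.app(environ, start_response)
```

Wrap an existing app with `app = BlockAgentsMiddleware(app)`. Substring matching is deliberately crude; a spoofed User-Agent walks straight through.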

MetaWhirledPeas No.42551742
> Cloudflare also has a feature to block known AI bots and even suspected AI bots

In addition to other crushing internet risks, add being wrongly blacklisted as a bot to the list.

1. throwaway290 No.42551773
What do you mean, crushing risk? Just solve these 12 puzzles by moving tiny icons on a tiny canvas while on your phone and you are in the clear for a couple more hours!
2. gs17 No.42552006
If it clears you at all. I accidentally set a user agent switcher on for every site instead of the one I needed it for, and Cloudflare would give me an infinite loop of challenges. At least turning it off let me use the Internet again.
3. homebrewer No.42552586
If you live in a region whose existence it is economically acceptable to ignore (I do), you sometimes get blocked by website r̶a̶c̶k̶e̶t̶ protection for no reason at all, simply because some "AI" model saw a request coming from an unusual place.
4. benhurmarcel No.42552825
Sometimes it doesn’t even give you a Captcha.

I have come across some websites that block me via Cloudflare with no way of solving it. I'm not sure why: I'm in a large first-world country, and I tried a stock iPhone and a stock Windows PC, with no VPN or anything.

There's just no way to know.

5. dannyw No.42555004
That’s probably a page/site rule set by the website owner. Some sites block EU IPs because the cost of complying with GDPR outweighs the benefit.
6. throwaway290 No.42556915
I've seen GDPR-related blocking literally twice in a few years, and I connect from an EU IP almost all the time.

A captcha overload is not about GDPR...

But the issue is strange. @benhurmarcel, I would check whether somebody or some company nearby is abusing stuff and you ended up under the hammer. Maybe an unscrupulous VPN company. Using a good VPN can in fact make things better (but will cost money), and if you have somewhere to host your own, all the better. Otherwise, check whether you can change your IP with your provider, change providers, or move, I guess...

Not to excuse the CF racket, but as this thread shows, data-hungry artificial stupidity leaves some sites no choice.

7. benhurmarcel No.42557953
One of the affected websites is a local cafe in the EU. It doesn’t make any sense to block EU IPs.
8. benhurmarcel No.42557971
Does it work only based on the IP?

I also tried from a mobile 4G connection, and it’s the same.

9. throwaway290 No.42564209
This may be too paranoid, but if your mobile IP is persistent and your phone was compromised and is serving as a proxy for bots, that could explain why your IP fell out of favor.
10. EVa5I7bHFq9mnYK No.42565139
I've found it's best to use VPSes from young, little-known hosting companies, as their IPs are not yet on the blacklists.
11. EVa5I7bHFq9mnYK No.42565170
You don't get your own external IP on a phone; it's shared, like NAT.
12. throwaway290 No.42565485
Depends on the provider/plan.
13. scarface_74 No.42566337
I get a different IPv4 and IPv6 address every time I toggle airplane mode on and off.
14. No.42570541
15. lazide No.42571480
Externally routable IPv4, or just a different address behind a CGNAT?
16. scarface_74 No.42571679
Externally routable IPv4 as seen by whatismyip.com.
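One way to settle the CGNAT question from the device itself: the range 100.64.0.0/10 is designated by RFC 6598 as shared address space for carrier-grade NAT, so if the address on the phone's own interface falls there, or simply differs from what a site like whatismyip.com reports, the phone is behind NAT. Below is a minimal sketch using Python's standard `ipaddress` module. The helper names `classify_ipv4` and `behind_nat` are hypothetical, and some carriers run CGNAT on RFC 1918 private space instead, which this only reports as "non-public".

```python
import ipaddress

# RFC 6598 "shared address space", set aside for carrier-grade NAT.
CGNAT_NET = ipaddress.ip_network("100.64.0.0/10")


def classify_ipv4(addr: str) -> str:
    """Label an IPv4 address as 'cgnat', 'public', or 'non-public'."""
    ip = ipaddress.IPv4Address(addr)
    if ip in CGNAT_NET:
        return "cgnat"       # inside the carrier's NAT, not reachable from outside
    if ip.is_global:
        return "public"      # what Python's ipaddress considers globally reachable
    return "non-public"      # RFC 1918 private, loopback, link-local, reserved, ...


def behind_nat(interface_addr: str, external_addr: str) -> bool:
    """True if the device's own address differs from the one websites see."""
    return ipaddress.ip_address(interface_addr) != ipaddress.ip_address(external_addr)


if __name__ == "__main__":
    print(classify_ipv4("100.72.3.4"))    # cgnat
    print(classify_ipv4("192.168.1.20"))  # non-public
```

Checking the CGNAT range before `is_global` keeps the result explicit rather than relying on how a given Python version classifies shared address space.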