(pod.geraspora.de)

markerz ◴[30 Dec 24 17:07 UTC] No.42551173[source]▶

One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive for some reason and doesn't do performance back-off the way I would expect from Google Bot. It just kept repeatedly requesting more and more until my server crashed, then it would back off for a minute and then request more again.
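For reference, the kind of back-off described here can be sketched roughly as below. This is only an illustration of what a polite crawler might do, not how Googlebot or Meta's bot actually works; the function name `next_delay` and the `base`, `cap`, and 0.9 decay values are assumptions chosen for the example.

```python
def next_delay(current: float, status: int, *, base: float = 1.0, cap: float = 300.0) -> float:
    """Return the number of seconds to wait before the next request to the same host.

    Hypothetical helper: doubles the delay when the server signals overload
    (HTTP 429 or any 5xx) and relaxes it slowly after a successful response.
    """
    overloaded = status == 429 or 500 <= status < 600
    if overloaded:
        # Exponential back-off, never exceeding the cap.
        return min(cap, max(base, current) * 2)
    # Healthy response: shrink the delay gradually, but never below the base.
    return max(base, current * 0.9)


if __name__ == "__main__":
    delay = 1.0
    # Simulated sequence of responses from a struggling server.
    for status in (200, 503, 503, 429, 200, 200):
        delay = next_delay(delay, status)
        print(status, round(delay, 2))
```

The point of the slow decay is that one good response after a crash does not send the crawler straight back to full speed, which is the behaviour complained about above.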

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt but those are just suggestions and some bots seem to ignore them.
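A rough application-level equivalent of the two measures, as a sketch only: this is not Cloudflare's rule syntax, and the `is_blocked` and `robots_allows` helpers and the sample robots.txt content are made up for illustration. It uses Python's standard `urllib.robotparser` to show what a well-behaved bot would conclude from the robots.txt, next to a hard User-Agent check that works regardless of whether the bot cooperates.

```python
from urllib.robotparser import RobotFileParser

# Substrings to reject, compared case-insensitively. The token comes from the
# bot name mentioned above; the list itself is an assumption for this example.
BLOCKED_UA_SUBSTRINGS = ("meta-externalagent",)

# Example robots.txt: advisory only, a bot is free to ignore it.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(user_agent: str) -> bool:
    """Hard block: True if the request's User-Agent matches a blocked substring."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)


def robots_allows(user_agent: str, url: str) -> bool:
    """What a compliant crawler would decide after reading ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_blocked("meta-externalagent/1.1"))                            # blocked at the edge
    print(robots_allows("Meta-ExternalAgent", "https://example.com/page"))  # disallowed by robots.txt
    print(robots_allows("Mozilla/5.0", "https://example.com/page"))         # ordinary browsers allowed
```

The robots.txt only helps with bots that choose to read it; the User-Agent check is the part that actually refuses the request, and even that fails once a bot changes its User-Agent string.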

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #

CoastalCoder ◴[30 Dec 24 17:30 UTC] No.42551410[source]▶

>>42551173 #

I wonder if it would work to send Meta's legal department a notice that they are not permitted to access your website.

Would that make subsequent accesses be violations of the U.S.'s Computer Fraud and Abuse Act?

replies(3): >>42551475 #>>42551557 #>>42551847 #

1. betaby ◴[30 Dec 24 17:37 UTC] No.42551475[source]▶

>>42551410 #

Crashing wasn't the intent. And scraping is legal, as I remember from the LinkedIn case.

replies(3): >>42551556 #>>42551790 #>>42551791 #

2. azemetre ◴[30 Dec 24 17:43 UTC] No.42551556[source]▶

>>42551475 (TP) #

There’s a fine line between scraping and DDoS’ing, I’m sure.

Just because you manufacture chemicals doesn’t mean you can legally dump your toxic waste anywhere you want (well shouldn’t be allowed to at least).

You also shouldn’t be able to set your crawlers loose in a way that causes sites to fail.

replies(2): >>42551594 #>>42576313 #

3. acedTrex ◴[30 Dec 24 17:46 UTC] No.42551594[source]▶

>>42551556 #

intent is likely very important to something like a ddos charge

replies(4): >>42551704 #>>42551735 #>>42551816 #>>42551902 #

4. iinnPP ◴[30 Dec 24 17:55 UTC] No.42551704{3}[source]▶

>>42551594 #

Wilful ignorance is generally enough.

5. gameman144 ◴[30 Dec 24 17:57 UTC] No.42551735{3}[source]▶

>>42551594 #

Maybe, but impact can also make a pretty viable case.

For instance, if you own a home you may have an easement on part of your property that grants other cars from your neighborhood access to pass through it rather than going the long way around.

If Amazon were to build a warehouse on one side of the neighborhood, however, it's not obvious that they would be equally legally justified to send their whole fleet back and forth across it every day, even though their intent is certainly not to cause you any discomfort at all.

6. echelon ◴[30 Dec 24 18:02 UTC] No.42551790[source]▶

>>42551475 (TP) #

Then you can feed them deliberately poisoned data.

Send all of your pages through an adversarial LLM to pollute and twist the meaning of the underlying data.

replies(1): >>42552788 #

7. franga2000 ◴[30 Dec 24 18:02 UTC] No.42551791[source]▶

>>42551475 (TP) #

If I make a physical robot and it runs someone over, I'm still liable, even though it was a delivery robot, not a running over people robot.

If a bot sends so many requests that a site completely collapses, the bot's owner is liable, even though it was a scraping bot and not a denial of service bot.

replies(1): >>42552206 #

8. RF_Savage ◴[30 Dec 24 18:04 UTC] No.42551816{3}[source]▶

>>42551594 #

So have the stresser and stress-testing DDoS-for-hire sites changed to scraping yet?

replies(1): >>42559369 #

9. layer8 ◴[30 Dec 24 18:11 UTC] No.42551902{3}[source]▶

>>42551594 #

So is negligence. Or at least I would hope so.

10. stackghost ◴[30 Dec 24 18:40 UTC] No.42552206[source]▶

>>42551791 #

The law doesn't work by analogy.

replies(1): >>42563460 #

11. cess11 ◴[30 Dec 24 19:55 UTC] No.42552788[source]▶

>>42551790 #

The scraper bots can remain irrational longer than you can stay solvent.

12. acedTrex ◴[31 Dec 24 15:44 UTC] No.42559369{4}[source]▶

>>42551816 #

The courts will likely be able to discern between "good faith" scraping and a DDoS for hire masquerading as scraping.

13. maximinus_thrax ◴[01 Jan 25 02:14 UTC] No.42563460{3}[source]▶

>>42552206 #

Except when it does https://en.wikipedia.org/wiki/Analogy_(law)

14. herbst ◴[02 Jan 25 17:15 UTC] No.42576313[source]▶

>>42551556 #

It's like these AI companies have to invent scraping spiders again from scratch. I don't know how often I have been DDoSed to complete site failure by random scrapers in just the last few months, and it's still ongoing.

AI companies cause most of traffic on forums