
646 points by blendergeek | 1 comment
observationist [dead post] No.42726591
[flagged]
jsheard No.42726726
This is a really bad take. It's not like this server is hacking clients which connect to it; it's providing perfectly valid HTTP responses that just happen to be slow and full of Markov gibberish. Any harm that comes of that is self-inflicted by assuming that websites must provide valuable data as a matter of course.

If AI companies want to sue webmasters for that then by all means, they can waste their money and get laughed out of court.

replies(3): >>42726813 >>42726898 >>42729375
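The behavior jsheard describes, valid but deliberately slow responses stuffed with Markov gibberish, can be sketched roughly like this. This is a minimal illustration, not Nepenthes' actual implementation; the word list, timings, and uniform-random word choice (a degenerate stand-in for a real Markov chain) are all made up:

```python
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical vocabulary; a real tarpit would build a Markov chain from a corpus.
WORDS = ["data", "content", "model", "train", "scrape", "value", "web"]

def gibberish(n_words, seed=None):
    # Statistically word-shaped, semantically empty text.
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words))

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A perfectly valid HTTP response -- nothing here "hacks" the client.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        for _ in range(1000):
            # Drip-feed the body to tie up the crawler's connection.
            self.wfile.write((gibberish(10) + " ").encode())
            self.wfile.flush()
            time.sleep(2)  # deliberately slow, but still well-formed HTTP

# To run: HTTPServer(("", 8080), TarpitHandler).serve_forever()
```

Nothing in this flow violates the HTTP protocol; the only "attack" is that the content is worthless and the server takes its time.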
observationist [dead post] No.42726813
[flagged]
blibble No.42726981
> If you want to protect your content, use the technical mechanisms that are available,

> You can choose to gatekeep your content, and by doing so, make it unscrapeable, and legally protected.

so... robots.txt, which the AI parasites ignore?
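
For reference, robots.txt is purely advisory: it asks crawlers to stay away but enforces nothing, which is exactly the point here. A file singling out an AI crawler might look like this (GPTBot is OpenAI's published crawler token; the rules shown are illustrative):

```text
# Ask one AI crawler to stay out entirely; a crawler that ignores
# robots.txt sees nothing different and is blocked by nothing.
User-agent: GPTBot
Disallow: /

# Everyone else: no restrictions (an empty Disallow allows everything).
User-agent: *
Disallow:
```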

> Also, consider that relatively small, cheap llms are able to parse the difference between meaningful content and Markovian jabber such as this software produces.

Okay, so it's not damaging, and there you've refuted your entire argument.

replies(1): >>42727385
observationist [dead post] No.42727385
[flagged]
jsheard No.42727632
> No, put up a loginwall or paywall, authenticate users, and go private.

We know for a fact that AI companies don't respect that; if they want data that's behind a paywall, they'll jump through hoops to take it anyway.

https://www.theguardian.com/technology/2025/jan/10/mark-zuck...

If they don't have to abide by "norms" then we don't have to for their sake. Fuck 'em.

replies(1): >>42727731
observationist [dead post] No.42727731
[flagged]
tir No.42728192
> the law explicitly allows scraping and crawling.

Nepenthes also allows scraping and crawling, for as long as you like.