
646 points blendergeek | 1 comment
quchen ◴[] No.42725651[source]
Unless this concept becomes a mass phenomenon with many implementations, isn’t this pretty easy to filter out? And furthermore, since it antagonizes billion-dollar companies that can spin up teams doing nothing but browsing GitHub and HN for software like this to keep it from polluting their data lakes, I wonder whether this is a very effective approach.
replies(9): >>42725708 #>>42725957 #>>42725983 #>>42726183 #>>42726352 #>>42726426 #>>42727567 #>>42728923 #>>42730108 #
grajaganDev ◴[] No.42725708[source]
I am not sure. How would crawlers filter this?
replies(2): >>42725835 #>>42726294 #
marginalia_nu ◴[] No.42726294[source]
You limit the crawl time or number of requests per domain for all domains, and set the limit proportional to how important the domain is.

There are a ton of these kinds of things online; you can't, e.g., exhaustively crawl every Wikipedia mirror someone has put online.
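
A minimal sketch of that kind of per-domain budgeting (all names and numbers here are assumptions for illustration, not marginalia's actual implementation): each domain gets a request budget proportional to an importance score, so a low-importance tarpit can only burn its own small budget before the crawler moves on.

```python
# Sketch: per-domain crawl budget proportional to domain importance.
# Hypothetical names and values throughout; not any real crawler's API.
from collections import defaultdict
from urllib.parse import urlparse

BASE_BUDGET = 100  # assumed floor for unknown domains


class CrawlBudget:
    def __init__(self, importance_scores):
        # importance_scores: domain -> relative importance
        # (e.g. derived from a link graph or traffic rank)
        self.importance = importance_scores
        self.requests_made = defaultdict(int)

    def budget_for(self, domain):
        # Budget scales with importance; unknown domains get the base floor.
        return int(BASE_BUDGET * self.importance.get(domain, 1.0))

    def allow(self, url):
        domain = urlparse(url).netloc
        if self.requests_made[domain] >= self.budget_for(domain):
            return False  # budget exhausted: skip further URLs from this domain
        self.requests_made[domain] += 1
        return True


# Usage: the tarpit domain is cut off after ~100 requests,
# while an important domain can be crawled far deeper.
budget = CrawlBudget({"en.wikipedia.org": 500.0, "tarpit.example": 1.0})
```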