646 points blendergeek | 1 comments
grajaganDev ◴[] No.42725460[source]
This keeps generating new pages to keep the crawler occupied.

Looks like this would tarpit any web crawler.
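
To illustrate the idea of endlessly generated pages, here is a minimal sketch, not the actual software's implementation. The names `child_links`, `render_page`, and `crawl` are hypothetical, and deriving links from a hash of the current path is an assumption made only to keep the example deterministic. Any response delays a real tarpit might add are not modeled.

```python
import hashlib


def child_links(path: str, fanout: int = 3) -> list[str]:
    """Derive `fanout` new pseudo-random paths from the current path."""
    links = []
    for i in range(fanout):
        digest = hashlib.sha256(f"{path}#{i}".encode()).hexdigest()
        links.append(f"/{digest[:12]}")
    return links


def render_page(path: str) -> str:
    """Return an HTML page whose only content is links to more generated pages."""
    anchors = "\n".join(
        f'<a href="{href}">{href}</a>' for href in child_links(path)
    )
    return f"<html><body>\n{anchors}\n</body></html>"


def crawl(start: str, max_pages: int) -> int:
    """Simulate a naive breadth-first crawler; return how many pages it fetched."""
    seen: set[str] = set()
    queue = [start]
    while queue and len(seen) < max_pages:
        path = queue.pop(0)
        if path in seen:
            continue
        seen.add(path)
        queue.extend(child_links(path))
    return len(seen)
```

In this sketch the crawler's queue grows by three links for every page it fetches, so it only stops when its own page budget runs out, never because the site ran out of pages.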

replies(1): >>42725575 #
BryantD ◴[] No.42725575[source]
It would indeed. Note the warning: "There is not currently a way to differentiate between web crawlers that are indexing sites for search purposes, vs crawlers that are training AI models. ANY SITE THIS SOFTWARE IS APPLIED TO WILL LIKELY DISAPPEAR FROM ALL SEARCH RESULTS."
replies(3): >>42725586 #>>42725898 #>>42726004 #
jsheard ◴[] No.42725898[source]
Real search engines respect robots.txt, so you could just tell them not to enter Markov Chain Hell.
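
A rough sketch of what that could look like, using Python's standard `urllib.robotparser` to check the rule the way a well-behaved crawler would. The `/tarpit/` path, the `ExampleBot` user agent, and the `allowed` helper are all hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every compliant crawler out of the tarpit path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tarpit/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("ExampleBot", "https://example.com/tarpit/page1"))
    print(allowed("ExampleBot", "https://example.com/about"))
```

This only helps with crawlers that actually consult robots.txt before fetching; one that ignores the file would walk straight in.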
replies(1): >>42726318 #
throwaway744678 ◴[] No.42726318[source]
I suspect AI crawlers would also (quickly learn to) respect it?
replies(2): >>42726334 #>>42726348 #
1. ◴[] No.42726348[source]