
646 points | blendergeek | 1 comment
quchen ◴[] No.42725651[source]
Unless this concept becomes a mass phenomenon with many implementations, isn't it pretty easy to filter out? And since it antagonizes billion-dollar companies that can spin up teams to do nothing but browse GitHub and HN for software like this and keep it from polluting their data lakes, I wonder whether this is really an efficient approach.
replies(9): >>42725708 #>>42725957 #>>42725983 #>>42726183 #>>42726352 #>>42726426 #>>42727567 #>>42728923 #>>42730108 #
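
A sketch of the kind of filter the parent comment imagines, to make the "easy to filter out" point concrete: word salad generated from a small vocabulary compresses far better than real prose, so a scraper could cheaply flag suspect pages by compression ratio. This is a hypothetical illustration, not anything from the project under discussion; the threshold and sample text are invented.

    import zlib

    def looks_like_tarpit(html_text: str, threshold: float = 0.2) -> bool:
        """Flag pages whose text compresses suspiciously well (invented heuristic)."""
        raw = html_text.encode()
        if not raw:
            return False
        ratio = len(zlib.compress(raw)) / len(raw)
        # Highly repetitive generated filler compresses to a tiny fraction of its size.
        return ratio < threshold

    # A page of repeated filler trips the check; ordinary prose usually will not.
    filler = " ".join(["lattice vellum quorum bramble"] * 500)
    print(looks_like_tarpit(filler))  # True

Compression ratio is only a stand-in here; a real crawler pipeline would more likely lean on crawl-graph signals or fingerprints of known tarpits, which is the parent's point about dedicated teams.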
Blackthorn ◴[] No.42725957[source]
If it keeps your own content safe when you deploy it in a corner of your website: mission accomplished!
replies(2): >>42726400 #>>42727416 #
1. Blackthorn ◴[] No.42728175[source]
You've got to be seriously AI-drunk to equate letting your site be crawled by commercial scrapers with "contributing to humanity".

Maybe you don't want your stuff to get thrown into the latest Silicon Valley commercial operation without getting paid for it. That seems like a valid position to take. Or maybe you just don't want Claude's ridiculously badly behaved scraper to chew through your entire budget.

Regardless, scrapers that don't follow rules like robots.txt will pretty quickly discover why those rules exist in the first place, as they receive increasing amounts of garbage.
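
For readers who haven't seen how such a tarpit works, here is a minimal sketch of the idea using only Python's standard library: every path disallowed in robots.txt leads into an endless maze of generated junk pages full of further links, so only a crawler that ignores the rules ever sees it. This illustrates the general technique, not the actual code of the project being discussed; the paths, word list, and page layout are made up.

    import random
    from http.server import BaseHTTPRequestHandler, HTTPServer

    WORDS = ["lattice", "vellum", "quorum", "bramble", "yonder", "cinder", "fathom"]
    ROBOTS_TXT = "User-agent: *\nDisallow: /maze/\n"

    def garbage_page(depth: int) -> str:
        """Build a page of word salad plus links leading deeper into the maze."""
        text = " ".join(random.choices(WORDS, k=200))
        links = "".join(
            f'<a href="/maze/{depth + 1}/{random.randrange(1_000_000)}">more</a> '
            for _ in range(10)
        )
        return f"<html><body><p>{text}</p>{links}</body></html>"

    class Tarpit(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/robots.txt":
                body, ctype = ROBOTS_TXT.encode(), "text/plain"
            elif self.path.startswith("/maze/"):
                # Well-behaved crawlers never reach this branch; rule-ignoring
                # ones are fed an unbounded graph of junk pages.
                body, ctype = garbage_page(self.path.count("/")).encode(), "text/html"
            else:
                body, ctype = b"<html><body>Real content lives here.</body></html>", "text/html"
            self.send_response(200)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8080), Tarpit).serve_forever()

A real tarpit would typically also throttle each response so every junk page is slow to fetch, which is what makes ignoring robots.txt genuinely expensive for the crawler rather than just unproductive.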