
770 points ta988 | 1 comments | | HN request time: 0s | source
mentalgear ◴[] No.42551541[source]
Noteworthy from the article (since some commenters suggested blocking them):

"If you try to rate-limit them, they’ll just switch to other IPs all the time. If you try to block them by User Agent string, they’ll just switch to a non-bot UA string (no, really). This is literally a DDoS on the entire internet."

replies(5): >>42551717 #>>42551976 #>>42552122 #>>42552700 #>>42552885 #
loeg ◴[] No.42552122[source]
I'd kind of like to see that claim substantiated a little more. Is it all crawlers that switch to a non-bot UA, or how are they determining it's the same bot? What non-bot UA do they claim?
replies(3): >>42552172 #>>42552177 #>>42555570 #
alphan0n ◴[] No.42555570[source]
I would take anything the author said with a grain of salt. They straight up lied about the configuration of the robots.txt file.

https://news.ycombinator.com/item?id=42551628

replies(2): >>42563001 #>>42567297 #
ribadeo ◴[] No.42567297[source]
How do you know what the contextual configuration of their robots.txt is/was?

Your accusation was directly addressed by the author in a comment on the original post, IIRC.

I find your attitude as expressed here to be problematic in many ways.

replies(1): >>42569521 #
alphan0n ◴[] No.42569521{3}[source]
CommonCrawl archives robots.txt

For convenience, you can view the extracted data here:

https://pastebin.com/VSHMTThJ

You are welcome to verify for yourself by searching for “wiki.diasporafoundation.org/robots.txt” in the CommonCrawl index here:

https://index.commoncrawl.org/

The index contains a file name that you can append to the CommonCrawl URL to download the archive and view it.

More detailed information on downloading archives here:

https://commoncrawl.org/get-started
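For anyone who would rather script the lookup than click through it, here is a minimal sketch of the steps above. It assumes the index server answers JSON-lines queries at `/<crawl-id>-index` and that archives are served from `data.commoncrawl.org` with HTTP range requests; the crawl ID and the sample record in the usage note are placeholders, not values taken from the real index, so treat this as an illustration rather than the exact procedure used.

```python
import gzip
import json
import urllib.parse
import urllib.request

INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"  # assumed download host


def index_query_url(crawl_id: str, target: str) -> str:
    """Build the index query URL for one crawl (crawl_id is a placeholder you choose)."""
    query = urllib.parse.urlencode({"url": target, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{query}"


def record_request(cdx_line: str) -> tuple[str, dict[str, str]]:
    """Turn one JSON line from the index into a download URL plus a Range header."""
    rec = json.loads(cdx_line)
    start = int(rec["offset"])
    end = start + int(rec["length"]) - 1  # HTTP byte ranges are inclusive
    return f"{DATA_HOST}/{rec['filename']}", {"Range": f"bytes={start}-{end}"}


def fetch_record(cdx_line: str) -> str:
    """Download and decompress the single archived record (needs network access)."""
    url, headers = record_request(cdx_line)
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return gzip.decompress(resp.read()).decode("utf-8", errors="replace")
```

Usage would look roughly like: open `index_query_url("CC-MAIN-2024-51", "wiki.diasporafoundation.org/robots.txt")`, take one line of the response, and pass it to `fetch_record` to read the archived robots.txt.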

From September to December, the robots.txt at wiki.diasporafoundation.org contained this, and only this:

>User-agent: *
>Disallow: /w/
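To make concrete what those two lines permit, here is a small sketch using Python's standard `urllib.robotparser`. The user agent name and the two example paths are hypothetical, chosen only to show that a `/w/` rule is a plain prefix match and does not cover `/wiki/`.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the quoted robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical bot name and paths, for illustration only.
blocked = is_allowed("ExampleBot", "https://wiki.diasporafoundation.org/w/index.php")
allowed = is_allowed("ExampleBot", "https://wiki.diasporafoundation.org/wiki/Main_Page")
print(blocked, allowed)
```

Under this file, anything beneath `/w/` is off limits to every crawler that honors robots.txt, while all other paths remain crawlable.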

Apologies for my attitude, I find defenders of the dishonest in the face of clear evidence even more problematic.

replies(1): >>42574617 #
shkkmo ◴[] No.42574617{4}[source]
Your attitude is inappropriate and violates the sitewide guidelines for discussion.
replies(1): >>42583213 #
alphan0n ◴[] No.42583213{5}[source]
There are currently two references to “Mangion-ing” OpenAI board members in this thread, several more from Reddit, based on the falsehoods being perpetrated by the author. Is this really someone you want to conspire with? Is calling this out more egregious than the witch hunt being organized here?
replies(1): >>42587233 #
1. shkkmo ◴[] No.42587233{6}[source]
"Conspire" and "witch hunt" are not terms of productive discourse.

If you are legitimately trying to correct misinformation, your attitude, tone and language are counterproductive. You would be much better served by taking that energy and crafting an actually persuasive argument. You come across as unreasonable and unwilling to listen, not as someone with a good grasp of the technical specifics.

I don't have a horse in the race. I'm fairly technical, but I did not find your arguments persuasive. This doesn't mean they are wrong, but it does mean that you didn't do a good job of explaining them.