770 points ta988 | 1 comments
alphan0n ◴[] No.42551628[source]
Can someone point out the author's robots.txt where the offense is taking place?

I’m just seeing: https://pod.geraspora.de/robots.txt

Which allows all user agents.

*The Discourse server does not disallow the offending bots mentioned in their post:

https://discourse.diasporafoundation.org/robots.txt

Nor does the wiki:

https://wiki.diasporafoundation.org/robots.txt

No robots.txt at all on the homepage:

https://diasporafoundation.org/robots.txt
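
For anyone who wants to repeat this kind of check, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text, the bot names, and the `is_allowed` helper are hypothetical and for illustration only; they are not the contents of any of the files linked above. To test a real site, paste the fetched file's text in place of the sample.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, NOT taken from any of the linked sites.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        for url in ("https://example.org/page", "https://example.org/private/x"):
            print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

This only says what a well-behaved crawler would be permitted to do under a given file. It says nothing about what the file looked like at an earlier date, or whether a particular bot actually honored it.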

replies(1): >>42552224 #
denschub ◴[] No.42552224[source]
the robots.txt on the wiki is no longer what it was when the bot accessed it. primarily because I clean up my stuff afterwards, and the history is now completely inaccessible to non-authenticated users, so there's no need to maintain my custom robots.txt.
replies(1): >>42553096 #
alphan0n ◴[] No.42553096[source]
https://web.archive.org/web/20240101000000*/https://wiki.dia...
replies(1): >>42553180 #
denschub ◴[] No.42553180[source]
notice how there's a period of almost two months with no new index, just until a week before I posted this? I wonder what might have caused this!!1

(and it's not like they only check robots.txt once a month or so. https://stuff.overengineer.dev/stash/2024-12-30-dfwiki-opena...)

replies(1): >>42553842 #
alphan0n ◴[] No.42553842[source]
:/ Common Crawl archives robots.txt and indicates that the file at wiki.diasporafoundation.org was unchanged in November and December from what it is now. Unchanged from September, in fact.

https://pastebin.com/VSHMTThJ

https://index.commoncrawl.org/

replies(2): >>42556374 #>>42556405 #