Can someone point out the author's robots.txt where the offense is taking place?
I’m just seeing:
https://pod.geraspora.de/robots.txt
Which allows all user agents.
*The Discourse server does not disallow the offending bots mentioned in their post either:
https://discourse.diasporafoundation.org/robots.txt
Nor does the wiki:
https://wiki.diasporafoundation.org/robots.txt
No robots.txt at all on the homepage:
https://diasporafoundation.org/robots.txt
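If anyone wants to check a robots.txt against specific bots themselves, here is a rough sketch using Python's standard-library urllib.robotparser. It reports whether each user agent may fetch a given URL. The robots.txt text, the bot names ("ExampleBot", "OtherBot") and the URL are made up for illustration. To check a live file you'd call set_url() and read() instead of parse(). Python's parser is fairly simple, so a given crawler's own interpretation of the rules may differ.

```python
from urllib.robotparser import RobotFileParser


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report, for each user agent, whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical robots.txt: blocks "ExampleBot" everywhere, allows everyone else.
SAMPLE = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""

print(allowed_agents(SAMPLE, ["ExampleBot", "OtherBot"], "https://wiki.example.org/some/page"))
```

An empty `Disallow:` under `User-agent: *` means "allow everything", which is what a file that allows all user agents boils down to.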
The robots.txt on the wiki is no longer what it was when the bot accessed it. That's primarily because I clean up my stuff afterwards; also, the history is now completely inaccessible to non-authenticated users, so there's no need to maintain my custom robots.txt.
Notice how there's a period of almost two months with no new index, right up until a week before I posted this? I wonder what might have caused this!!1
(And it's not like they only check robots.txt once a month or so: https://stuff.overengineer.dev/stash/2024-12-30-dfwiki-opena...)
:/ Common Crawl archives robots.txt files, and its index indicates that the file at wiki.diasporafoundation.org was the same in November and December as it is now. In fact, it has been unchanged since September.
https://pastebin.com/VSHMTThJ
https://index.commoncrawl.org/
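One way to do this kind of "unchanged since" check: the Common Crawl index can return captures as JSON lines (output=json), each with a timestamp and a payload digest. If consecutive captures of the robots.txt URL share the same digest, the content didn't change between them. Below is a minimal sketch that collapses captures into runs of identical content. It assumes you've already fetched the index lines for the URL. The only fields it relies on are `timestamp` and `digest`, and the sample values are placeholders, not real captures.

```python
import json
from typing import Iterable


def digest_runs(cdx_lines: Iterable[str]) -> list[tuple[str, str, str]]:
    """Collapse index records (JSON lines) into runs of identical content.

    Returns (first_timestamp, last_timestamp, digest) for each consecutive
    run of captures sharing the same payload digest, in timestamp order.
    """
    records = [json.loads(line) for line in cdx_lines if line.strip()]
    # Timestamps are YYYYMMDDhhmmss strings, so string order is time order.
    records.sort(key=lambda rec: rec["timestamp"])
    runs: list[list[str]] = []
    for rec in records:
        if runs and runs[-1][2] == rec["digest"]:
            runs[-1][1] = rec["timestamp"]  # same content: extend current run
        else:
            runs.append([rec["timestamp"], rec["timestamp"], rec["digest"]])
    return [(first, last, digest) for first, last, digest in runs]


# Placeholder records; real index lines carry more fields (url, status, ...).
SAMPLE_LINES = [
    '{"timestamp": "20241105120000", "digest": "PLACEHOLDER_A"}',
    '{"timestamp": "20240915080000", "digest": "PLACEHOLDER_A"}',
    '{"timestamp": "20241208150000", "digest": "PLACEHOLDER_A"}',
]

print(digest_runs(SAMPLE_LINES))
```

A single run spanning September through December would mean the file's content was identical in every capture over that window.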