
Cloudflare.com's Robots.txt

(www.cloudflare.com)

145 points sans_souse | 1 comments | 17 Nov 24 12:39 UTC


seanwilson ◴[17 Nov 24 16:08 UTC] No.42164898[source]▶

>>42163883 (OP) #

I have an ASCII art Easter egg like this in an SEO product I made. :)

https://www.checkbot.io/robots.txt

I should probably add this SEO tip too because the purpose of robots.txt is confusing: If you want to remove/deindex a page from Google search, you counterintuitively need to allow the page to be crawled in the robots.txt file, and then add a noindex response header or noindex meta tag to the page. This way the crawler gets to see the noindex instruction. Robots.txt controls which pages can be crawled, not which pages can be indexed.
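To make the crawl-versus-index distinction concrete, here is a rough Python sketch of the check described above. It assumes you already have the robots.txt text, the page's response headers, and its HTML in hand. The function names (`can_crawl`, `has_noindex`, `deindex_status`), the `Googlebot` user-agent string, and the example URLs are illustrative choices rather than anything from a particular tool, and the meta-tag regex is a simplification that only matches when `name` comes before `content`.

```python
import re
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the page carries a noindex header or robots meta tag."""
    # Header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # Simplified match: assumes name="robots" appears before content="...".
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']',
        html,
        re.IGNORECASE,
    )
    return bool(meta and "noindex" in meta.group(1).lower())


def deindex_status(robots_txt: str, url: str, headers: dict, html: str) -> str:
    """Classify whether a crawler could ever see the page's noindex."""
    crawlable = can_crawl(robots_txt, url)
    noindex = has_noindex(headers, html)
    if noindex and crawlable:
        return "noindex visible to crawler"
    if noindex and not crawlable:
        return "noindex hidden: page is blocked by robots.txt"
    return "no noindex on page"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    # Blocked from crawling, so the noindex tag is never read.
    print(deindex_status(robots, "https://example.com/private/old.html", {}, page))
    # Crawlable, so the noindex tag can take effect.
    print(deindex_status(robots, "https://example.com/public/old.html", {}, page))
```

The second result is the case that trips people up: the page has a noindex, but the Disallow rule stops the crawler from ever reading it.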

replies(1): >>42165078 #

1. dazc ◴[17 Nov 24 16:38 UTC] No.42165078[source]▶

The consequences of robots.txt misuse can also be disastrous for a regular site. For example, I've seen instances where multiple 'page indexed but blocked by robots.txt' warnings have led to sites being severely down-ranked.

My assumption is that search engines don't want to list too many pages that everyone else can read but they cannot.