Cloudflare's Browser Integrity Check/Verification/Challenge feature, used by many websites, is denying access to users of non-mainstream browsers like Pale Moon.
User reports began on January 31:
https://forum.palemoon.org/viewtopic.php?f=3&t=32045
This situation occurs at least once a year, and there is no easy way to contact Cloudflare. Their "Submit feedback" tool yields no results. A Cloudflare Community topic was flagged as "spam" by members of that community and promptly locked, with no real solution and no official response from Cloudflare:
https://community.cloudflare.com/t/access-denied-to-pale-moo...
Partial list of other browsers that are being denied access:
Falkon, SeaMonkey, IceCat, Basilisk.
A 2022 Hacker News post about the same issue brought attention to it and prompted Cloudflare to patch it quickly:
https://news.ycombinator.com/item?id=31317886
A Cloudflare product manager said at the time: "...we do not want to be in the business of saying one browser is more legitimate than another."
As of now, there has been no official response from Cloudflare, and access is still being denied by their tool.
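If you want to reproduce the behavior yourself, a rough sketch like the following compares how the same request fares with different User-Agent strings. The target URL is a placeholder (substitute a site you know sits behind Cloudflare) and the Pale Moon user-agent string is only approximate:

    # Rough sketch: compare responses from a Cloudflare-fronted site for
    # different User-Agent strings. The URL and UA values are placeholders.
    import requests

    URL = "https://example-cloudflare-site.com/"  # hypothetical site behind Cloudflare
    USER_AGENTS = {
        "Chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0 Safari/537.36",
        "Pale Moon": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) "
                     "Gecko/20100101 Goanna/6.6 PaleMoon/33.0.0",  # approximate UA format
    }

    for name, ua in USER_AGENTS.items():
        r = requests.get(URL, headers={"User-Agent": ua}, timeout=10)
        # A 403 accompanied by a "cf-mitigated" header generally indicates a
        # Cloudflare challenge page rather than a response from the origin.
        print(name, r.status_code, r.headers.get("cf-mitigated"))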
What are you protecting, Cloudflare?
Also, they show those captchas when going to robots.txt... unbelievable.
For example, even Cloudflare hasn't configured their own blog's RSS feed properly. My feed reader (running in a DigitalOcean datacenter) hasn't been able to access it since 2021: it gets a 403 every time, even though it has backed off to checking weekly. This is a cacheable endpoint with public data intended for robots. If they can't configure their own product correctly for their official blog, how can they expect other sites to?
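For anyone who wants to check this themselves, here is a minimal sketch (assuming the feed still lives at https://blog.cloudflare.com/rss/ and that your feed reader sends an identifying User-Agent; both are assumptions, adjust as needed) that prints the status code and the headers that matter for telling a block apart from a cache hit:

    # Minimal check from a datacenter IP: print the status code and the
    # headers relevant to diagnosing a block vs. a cache hit.
    # The feed URL and the User-Agent string are assumptions.
    import requests

    r = requests.get(
        "https://blog.cloudflare.com/rss/",
        headers={"User-Agent": "MyFeedReader/1.0 (+https://example.com/bot)"},
        timeout=10,
    )
    print(r.status_code)  # 403 here means the request was rejected before reaching the feed
    for h in ("cf-cache-status", "cf-mitigated", "server", "cache-control"):
        print(h, ":", r.headers.get(h))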
If you are writing a malicious crawler that doesn't care about rate limiting and wants to scan as many sites as possible to build a list of the most vulnerable ones to hack, you will scan robots.txt, because that is the file that tells robots NOT to index certain pages. I never use robots.txt as some kind of security through obscurity. I've only ever bothered with it to make SEO easier when I control a virtual subdirectory of a site: to block things like repeated content with alternative layouts (to avoid duplicate-content issues), or to get discontinued sections of a website to drop out of SERPs.
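To be concrete about what a crawler actually does with the file: any crawler, polite or not, just reads the Disallow rules, which is exactly why listing sensitive paths there is disclosure rather than protection. A small sketch using the Python standard-library parser (the site and paths are made up for illustration):

    # Sketch: how any crawler (well-behaved or not) reads robots.txt.
    # The site and paths are hypothetical.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()  # fetches and parses the file

    # A polite crawler asks before fetching; a malicious one simply treats
    # the Disallow list as a map of "interesting" paths.
    print(rp.can_fetch("MyCrawler/1.0", "https://example.com/admin/"))
    print(rp.can_fetch("MyCrawler/1.0", "https://example.com/print-view/page1"))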
This is not relevant, because Cloudflare will cache it so it never hits your origin, unless they are adding random URL parameters (which you can teach Cloudflare to ignore, though I don't think that should be a default configuration).
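A quick way to see both behaviors is to request the same cached path twice, once as-is and once with a throwaway query parameter, and compare the CF-Cache-Status header (HIT vs. MISS/BYPASS). A sketch, with the URL as a placeholder for a path you actually have behind Cloudflare caching:

    # Sketch: observe how a random query parameter bypasses the edge cache.
    # The URL is a placeholder.
    import uuid
    import requests

    base = "https://example.com/robots.txt"
    for url in (base, f"{base}?cachebust={uuid.uuid4().hex}"):
        r = requests.get(url, timeout=10)
        # HIT means served from Cloudflare's edge cache; MISS/BYPASS means the
        # request went through to the origin.
        print(url, "->", r.headers.get("cf-cache-status"))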
Again, I think you are correct that the defaults should be saner, but I don't know if you've ever dealt with a network admin or web administrator who doesn't understand server-side caching vs. browser caching: it most definitely would end up with Cloudflare losing sales because people misunderstand how things work. Maybe I'm jaded at 45, but I feel like most people don't even know to look at the headers when they think they've hit a caching issue. I don't think it's based on age; I think it's based on being interested in the technology and wanting to learn all about it. Mostly developers who got into it for the love of technology, versus those who got into it because it was high paying and they understood Excel, or learned to build a simple website early in life, so everyone told them to get into software.
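For what it's worth, the headers tell most of the story: Age and CF-Cache-Status describe the shared/edge cache, while Cache-Control, ETag and Last-Modified govern what the browser is allowed to reuse. A small sketch for inspecting them (the URL is a placeholder):

    # Sketch: the handful of headers to read first when debugging a caching issue.
    # Age and CF-Cache-Status relate to the server-side/edge cache; Cache-Control,
    # ETag and Last-Modified drive browser caching and revalidation.
    import requests

    r = requests.get("https://example.com/some-page", timeout=10)
    for h in ("cache-control", "age", "etag", "last-modified", "expires", "cf-cache-status"):
        print(f"{h}: {r.headers.get(h)}")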