1343 points | Hold-And-Modify | 1 comment

Hello.

Cloudflare's Browser Integrity Check/Verification/Challenge feature, used by many websites, is denying access to users of non-mainstream browsers like Pale Moon.

User reports began on January 31:

https://forum.palemoon.org/viewtopic.php?f=3&t=32045

This situation occurs at least once a year, and there is no easy way to contact Cloudflare. Their "Submit feedback" tool yields no results. A Cloudflare Community topic was flagged as "spam" by members of that community and promptly locked, with no real solution and no official response from Cloudflare:

https://community.cloudflare.com/t/access-denied-to-pale-moo...

Partial list of other browsers that are being denied access:

Falkon, SeaMonkey, IceCat, Basilisk.

A 2022 Hacker News post about the same issue brought attention to it and prompted Cloudflare to patch it quickly:

https://news.ycombinator.com/item?id=31317886

A Cloudflare product manager declared back then: "...we do not want to be in the business of saying one browser is more legitimate than another."

As of now, there is no official response from Cloudflare. Internet access is still denied by their tool.

zlagen ◴[] No.42953898[source]
I'm using Chrome on Linux and noticed that this year Cloudflare is very aggressive about showing the "Verify you are a human" box. Now a lot of sites that use Cloudflare show it, and once you solve the challenge it shows it again after 30 minutes!

What are you protecting, Cloudflare?

Also they show those captchas when going to robots.txt... unbelievable.

replies(17): >>42954054 #>>42954451 #>>42954784 #>>42954904 #>>42955172 #>>42955240 #>>42955949 #>>42956893 #>>42957248 #>>42957383 #>>42957406 #>>42957408 #>>42957698 #>>42957738 #>>42957782 #>>42958180 #>>42960458 #
progmetaldev ◴[] No.42954784[source]
Whoever configures the Cloudflare rules should turn off the firewall for things like robots.txt and sitemap.xml. You can still cache those resources to prevent them from becoming a front door for DDoS.
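
If you want to sanity-check a zone after setting that up, a quick script along these lines works (rough sketch; example.com is a placeholder, and cf-cache-status / cf-mitigated are, as far as I know, the response headers Cloudflare documents for cache status and challenge detection):

    # Rough check: are robots.txt / sitemap.xml being challenged or served from cache?
    # example.com is a placeholder. cf-cache-status and cf-mitigated are the response
    # headers Cloudflare documents for cache status and challenge detection.
    import requests

    for path in ("/robots.txt", "/sitemap.xml"):
        r = requests.get("https://example.com" + path, timeout=10)
        challenged = r.headers.get("cf-mitigated") == "challenge"
        cache_status = r.headers.get("cf-cache-status", "none")
        print(f"{path}: status={r.status_code} challenged={challenged} cache={cache_status}")
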
replies(1): >>42956791 #
kevincox ◴[] No.42956791[source]
It seems like common cases like this should be handled correctly by default. These are cacheable requests intended for robots. Sure, it would be nice if webmasters configured it, but I suspect only a tiny minority do.

For example, even Cloudflare hasn't configured their official blog's RSS feed properly. My feed reader (running in a DigitalOcean datacenter) hasn't been able to access it since 2021 (403 every time, even though it has backed off to checking weekly). This is a cacheable endpoint with public data intended for robots. If they can't configure their own product correctly for their official blog, how can they expect other sites to?
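
(For what it's worth, the reader isn't doing anything exotic; it follows roughly the conditional-GET-plus-backoff pattern sketched below. The feed URL and the intervals are illustrative rather than its exact configuration.)

    # Sketch of a polite feed poll: conditional GET plus a long backoff on errors.
    # The feed URL and the intervals are illustrative, not the reader's real config.
    import time
    import requests

    FEED = "https://blog.cloudflare.com/rss/"  # assumed feed location
    etag = last_modified = None
    interval = 3600  # start by polling hourly

    while True:
        headers = {"User-Agent": "example-feed-reader/1.0"}
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified

        r = requests.get(FEED, headers=headers, timeout=30)
        if r.status_code == 200:
            etag = r.headers.get("ETag")
            last_modified = r.headers.get("Last-Modified")
            interval = 3600  # healthy again, go back to hourly
            # ... parse and store new entries here ...
        elif r.status_code == 304:
            pass  # nothing new; keep the current interval
        else:
            interval = min(interval * 2, 7 * 24 * 3600)  # back off toward weekly on 403s

        time.sleep(interval)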

replies(1): >>42957386 #
progmetaldev ◴[] No.42957386[source]
I agree, but I also somewhat understand. Some people will actually pay more per month for Cloudflare than for their own hosting; the Cloudflare Pro plan is $20/month USD. Some sites wouldn't be able to handle the constant requests for robots.txt, simply because bots don't necessarily respect cache headers (if cache headers are even configured for robots.txt), and the bots that fetch robots.txt while ignoring them are too numerous.

If you are writing some kind of malicious crawler that doesn't care about rate limiting and wants to scan as many sites as possible to build a list of the most vulnerable ones to hack, you will fetch robots.txt, because that is the file that tells robots NOT to index certain pages. I never use robots.txt for security through obscurity. I've only ever bothered with it to make SEO easier when you control a virtual subdirectory of a site: to block things like repeated content with alternative layouts (to avoid duplicate-content issues), or to drop a discontinued section of a site out of SERPs.
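
Concretely, the only kind of robots.txt I ever ship looks something like this (the paths are made up for illustration):

    # Illustrative only; the paths are made up.
    User-agent: *
    # alternative "print" layout that duplicates the canonical pages
    Disallow: /print/
    # discontinued section we want to drop out of search results
    Disallow: /old-catalog/

    Sitemap: https://www.example.com/sitemap.xml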

replies(1): >>42957422 #
kevincox ◴[] No.42957422[source]
> sheer number of bots that look at robots.txt and will ignore a caching header

This is not relevant, because Cloudflare will cache it so it never hits your origin, unless they are adding random URL parameters (which you can teach Cloudflare to ignore, though I don't think that should be a default configuration).
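
You can check that behaviour yourself: with caching enabled, the bare path should come back as a cache HIT, while a random query string will typically show MISS (or DYNAMIC) unless the cache key is set to ignore query strings. A rough sketch, with example.com standing in for the real site:

    # Sketch: compare cf-cache-status for the bare path vs. a cache-busting query string.
    # example.com is a placeholder; with default cache keys the second request will
    # usually miss (or be treated as dynamic) unless query strings are ignored.
    import uuid
    import requests

    base = "https://example.com/robots.txt"
    for url in (base, f"{base}?cachebust={uuid.uuid4().hex}"):
        r = requests.get(url, timeout=10)
        print(url, "->", r.headers.get("cf-cache-status", "none"))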

replies(1): >>42957503 #
progmetaldev ◴[] No.42957503[source]
The thing is, it won't do that by default; you currently have to enable caching when creating a new account. I use a service that detects whether a website is still running, and it does this by using a certain URL parameter to bypass the cache.

Again, I think you are correct that the defaults should be saner, but I don't know if you've ever dealt with a network admin or web administrator who hasn't dealt with server-side caching versus browser caching; it most definitely would end up with Cloudflare losing sales because people misunderstood how things work. Maybe I'm jaded at 45, but I feel like most people don't even know to look at headers when they think they've hit a caching issue. I don't think it's based on age; I think it's based on being interested in the technology and wanting to learn all about it: mostly developers who got into it for the love of technology, versus those who got into it because it paid well and they understood Excel, or because they learned to build a simple website early in life, so everyone told them to get into software.