kevincox ◴[] No.41865353[source]
I dislike the advice to whitelist specific readers by user-agent. Not only is that endless manual work that only solves the problem for a subset of users, but it is also easy for malicious actors to bypass. My recommendation would be to create a page rule that disables bot blocking for your feeds. This fixes the problem for all readers with no ongoing maintenance.
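
For concreteness, here is a minimal sketch of creating such a rule through Cloudflare's Page Rules API from Python. The zone ID, API token, and the example.com/feed* pattern are placeholders, and the "security_level" and "browser_check" actions are the usual way to stop challenges on a specific path; depending on your plan, Bot Fight Mode may additionally need a WAF skip rule, so treat this as illustrative rather than a definitive setup:

    # Sketch: relax Cloudflare's security checks for feed URLs via a Page Rule.
    # ZONE_ID, API_TOKEN and the URL pattern are placeholders you must supply.
    import requests

    ZONE_ID = "your-zone-id"        # placeholder
    API_TOKEN = "your-api-token"    # token needs Zone > Page Rules > Edit

    resp = requests.post(
        f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/pagerules",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={
            "targets": [{
                "target": "url",
                "constraint": {"operator": "matches", "value": "example.com/feed*"},
            }],
            "actions": [
                {"id": "security_level", "value": "essentially_off"},  # no challenge pages
                {"id": "browser_check", "value": "off"},               # skip Browser Integrity Check
            ],
            "status": "active",
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())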

If you are worried about DoS attacks that hammer your feeds, you can use the same rule to ignore the query string in the cache key (if your feed doesn't use query strings) and to override the caching settings if your server doesn't set the proper headers. This way Cloudflare will cache your feed and you can serve any number of visitors without putting load on your origin.
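
Again as a sketch with the same placeholders as above: "cache_everything" tells Cloudflare to cache responses even when the origin sends no caching headers, and "edge_cache_ttl" controls how long the edge serves the cached copy (the value here is illustrative). Ignoring the query string in the cache key is done through Cloudflare's custom cache key / Cache Rules feature, which is not shown because its availability depends on your plan:

    # Sketch: have Cloudflare cache the feed at the edge regardless of origin headers.
    import requests

    ZONE_ID = "your-zone-id"        # placeholder
    API_TOKEN = "your-api-token"    # placeholder

    resp = requests.post(
        f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/pagerules",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={
            "targets": [{
                "target": "url",
                "constraint": {"operator": "matches", "value": "example.com/feed*"},
            }],
            "actions": [
                {"id": "cache_level", "value": "cache_everything"},  # cache even without Cache-Control
                {"id": "edge_cache_ttl", "value": 1800},             # serve from the edge for 30 minutes
            ],
            "status": "active",
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())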

As for Cloudflare fixing the defaults, it seems unlikely to happen. The behaviour has been broken for years; even Cloudflare's own blog is affected. They have been "actively working" on a fix for at least 2 years, according to their VP of product: https://news.ycombinator.com/item?id=33675847

1. benregenspan ◴[] No.41869223[source]
AI crawlers have changed the picture significantly and, in my opinion, are a much bigger threat to the open web than Cloudflare. The training arms race has drastically increased bot traffic, and the value proposition behind that traffic has inverted. Previously, many site operators could rely on the average automated request being net-beneficial to the site and its users (outside of scattered, time-limited DDoS attacks), but now most of these requests represent value extraction. Combine this with a seemingly related increase in high-volume bots that don't respect robots.txt and don't set a useful User-Agent, and deploying a heavy-handed firewall becomes a much easier business decision, even if it may catch some desirable traffic (like valid RSS requests).