    556 points campuscodi | 12 comments
    1. kevincox ◴[] No.41865353[source]
    I dislike the advice to whitelist specific readers by user-agent. Not only is it endless manual work that only solves the problem for a subset of users, it is also easy for malicious actors to bypass. My recommendation would be to create a page rule that disables bot blocking for your feeds. This fixes the problem for all readers with no ongoing maintenance.

    If you are worried about DoS attacks that hammer your feeds, you can use the same configuration rule to ignore the query string in the cache key (if your feed doesn't use query strings) and to override the caching settings if your server doesn't set the proper headers. That way Cloudflare will cache your feed and you can serve any number of visitors without putting load on your origin.
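
    For example, here is a rough, untested sketch against the Cloudflare v4 page-rules API; the zone ID, token, and feed URL pattern are placeholders, and the action values are as I remember them from the docs:

        # Sketch only: create one page rule that turns bot blocking off and forces
        # edge caching for feed URLs. ZONE_ID, API_TOKEN and the URL pattern are
        # placeholders; adjust the actions to your own setup.
        import requests

        ZONE_ID = "your-zone-id"
        API_TOKEN = "your-api-token"

        rule = {
            "targets": [{
                "target": "url",
                "constraint": {"operator": "matches", "value": "example.com/feed*"},
            }],
            "actions": [
                {"id": "security_level", "value": "essentially_off"},  # no bot challenge
                {"id": "cache_level", "value": "cache_everything"},    # cache at the edge
                {"id": "edge_cache_ttl", "value": 7200},               # keep it for 2 hours
            ],
            "status": "active",
        }

        resp = requests.post(
            f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/pagerules",
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            json=rule,
        )
        resp.raise_for_status()
        print(resp.json()["success"])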

    As for Cloudflare fixing the defaults, it seems unlikely to happen. It has been broken for years; even Cloudflare's own blog is affected. They have been "actively working" on fixing it for at least 2 years according to their VP of product: https://news.ycombinator.com/item?id=33675847

    replies(3): >>41867168 #>>41868163 #>>41869223 #
    2. vaylian ◴[] No.41867168[source]
    I don't know if Cloudflare offers it, but whitelisting the URL of the RSS feed would be much more effective than filtering user agents.
    replies(3): >>41867185 #>>41868217 #>>41869916 #
    3. derkades ◴[] No.41867185[source]
    Yes, it supports that, and I think that's what the parent comment was all about.
    replies(1): >>41867257 #
    4. BiteCode_dev ◴[] No.41867257{3}[source]
    Specifically, whitelisting the URL for the bot protection, but not the cache, so that you are still somewhat protected against adversarial use.
    replies(1): >>41868789 #
    5. a-french-anon ◴[] No.41868163[source]
    And for those of us using sfeed, the default UA is curl's.
    6. ◴[] No.41868217[source]
    7. londons_explore ◴[] No.41868789{4}[source]
    An adversary can easily send no-cache headers to bust the cache.
    replies(1): >>41868869 #
    8. acdha ◴[] No.41868869{5}[source]
    The CDN can choose whether to honor those. That hasn’t been an effective adversarial technique since the turn of the century.
    replies(1): >>41870197 #
    9. benregenspan ◴[] No.41869223[source]
    AI crawlers have changed the picture significantly and in my opinion are a much bigger threat to the open web than Cloudflare. The training arms race has drastically increased bot traffic, and the value proposition behind that bot traffic has inverted. Previously many site operators could rely on the average automated request being net-beneficial to the site and its users (outside of scattered, time-limited DDoS attacks) but now most of these requests represent value extraction. Combine this with a seemingly related increase in high-volume bots that don't respect robots.txt and don't set a useful User-Agent, and using a heavy-handed firewall becomes a much easier business decision, even if it may target some desirable traffic (like valid RSS requests).
    10. jks ◴[] No.41869916[source]
    Yes, you can do it with a "page rule", which the parent comment mentioned. The Cloudflare free tier has a budget of three page rules, which might mean you have to bundle all your RSS feeds in one folder so they share a path prefix.
    11. londons_explore ◴[] No.41870197{6}[source]
    Does Cloudflare give such an option? Even for non-paid accounts?
    replies(1): >>41878921 #
    12. acdha ◴[] No.41878921{7}[source]
    They ignore request cache-control headers, I believe unconditionally, so you'd have to disable caching for any endpoints that clients are allowed to request uncached.
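
    A quick way to check what your own zone actually does (rough sketch; the feed URL is a placeholder, and cf-cache-status is Cloudflare's cache-result response header):

        # Fetch the feed twice, the second time asking for a fresh copy, and compare
        # the cf-cache-status response header (HIT/MISS/DYNAMIC/BYPASS).
        import requests

        URL = "https://example.com/feed.xml"  # placeholder

        warm = requests.get(URL)                                           # prime the edge cache
        probe = requests.get(URL, headers={"Cache-Control": "no-cache"})   # try to bust it

        print("warm :", warm.headers.get("cf-cache-status"))
        print("probe:", probe.headers.get("cf-cache-status"))
        # If the probe still reports HIT, the client's no-cache header was ignored.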