
556 points | campuscodi | 1 comment
whs ◴[] No.41867572[source]
My company runs a tech news website. We offer an RSS feed, as any Drupal website would, and a content farm scrapes that feed to rehost our content in full. This is usually fine for us - the content is CC-licensed and they do credit the correct source. But they run thousands of different WordPress instances on the same IP, and each one fetches the feed individually.

In the end we had to use Cloudflare to rate limit the RSS endpoint.
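For illustration, the kind of per-client limit Cloudflare applies here can be sketched as a token bucket. This is a hypothetical stand-in, not Cloudflare's actual implementation (their rules are configured in the dashboard, not in code); the rate and burst numbers are made up:

```python
class TokenBucket:
    """Minimal token-bucket rate limiter sketch.

    Each client gets a bucket; a request is allowed only if a token is
    available. Tokens refill continuously at `rate_per_min`.
    """

    def __init__(self, rate_per_min: float, burst: int, now: float = 0.0):
        self.capacity = burst                  # max tokens (burst allowance)
        self.tokens = float(burst)             # start full
        self.fill_rate = rate_per_min / 60.0   # tokens added per second
        self.last = now                        # timestamp of last check

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical policy: 60 requests/minute with a burst of 2.
bucket = TokenBucket(rate_per_min=60, burst=2)
```

A well-behaved feed reader polling every few minutes never hits the limit, while thousands of instances hammering from one IP quickly drain their shared bucket.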

1. kevincox ◴[] No.41868962[source]
> In the end we had to use Cloudflare to rate limit the RSS endpoint.

I think this is fine. You are solving a specific problem and still allowing some traffic. The problem with the Cloudflare default settings is that they block all requests, so users fail to get any updates even when fetching the feed at a reasonable rate.

BTW, in this case another solution may just be to configure proper caching headers. Even if you only cache for 5 minutes at a time, that is at most one origin request every 5 minutes per Cloudflare caching location. I don't know the exact configuration, but they typically use ~5 locations per origin, so that would be only 1 req/min - a trivial load that absorbs both these inconsiderate scrapers and regular users. You can also configure all fetches to go through a single location, in which case you would only need to actually serve the feed once per 5 minutes.
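A minimal sketch of the arithmetic above, plus the response headers the origin would send so a CDN caches the feed. The location count and TTL are the comment's assumed numbers, not measured values:

```python
# Assumptions taken from the comment above: ~5 edge cache locations,
# and a 5-minute (300 s) cache TTL on the feed.
EDGE_LOCATIONS = 5
CACHE_TTL_SECONDS = 300


def feed_headers(ttl: int = CACHE_TTL_SECONDS) -> dict:
    """Response headers for the RSS endpoint.

    Cache-Control with public/max-age is the standard header a CDN
    honors when deciding how long to serve a cached copy.
    """
    return {
        "Content-Type": "application/rss+xml; charset=utf-8",
        "Cache-Control": f"public, max-age={ttl}",
    }


def worst_case_origin_rpm(locations: int = EDGE_LOCATIONS,
                          ttl: int = CACHE_TTL_SECONDS) -> float:
    """Worst-case origin requests per minute.

    Each cache location refetches at most once per TTL window,
    regardless of how many clients hit that location.
    """
    return locations * 60.0 / ttl
```

With these numbers, `worst_case_origin_rpm()` comes out to 1 request per minute at the origin, no matter how many scrapers are polling the edge.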