
597 points | classichasclass | 1 comment
kldg | No.45020146
What is the commonality between websites severely affected by bots? I've run a web server from home for years on a .com TLD, it ranks high-ish in Google's index for relevant keywords, and I don't have any exotic protections against bots on either the router or the server (though I did make an attempt at counting bots, out of curiosity). I get very frequent port scans, and the bots usually grab the index page but only rarely follow dynamically loaded links. I don't really even think about bots, because there was no noticeable impact when I ran the server on Apache 2, and there isn't now that I serve multiple websites using Axum.

I would guess directory listings? But I'm an idiot, so any elucidation would be appreciated.

1. gucci-on-fleek | No.45021266
For my personal site, I let the bots do whatever they want—it's a static site with like 12 pages, so they'd essentially need to saturate the (gigabit) network before causing me any problems.

On the other hand, I had to deploy Anubis for the SVN web interface for tug.org. SVN is way slower than Git (most pages take 5 seconds to load), and the server didn't even have basic caching enabled, yet before last year there weren't any issues. Starting early this year, though, the bots began scraping every revision, and since the repo is 20+ years old and has 300k files, there are a lot of pages to scrape. This was overloading the entire server and making every other service hosted there unusable. I tried adding caching and blocking some bad ASNs, but Anubis was (unfortunately) the only solution that has actually worked.
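In case it helps anyone picture the setup, here's a rough sketch (not tug.org's actual config; the hostnames, ports, and cache sizes are all made up for illustration) of how nginx caching and Anubis can be chained in front of a slow SVN web UI: nginx terminates TLS and hands everything to Anubis, and Anubis's upstream is pointed at an internal cache listener rather than at the SVN app directly, so each expensive page only has to be generated once per cache window.

    # Public-facing side: everything for the SVN web UI goes through
    # Anubis first (assumed to be listening on 127.0.0.1:8923).
    server {
        listen 443 ssl;               # ssl_certificate/key omitted
        server_name svn.example.org;  # hypothetical hostname

        location / {
            proxy_pass http://127.0.0.1:8923;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }

    # Internal cache tier: configure Anubis's upstream to point here
    # instead of at the SVN web app directly.
    proxy_cache_path /var/cache/nginx/svn keys_zone=svncache:50m
                     max_size=2g inactive=7d;

    server {
        listen 127.0.0.1:8924;

        location / {
            proxy_pass http://127.0.0.1:8080;  # the actual SVN web interface
            proxy_cache svncache;
            proxy_cache_valid 200 1h;          # keep good responses for an hour
            proxy_cache_use_stale updating timeout;
            proxy_cache_lock on;               # collapse concurrent misses
        }
    }

The cache alone doesn't help much against scrapers that walk 300k distinct URLs exactly once each, which is why the proof-of-work step in front of it ends up doing most of the work.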

So, I think the main commonality is popular-ish sites with lots of pages that are computationally expensive to generate.