I would guess directory listing? But I'm an idiot, so any elucidation would be appreciated.
On the other hand, I had to deploy Anubis for the SVN web interface on tug.org. SVN is way slower than Git (most pages take around 5 seconds to load), and the server didn't even have basic caching enabled, but until last year there weren't any issues. Starting early this year, though, the bots began scraping every revision, and since the repo is 20+ years old and has 300k files, there are a lot of pages to scrape. This was overloading the entire server, making every other service hosted there unusable. I tried adding caching and blocking some bad ASNs, but Anubis was (unfortunately) the only solution that actually worked.
So I think the main commonality is popular-ish sites with lots of pages that are computationally expensive to generate.