
597 points | classichasclass | 1 comment
kldg | No.45020146
What is the commonality between websites severely affected by bots? I've run a web server from home for years on a .com TLD, it ranks high-ish in Google's index for relevant keywords, and I don't have any exotic protections against bots on either the router or the server (though I did make an attempt at counting bots, out of curiosity). I get very frequent port scans, and the bots usually grab the index page but only rarely follow dynamically loaded links. I don't really even think about bots, because there was no noticeable impact when I ran the server on Apache 2, and there isn't now that I serve multiple websites using Axum.

I would guess directory listings? But I'm an idiot, so any elucidation would be appreciated.

1. gucci-on-fleek | No.45021266
For my personal site, I let the bots do whatever they want—it's a static site with like 12 pages, so they'd essentially need to saturate the (gigabit) network before causing me any problems.

On the other hand, I had to deploy Anubis for the SVN web interface for tug.org. SVN is way slower than Git (most pages take 5 seconds to load), and the server didn't even have basic caching enabled, yet before last year there weren't any issues. Starting early this year, though, the bots began scraping every revision, and since the repo is 20+ years old and has 300k files, there are a lot of pages to scrape. This was overloading the entire server and making every other service hosted there unusable. I tried adding caching and blocking some bad ASNs, but Anubis was (unfortunately) the only solution that has actually worked.
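In case it helps anyone picture the setup, here's a rough sketch (not tug.org's actual config; the hostnames, ports, and cache sizes are all made up for illustration) of how nginx caching and Anubis can be chained in front of a slow SVN web UI: nginx terminates TLS and hands everything to Anubis, and Anubis's upstream is pointed at an internal cache listener rather than at the SVN app directly, so each expensive page only has to be generated once per cache window.

    # Public-facing side: everything for the SVN web UI goes through
    # Anubis first (assumed to be listening on 127.0.0.1:8923).
    server {
        listen 443 ssl;               # ssl_certificate/key omitted
        server_name svn.example.org;  # hypothetical hostname

        location / {
            proxy_pass http://127.0.0.1:8923;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }

    # Internal cache tier: configure Anubis's upstream to point here
    # instead of at the SVN web app directly.
    proxy_cache_path /var/cache/nginx/svn keys_zone=svncache:50m
                     max_size=2g inactive=7d;

    server {
        listen 127.0.0.1:8924;

        location / {
            proxy_pass http://127.0.0.1:8080;  # the actual SVN web interface
            proxy_cache svncache;
            proxy_cache_valid 200 1h;          # keep good responses for an hour
            proxy_cache_use_stale updating timeout;
            proxy_cache_lock on;               # collapse concurrent misses
        }
    }

The cache alone doesn't help much against scrapers that walk 300k distinct URLs exactly once each, which is why the proof-of-work step in front of it ends up doing most of the work.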

So, I think the main commonality is popular-ish sites with lots of pages that are computationally expensive to generate.