←back to thread

Anubis Works

(xeiaso.net)
313 points evacchi | 1 comments | | HN request time: 0.199s | source
Show context
prologic ◴[] No.43669126[source]
I've read about Anubis, cool project! Unfortunately, as pointed out in the comments, requires your site's visitors to have Javascript™ enabled. This is totally fine for sites that require Javascript™ anyway to enhance the user experience, but not so great for static sites and such that require no JS at all.

I built my own solution that effectively blocks these "Bad Bots" at the network level. I effectively block the entirety of several large "Big Tech / Big LLM" networks entirely at the ASN (BGP) by utilizing MaxMind's database and a custom WAF and Reverse Proxy I put together.

replies(4): >>43669138 #>>43669313 #>>43669553 #>>43670788 #
xyzzy_plugh ◴[] No.43669553[source]
A significant portion of the bot traffic TFA is designed to handle originates from consumer/residential space. Sure, there are ASN games being played alongside reputation fraud, but it's very hard to combat. A cursory investigation of our logs showed these bots (which make ~1 request from a given residential IP) are likely in ranges that our real human users occupy as well.

Simply put you risk blocking legitimate traffic. This solution does as well but for most humans the actual risk is much lower.

As much as I'd love to not need JavaScript and to support users who run with it disabled, I've never once had a customer or end user complain about needing JavaScript enabled.

It is an incredible vocal minority who disapprove of requiring JavaScript, the majority of whom, upon encountering a site for which JavaScript is required, simply enable it. I'd speculate that, even then, only a handful ever release a defeated sigh.

replies(1): >>43669585 #
prologic ◴[] No.43669585[source]
This is true. I had some bad actors from the ComCast Network at one point. And unfortunately also valid human users of some of my "things". So I opted not to block the ComCast ASN at that point.
replies(2): >>43669594 #>>43669605 #
1. prologic ◴[] No.43669594[source]
I would be interested to hear of any other solutions that guarantee to either identity or block non-Human traffic. In the "small web" and self-hosting, we typically don't really want Crawlers, and other similar software hitting our services, because often the software is either buggy in the first place (Example: Runaway Claude Bot) or you don't want your sites indexed by them in the first place.