Anubis Works

(xeiaso.net)
313 points by evacchi | 15 comments
1. prologic No.43669126
I've read about Anubis, cool project! Unfortunately, as pointed out in the comments, it requires your site's visitors to have Javascript™ enabled. This is totally fine for sites that require Javascript™ anyway to enhance the user experience, but not so great for static sites and such that require no JS at all.

I built my own solution that effectively blocks these "Bad Bots" at the network level. I block several large "Big Tech / Big LLM" networks entirely at the ASN (BGP) level, using MaxMind's database and a custom WAF and reverse proxy I put together.
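A minimal sketch of the ASN-level check described above, assuming an in-memory prefix table as a stand-in for MaxMind's ASN database (a real deployment would query the `.mmdb` file with one of MaxMind's reader libraries instead). `ASN_TABLE`, `BAD_ASNS`, `lookup_asn`, and `is_blocked` are hypothetical names, and the prefixes and ASNs come from the documentation-only ranges (RFC 5737, RFC 3849, RFC 5398), not real routing data.

```python
import ipaddress
from typing import Optional

# Hypothetical stand-in for MaxMind's ASN database: each entry maps an
# announced prefix to its origin ASN. Documentation ranges only.
ASN_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), 64496),
    (ipaddress.ip_network("198.51.100.0/24"), 64497),
    (ipaddress.ip_network("2001:db8::/32"), 64498),
]

# ASNs to reject outright (in the setup described, this would be loaded
# from a file such as bad_asns.txt).
BAD_ASNS = {64496, 64498}


def lookup_asn(ip: str) -> Optional[int]:
    """Return the origin ASN of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, asn in ASN_TABLE:
        # `in` is simply False when the IP versions differ.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None


def is_blocked(ip: str) -> bool:
    """Block the request if the client's IP belongs to a blocklisted ASN."""
    asn = lookup_asn(ip)
    return asn is not None and asn in BAD_ASNS
```

Keying the blocklist on the ASN means one entry covers every prefix a network announces, which is the advantage over listing IP ranges one by one; the longest-prefix match mirrors how a more specific route wins over a covering one.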

replies(4): >>43669138 >>43669313 >>43669553 >>43670788
2. jadbox No.43669138
How do you know it's an LLM and not a VPN? How do you use MaxMind's database to isolate LLMs?
replies(1): >>43669162
3. prologic No.43669162
I don't distinguish, actually. There are two things I normally do:

- Block Bad Bots. There's a simple text file called `bad_bots.txt`
- Block Bad ASNs. There's a simple text file called `bad_asns.txt`

There's also another one for blocking IPs and IP ranges called `bad_ips.txt`, but it's often more effective to block a much larger range of IPs (at the ASN level).
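A rough sketch of how the other two lists might be applied per request. Their contents aren't shown in the comment, so this assumes `bad_bots.txt` holds User-Agent substrings and `bad_ips.txt` holds single IPs or CIDR ranges, one per line with `#` comments; `parse_list`, `load_ip_ranges`, `ua_is_bad`, and `ip_is_bad` are hypothetical names.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_list(lines: Iterable[str]) -> List[str]:
    """Strip blank lines and '#' comments from a blocklist file."""
    entries = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            entries.append(line)
    return entries


def load_ip_ranges(lines: Iterable[str]) -> List[Network]:
    # strict=False lets a bare IP such as "203.0.113.7" parse as a /32
    # (or /128 for IPv6), so single IPs and ranges share one representation.
    return [ipaddress.ip_network(e, strict=False) for e in parse_list(lines)]


def ua_is_bad(user_agent: str, bad_bots: List[str]) -> bool:
    # Assumption: entries are User-Agent substrings, matched case-insensitively.
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in bad_bots)


def ip_is_bad(ip: str, bad_ips: List[Network]) -> bool:
    # Linear scan; fine for short lists, a prefix trie would scale better.
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in bad_ips)
```

A User-Agent list only stops bots that identify themselves honestly, which is one reason the IP and ASN lists exist alongside it.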

To give you a concrete idea, here are some examples:

  $ cat etc/caddy/waf/bad_asns.txt
  # CHINANET-BACKBONE No.31,Jin-rong Street, CN
  # Why: DDoS
  4134

  # CHINA169-BACKBONE CHINA UNICOM China169 Backbone, CN
  # Why: DDoS
  4837

  # CHINAMOBILE-CN China Mobile Communications Group Co., Ltd., CN
  # Why: DDoS
  9808

  # FACEBOOK, US
  # Why: Bad Bots
  32934

  # Alibaba, CN
  # Why: Bad Bots
  45102

  # Why: Bad Bots
  28573
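A file in this format can be read with a small parser like the one below. The format is inferred from this one example (`#` comment lines, an optional `# Why:` reason, then a bare ASN on its own line), so treat `parse_bad_asns` as a hypothetical helper rather than the actual WAF code.

```python
from typing import Dict, Iterable


def parse_bad_asns(lines: Iterable[str]) -> Dict[int, str]:
    """Parse a bad_asns.txt-style file into {asn: reason}.

    Assumed format: '#' lines are comments, a '# Why: ...' comment gives
    the reason, and a bare integer line is the ASN those comments describe.
    """
    asns: Dict[int, str] = {}
    reason = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines only separate entries
        if line.startswith("#"):
            text = line.lstrip("#").strip()
            if text.lower().startswith("why:"):
                reason = text[len("why:"):].strip()
            continue
        asns[int(line)] = reason
        reason = ""  # each reason applies to the next ASN only
    return asns
```

Keeping the reason next to each ASN costs nothing at lookup time and makes it easy to revisit entries later, for example when a blocked network turns out to carry legitimate users.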

4. Cyphase No.43669313
For anyone wondering, Oracle holds the trademark for "JavaScript": https://javascript.tm/
replies(1): >>43669322
5. prologic No.43669322
Which they arguably should let go of.
6. xyzzy_plugh No.43669553
A significant portion of the bot traffic TFA is designed to handle originates from consumer/residential space. Sure, there are ASN games being played alongside reputation fraud, but it's very hard to combat. A cursory investigation of our logs showed these bots (which make ~1 request from a given residential IP) are likely in ranges that our real human users occupy as well.

Simply put, you risk blocking legitimate traffic. This solution (Anubis) does as well, but for most humans the actual risk is much lower.

As much as I'd love to not need JavaScript and to support users who run with it disabled, I've never once had a customer or end user complain about needing JavaScript enabled.

It is an incredibly vocal minority who disapprove of requiring JavaScript, the majority of whom, upon encountering a site for which JavaScript is required, simply enable it. I'd speculate that, even then, only a handful ever release a defeated sigh.

replies(1): >>43669585
7. prologic No.43669585
This is true. I had some bad actors from the Comcast network at one point, but unfortunately also valid human users of some of my "things", so I opted not to block the Comcast ASN.
replies(2): >>43669594 >>43669605
8. prologic No.43669594{3}
I would be interested to hear of any other solutions that are guaranteed to either identify or block non-human traffic. In the "small web" and self-hosting world, we typically don't want crawlers and similar software hitting our services, because the software is often buggy (example: a runaway Claude bot) or you simply don't want your sites indexed by them.
9. xyzzy_plugh No.43669605{3}
Exactly. We've all been down this rabbit hole, collectively, and that's why Anubis has taken off. It works shockingly well.
replies(1): >>43670331
10. prologic No.43670331{4}
I was planning on building a Caddy module for Anubis actually. Is anyone else interested in this?
replies(2): >>43673454 >>43688619
11. runxiyu No.43670788
Do you have a link to your own solution?
replies(2): >>43672067 >>43688624
12. prologic No.43672067
Not yet, unfortunately. But if you're interested, please reach out! I currently run it in a 3-region GeoDNS setup on my self-hosted infra.
13. vinibrito No.43673454{5}
Yes, I would! I love Caddy's set-and-forget nature, and this would be no different. It would be especially nice if it could be triggered conditionally, for example based on server load or a flood being detected.
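The "flood being detected" trigger could be as simple as a sliding-window request counter. This is a minimal sketch assuming a fixed threshold over a fixed window; `FloodGate`, `threshold`, and `window_seconds` are hypothetical names, not part of Anubis or Caddy. A load-based trigger could swap the counter for something like `os.getloadavg()` on Unix.

```python
import time
from collections import deque
from typing import Deque, Optional


class FloodGate:
    """Hypothetical helper: enable a challenge only while traffic is high.

    Counts requests in a sliding window of `window_seconds`; while the count
    exceeds `threshold`, record() returns True (show the challenge). There is
    no hysteresis: it returns False as soon as the count drops back.
    """

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window = window_seconds
        self.hits: Deque[float] = deque()

    def record(self, now: Optional[float] = None) -> bool:
        """Record one request; return True if the challenge should be shown."""
        now = time.monotonic() if now is None else now
        self.hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        return len(self.hits) > self.threshold
```

A single global counter like this reacts to total volume, which suits the residential-proxy pattern described earlier in the thread (about one request per IP), where per-IP limits would never trip.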
14. JsonCameron No.43688619{5}
See https://github.com/TecharoHQ/anubis/issues/16

There is going to be a pretty big refactor soon, but once that's done we plan on crushing this out.

15. JsonCameron No.43688624
I have a pretty similar one (it works off the same concept): https://github.com/JasonLovesDoggo/caddy-defender if you're curious. Keep in mind this will not protect you against residential IP scraping.