
Anubis Works

(xeiaso.net)
313 points by evacchi | 8 comments
gyomu ◴[] No.43668594[source]
If you’re confused about what this is - it’s to prevent AI scraping.

> Anubis uses a proof-of-work challenge to ensure that clients are using a modern browser and are able to calculate SHA-256 checksums

https://anubis.techaro.lol/docs/design/how-anubis-works
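The linked design doc describes a classic hashcash-style proof of work. A minimal sketch of that idea in Python (the field names and difficulty scheme here are illustrative assumptions, not Anubis's actual protocol; Anubis runs its solver in the browser):

```python
import hashlib
import itertools


def solve_challenge(challenge: str, difficulty: int) -> int:
    """Find a nonce so that SHA-256(challenge + nonce) starts with
    `difficulty` leading hex zeroes. The server hands out `challenge`;
    the client burns CPU finding `nonce`."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Verification is a single hash, so it's cheap for the server."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the point: solving costs the client many hash attempts, verifying costs the server one, and raising `difficulty` scales the client's cost exponentially.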

This is pretty cool, I have a project or two that might benefit from it.

replies(2): >>43669511 #>>43671745 #
x3haloed ◴[] No.43669511[source]
I’ve been wondering to myself for many years now whether the web is for humans or machines. I personally can’t think of a good reason to specifically try to gate bots when it comes to serving content. Trying to post content or trigger actions could obviously be problematic under many circumstances.

But I find that when it comes to simple serving of content, human vs. bot is not usually what you’re trying to filter or block on. As long as a given client is not abusing your systems, then why do you care if the client is a human?

replies(8): >>43669544 #>>43669558 #>>43669572 #>>43670108 #>>43670208 #>>43670880 #>>43671272 #>>43676454 #
xboxnolifes ◴[] No.43669572[source]
> As long as a given client is not abusing your systems, then why do you care if the client is a human?

Well, that's the rub. The bots are abusing the systems. The bots are accessing the content at rates thousands of times higher than any human would, and far more often. The bots also have access patterns unlike your expected human audience (downloading gigabytes or terabytes of data multiple times, over and over).

And these bots aren't some being with rights. They're tools unleashed by humans. It's humans abusing the systems. These are anti-abuse measures.

replies(2): >>43669980 #>>43671277 #
bbor ◴[] No.43669980[source]
Well, that's the meta-rub: if they're abusing, block abuse. Rate limits are far simpler, anyway!

In the interest of bringing the AI bickering to HN: I think one could accurately characterize "block bots just in case they choose to request too much data" as discrimination! Robots of course don't have any rights so it's not wrong, but it certainly might be unwise.
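The per-client rate limit suggested above is often sketched as a token bucket. A minimal, hypothetical version (not tied to any particular server or framework):

```python
import time
from collections import defaultdict


class TokenBucket:
    """Per-client token-bucket rate limiter sketch. Each client starts
    with `capacity` tokens; tokens refill at `rate` per second, and each
    allowed request spends one."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)   # tokens per client
        self.last = defaultdict(time.monotonic)       # last-seen time per client

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last[client_ip]
        self.last[client_ip] = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[client_ip] = min(
            self.capacity, self.tokens[client_ip] + elapsed * self.rate
        )
        if self.tokens[client_ip] >= 1:
            self.tokens[client_ip] -= 1
            return True
        return False
```

Simple, but note the key assumption: it keys on `client_ip`, which is exactly what the replies below take issue with.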

replies(1): >>43670056 #
inejge ◴[] No.43670056[source]
> Rate limits are far simpler, anyway!

Not when the bots are actively programmed to thwart them by using far-flung IP address carousels, request pacing, spoofed user agents and similar techniques. It's open war these days.

replies(1): >>43670360 #
parineum ◴[] No.43670360[source]
Request pacing sounds intentionally unabusive.
replies(2): >>43670904 #>>43670947 #
1. j16sdiz ◴[] No.43670947[source]
They may not be bringing down your server, but they are taking 80%+ of your bandwidth budget. Does that count as abuse?
replies(2): >>43671160 #>>43671288 #
2. ithkuil ◴[] No.43671160[source]
Isn't that what a rate limiter would address?
replies(1): >>43671256 #
3. mkl ◴[] No.43671256[source]
Not when the traffic is coming from 10s of thousands of IP addresses, with very few requests from each one: https://drewdevault.com/2025/03/17/2025-03-17-Stop-externali...
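Rough arithmetic (the numbers are hypothetical) shows why per-IP limiting fails here: each address stays far below any sane threshold while the aggregate load is still abusive.

```python
# Hypothetical distributed-scraper numbers.
ips = 20_000                  # distinct source addresses
requests_per_ip_hour = 5      # each IP is nearly idle on its own
per_ip_limit_hour = 60        # a fairly strict per-IP limit (1 req/min)

aggregate = ips * requests_per_ip_hour
print(aggregate)                                 # 100000 requests/hour hit the server
print(requests_per_ip_hour < per_ip_limit_hour)  # True: every single IP looks benign
```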
replies(2): >>43671556 #>>43679698 #
4. immibis ◴[] No.43671288[source]
Are you at a hoster with extortionately expensive bandwidth, such as AWS, GCP, or Azure?
5. KronisLV ◴[] No.43671556{3}[source]
That very much reads like the rant of someone who is sick and tired of the state of things.

I’m afraid it doesn’t change anything in and of itself; some sort of solution that only lets in the users you’re okay with is what’s direly needed all across the web.

Though reading about the people trying to mine crypto on a CI service, it feels like sometimes it won’t just be LLM scrapers that you need to protect against, but any number of malicious actors.

At that point, you might as well run an invite only community.

replies(1): >>43671703 #
6. bayindirh ◴[] No.43671703{4}[source]
Source Hut implemented Anubis, and it works so well. I almost never see the waiting screen, and afterwards it whitelists me for a very long time, so I work without any limitations.
replies(1): >>43671761 #
7. KronisLV ◴[] No.43671761{5}[source]
That’s great to hear and Anubis seems cool!

I just worry about the idea of running public/free services on the web, due to the potential for misuse and bad actors, though making things paid also seems sensible, e.g. what was linked: https://man.sr.ht/ops/builds.sr.ht-migration.md

8. ithkuil ◴[] No.43679698{3}[source]
ok, but my answer was about how to react to request pacing.

If the abuser is using request pacing to make fewer requests, then that makes the abuser less abusive. If your complaint is that the pacing is tuned only to avoid bringing your server down while still costing you money, you can counteract that by tuning the rate limit further down.

The tens of thousands of distinct IP addresses are another (and perfectly valid) issue, but that was not the point I was answering.