←back to thread

Anubis Works

(xeiaso.net)
313 points evacchi | 1 comments | | HN request time: 0s | source
Show context
gyomu ◴[] No.43668594[source]
If you’re confused about what this is - it’s to prevent AI scraping.

> Anubis uses a proof-of-work challenge to ensure that clients are using a modern browser and are able to calculate SHA-256 checksums

https://anubis.techaro.lol/docs/design/how-anubis-works

This is pretty cool, I have a project or two that might benefit from it.

replies(2): >>43669511 #>>43671745 #
x3haloed ◴[] No.43669511[source]
I’ve been wondering to myself for many years now whether the web is for humans or machines. I personally can’t think of a good reason to specifically try to gate bots when it comes to serving content. Trying to post content or trigger actions could obviously be problematic under many circumstances.

But I find that when it comes to simple serving of content, human vs. bot is not usually what you’re trying to filter or block on. As long as a given client is not abusing your systems, then why do you care if the client is a human?

replies(8): >>43669544 #>>43669558 #>>43669572 #>>43670108 #>>43670208 #>>43670880 #>>43671272 #>>43676454 #
xboxnolifes ◴[] No.43669572[source]
> As long as a given client is not abusing your systems, then why do you care if the client is a human?

Well, that's the rub. The bots are abusing the systems. The bots are accessing the contents at rates thousands of times faster and more often than humans. The bots also have access patterns unlike your expected human audience (downloading gigabytes or terabytes of data multiples times, over and over).

And these bots aren't some being with rights. They're tools unleashed by humans. It's humans abusing the systems. These are anti-abuse measures.

replies(2): >>43669980 #>>43671277 #
immibis ◴[] No.43671277[source]
Then you look up their IP address's abuse contact, send an email and get them to either stop attacking you or get booted off the internet so they can't attack you.

And if that doesn't happen, you go to their ISP's ISP and get their ISP booted off the Internet.

Actual ISPs and hosting providers take abuse reports extremely seriously, mostly because they're terrified of getting kicked off by their ISP. And there's no end to that - just a chain of ISPs from them to you and you might end with convincing your ISP or some intermediary to block traffic from them. However, as we've seen recently, rules don't apply if enough money is involved. But I'm not sure if these shitty interim solutions come from ISPs ignoring abuse when money is involved, or from not knowing that abuse reporting is taken seriously to begin with.

Anyone know if it's legal to return a never-ending stream of /dev/urandom based on the user-agent?

replies(4): >>43671307 #>>43671495 #>>43671676 #>>43673964 #
sussmannbaka ◴[] No.43671307[source]
Please, read literally any article about the ongoing problem. The IPs are basically random, come from residential blocks, requests don’t reuse the same IP more than a bunch of times.
replies(1): >>43673025 #
1. immibis ◴[] No.43673025[source]
Are you sure that's AI? I get requests that are overtly from AI crawlers, and almost no other requests. Certainly all of the high-volume crawler-like requests overtly say that they're from crawlers.

And those residential proxy services cost their customer around $0.50/GB up to $20/GB. Do with that knowledge what you will.