Anubis Works

(xeiaso.net)

319 points evacchi | 1 comments | 12 Apr 25 22:32 UTC


gyomu ◴[12 Apr 25 22:59 UTC] No.43668594[source]▶

If you’re confused about what this is - it’s to prevent AI scraping.

> Anubis uses a proof-of-work challenge to ensure that clients are using a modern browser and are able to calculate SHA-256 checksums

https://anubis.techaro.lol/docs/design/how-anubis-works

This is pretty cool, I have a project or two that might benefit from it.
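For anyone who wants to see the shape of a SHA-256 proof-of-work challenge like the one quoted above, here is a minimal sketch. It assumes the server hands out a random challenge string and a difficulty, and that the client must find a nonce whose SHA-256 hex digest starts with that many zeros. The function names, the challenge format, and the leading-zeros rule are illustrative assumptions, not a confirmed description of what Anubis does internally.

```python
import hashlib


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Brute-force the smallest nonce whose digest has `difficulty` leading hex zeros.

    Hypothetical scheme: digest = SHA-256(challenge + decimal nonce).
    """
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server-side check: a single hash, no matter how long the client searched."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # "example-challenge" is a placeholder; a real server would issue a random value.
    nonce, digest = solve("example-challenge", 3)
    print(nonce, digest, verify("example-challenge", nonce, 3))
```

The asymmetry is the point: the client needs about 16^difficulty hashes on average, while the server verifies with one. That cost is negligible for a single human visitor but adds up for a scraper fetching millions of pages.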

replies(2): >>43669511 #>>43671745 #

x3haloed ◴[13 Apr 25 02:17 UTC] No.43669511[source]▶

>>43668594 #

I’ve been wondering to myself for many years now whether the web is for humans or machines. I personally can’t think of a good reason to specifically try to gate bots when it comes to serving content. Trying to post content or trigger actions could obviously be problematic under many circumstances.

But I find that when it comes to simple serving of content, human vs. bot is not usually what you’re trying to filter or block on. As long as a given client is not abusing your systems, then why do you care if the client is a human?

replies(8): >>43669544 #>>43669558 #>>43669572 #>>43670108 #>>43670208 #>>43670880 #>>43671272 #>>43676454 #

t-writescode ◴[13 Apr 25 02:24 UTC] No.43669544[source]▶

>>43669511 #

> I personally can’t think of a good reason to specifically try to gate bots

There have been numerous posts on HN about people getting slammed by bots, especially LLM scrapers, to the tune of many, many dollars and terabytes of data, burning bandwidth and increasing server-running costs.

replies(1): >>43669560 #

ronsor ◴[13 Apr 25 02:28 UTC] No.43669560[source]▶

>>43669544 #

I'm genuinely skeptical that those are all real LLM scrapers. For one, a lot of content is in CommonCrawl and AI companies don't want to redo all that work when they can get some WARC files from AWS.

I suspect that most of these are other bots pretending to be LLM scrapers. Does anyone even check if the bots' IP ranges belong to the AI companies?

replies(4): >>43669584 #>>43669780 #>>43669996 #>>43670176 #

1. anonym29 ◴[13 Apr 25 03:12 UTC] No.43669780[source]▶

>>43669560 #

>Does anyone even check if the bots' IP ranges belong to the AI companies?

Sounds like a fun project for an AbuseIPDB contributor. Could look for fake Googlebots / Bingbots, etc, too.
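A rough sketch of what that check could look like, assuming you have a list of CIDR ranges that an operator publishes for its crawler. The bot name and the ranges below are placeholders from the reserved documentation address blocks, not any real company's ranges, and the function name is made up for illustration.

```python
import ipaddress

# Placeholder data: documentation-only address blocks, NOT real crawler ranges.
# In practice you would load the ranges each operator publishes for its bot.
PUBLISHED_RANGES = {
    "examplebot": ["192.0.2.0/24", "2001:db8::/32"],
}


def claimed_bot_is_genuine(claimed_bot: str, remote_ip: str,
                           published: dict[str, list[str]] = PUBLISHED_RANGES) -> bool:
    """Return True if remote_ip falls inside a range published for claimed_bot."""
    ip = ipaddress.ip_address(remote_ip)
    for cidr in published.get(claimed_bot, []):
        network = ipaddress.ip_network(cidr)
        # Only compare addresses of the same family (IPv4 vs IPv6).
        if ip.version == network.version and ip in network:
            return True
    return False


if __name__ == "__main__":
    # A request whose User-Agent claims "examplebot" from an unlisted address.
    print(claimed_bot_is_genuine("examplebot", "203.0.113.5"))
```

Run over access logs, this would split requests whose User-Agent claims a given crawler into those coming from the operator's published ranges and those that are just borrowing the name.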
