
646 points blendergeek | 6 comments
hartator ◴[] No.42725964[source]
There are already “infinite” websites like these on the Internet.

Crawlers (both AI and regular search) have a set number of pages they want to crawl per domain. This number is usually determined by the popularity of the domain.

Unknown websites will get very few crawls per day, whereas popular sites get millions.

Source: I am the CEO of SerpApi.

replies(9): >>42726093 #>>42726258 #>>42726572 #>>42727553 #>>42727737 #>>42727760 #>>42728210 #>>42728522 #>>42742537 #
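The per-domain budget described above can be sketched as follows. This is a hypothetical illustration, not SerpApi's or any real crawler's actual logic; the function name, base, and cap are all assumptions:

```python
# Hypothetical sketch of a per-domain crawl budget scaled by popularity.
# `popularity_rank` is 1 for the most popular domain; `base` and `cap`
# are assumed floor/ceiling values, not real crawler parameters.
def crawl_budget(popularity_rank: int, base: int = 10, cap: int = 1_000_000) -> int:
    """Return an assumed pages-per-day budget for a domain."""
    budget = cap // popularity_rank  # less popular -> smaller budget
    return max(base, min(cap, budget))

print(crawl_budget(1))        # top domain gets the full cap
print(crawl_budget(500_000))  # obscure domain falls to the floor
```

Under this toy model, an unknown site bottoms out at a handful of crawls per day, which is why an "infinite" page generator on such a site wastes almost none of a crawler's time.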
diggan ◴[] No.42726093[source]
> There are already “infinite” websites like these on the Internet.

Cool. And how much of the software driving these websites is FOSS and I can download and run it for my own (popular enough to be crawled more than daily by multiple scrapers) website?

replies(2): >>42726322 #>>42726514 #
1. gruez ◴[] No.42726322[source]
Off the top of my head: https://everyuuid.com/

https://github.com/nolenroyalty/every-uuid

replies(2): >>42726420 #>>42732710 #
2. diggan ◴[] No.42726420[source]
Aren't those finite lists? How is a scraper (normal or LLM) supposed to "get stuck" on those?
replies(1): >>42726470 #
3. gruez ◴[] No.42726470[source]
Even though 2^128 UUIDs is technically "finite", it is, for all intents and purposes, infinite to a scraper.
replies(1): >>42728528 #
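A back-of-envelope check makes the "effectively infinite" claim concrete. The 1,000,000 pages/second rate is an assumption chosen to be generous to the scraper:

```python
# How long would it take to enumerate every UUID page, even at an
# absurdly generous 1,000,000 pages per second?
pages = 2**128                        # total UUIDs
rate = 1_000_000                      # assumed pages/second
seconds_per_year = 60 * 60 * 24 * 365
years = pages / (rate * seconds_per_year)
print(f"{years:.3e} years")           # on the order of 1e25 years
```

For comparison, the universe is roughly 1.4e10 years old, so even this idealized scraper finishes a vanishingly small fraction of the list.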
4. johnisgood ◴[] No.42732710[source]
How is that infinite if the last one is always the same? Am I misunderstanding this? I assumed it was almost like an infinite scroll or something.
replies(1): >>42733963 #
5. gruez ◴[] No.42733963[source]
Here's another site that does something similar (iterating over bitcoin private keys rather than UUIDs), but it has separate pages and so could theoretically catch a crawler:

https://allprivatekeys.com/all-bitcoin-private-keys-list

replies(1): >>42734832 #
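The "separate pages" scheme above can be sketched as mapping each page number to a fixed slice of the private-key space, so every page is a distinct URL for a crawler to follow. This is an assumed layout for illustration, not allprivatekeys.com's actual implementation; only the secp256k1 group order is a real constant:

```python
# Assumed pagination over the secp256k1 private-key space: page n lists
# a fixed range of sequential keys, yielding an effectively endless set
# of distinct crawlable URLs. KEYS_PER_PAGE is an arbitrary choice.
KEYS_PER_PAGE = 128

# Order of the secp256k1 group; valid private keys are 1 .. N-1.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def page_keys(page: int) -> list[str]:
    """Return the hex private keys shown on 1-indexed page `page`."""
    start = (page - 1) * KEYS_PER_PAGE + 1
    end = min(start + KEYS_PER_PAGE, SECP256K1_N)
    return [f"{k:064x}" for k in range(start, end)]

total_pages = (SECP256K1_N - 1 + KEYS_PER_PAGE - 1) // KEYS_PER_PAGE
print(page_keys(1)[0])   # page 1 starts at key 0x...0001
print(f"{total_pages:e}")  # ~9e74 pages, far beyond any crawl budget
```

Because each page is its own URL with "next page" links, a naive crawler can keep requesting new pages indefinitely, unlike a single client-side infinite scroll.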
6. johnisgood ◴[] No.42734832{3}[source]
503 :D