
646 points | blendergeek
hartator No.42725964
There are already “infinite” websites like these on the Internet.

Crawlers (both AI and regular search) have a set number of pages they want to crawl per domain. This number is usually determined by the popularity of the domain.

Unknown websites will get very few crawls per day, whereas popular sites get millions.

Source: I am the CEO of SerpApi.

replies(9): >>42726093 #>>42726258 #>>42726572 #>>42727553 #>>42727737 #>>42727760 #>>42728210 #>>42728522 #>>42742537 #
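The per-domain crawl-budget idea above can be sketched in a few lines. This is a hypothetical illustration, not SerpApi's or any real crawler's logic: the `daily_budget` formula and the `CrawlScheduler` class are made-up names, and the numbers are arbitrary.

```python
def daily_budget(popularity_score: float, base: int = 10, cap: int = 1_000_000) -> int:
    """Hypothetical formula: scale the per-domain daily page quota with popularity."""
    return min(cap, int(base * (1 + popularity_score)))

class CrawlScheduler:
    """Tracks how many pages each domain may still be crawled today."""

    def __init__(self) -> None:
        self.remaining: dict[str, int] = {}  # domain -> pages left today

    def allow(self, domain: str, popularity: float = 0.0) -> bool:
        if domain not in self.remaining:
            self.remaining[domain] = daily_budget(popularity)
        if self.remaining[domain] <= 0:
            return False  # budget exhausted: stop crawling this domain today
        self.remaining[domain] -= 1
        return True

# An unknown site gets a tiny quota; a popular one gets a large quota,
# which is why an "infinite" site on an unknown domain sees few requests.
sched = CrawlScheduler()
unknown_allowed = sum(sched.allow("unknown.example") for _ in range(100))
print(unknown_allowed)  # 10: only the small base budget is ever spent
```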
diggan No.42726093
> There are already “infinite” websites like these on the Internet.

Cool. And how much of the software driving these websites is FOSS that I can download and run for my own (popular enough to be crawled more than daily by multiple scrapers) website?

replies(2): >>42726322 #>>42726514 #
gruez No.42726322
Off the top of my head: https://everyuuid.com/

https://github.com/nolenroyalty/every-uuid

replies(2): >>42726420 #>>42732710 #
diggan No.42726420
Aren't those finite lists? How is a scraper (normal or LLM) supposed to "get stuck" on those?
replies(1): >>42726470 #
gruez No.42726470
Even though 2^128 UUIDs is technically "finite", for all intents and purposes it is infinite to a scraper.
replies(1): >>42728528 #
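The "effectively infinite" point is easy to check with back-of-the-envelope arithmetic. The crawl rate below is a made-up, absurdly generous assumption; even so, exhausting the UUID space takes vastly longer than the age of the universe.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7 seconds

total_pages = 2 ** 128   # one page per possible UUID, ~3.4e38
rate = 10 ** 9           # hypothetical: 1 billion pages scraped per second

years = total_pages / (rate * SECONDS_PER_YEAR)
print(f"{years:.2e} years")  # ~1.08e+22 years; the universe is ~1.4e10 years old
```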