
646 points blendergeek | 6 comments
hartator ◴[] No.42725964[source]
There are already “infinite” websites like these on the Internet.

Crawlers (both AI and regular search) have a set number of pages they want to crawl per domain. This number is usually determined by the popularity of the domain.

Unknown websites will get very few crawls per day, whereas popular sites get millions.

Source: I am the CEO of SerpApi.

replies(9): >>42726093 #>>42726258 #>>42726572 #>>42727553 #>>42727737 #>>42727760 #>>42728210 #>>42728522 #>>42742537 #
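The per-domain budget described above can be sketched as follows. This is a hypothetical illustration, not SerpApi's or any real crawler's actual logic; the function name, base, and cap are all assumptions:

```python
# Hypothetical sketch of a per-domain crawl budget scaled by popularity.
# `popularity_rank` is 1 for the most popular domain; `base` and `cap`
# are assumed floor/ceiling values, not real crawler parameters.
def crawl_budget(popularity_rank: int, base: int = 10, cap: int = 1_000_000) -> int:
    """Return an assumed pages-per-day budget for a domain."""
    budget = cap // popularity_rank  # less popular -> smaller budget
    return max(base, min(cap, budget))

print(crawl_budget(1))        # top domain gets the full cap
print(crawl_budget(500_000))  # obscure domain falls to the floor
```

Under this toy model, an unknown site bottoms out at a handful of crawls per day, which is why an "infinite" page generator on such a site wastes almost none of a crawler's time.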
diggan ◴[] No.42726093[source]
> There are already “infinite” websites like these on the Internet.

Cool. And how much of the software driving these websites is FOSS and I can download and run it for my own (popular enough to be crawled more than daily by multiple scrapers) website?

replies(2): >>42726322 #>>42726514 #
1. gruez ◴[] No.42726322[source]
Off the top of my head: https://everyuuid.com/

https://github.com/nolenroyalty/every-uuid

replies(2): >>42726420 #>>42732710 #
2. diggan ◴[] No.42726420[source]
Aren't those finite lists? How is a scraper (normal or LLM) supposed to "get stuck" on those?
replies(1): >>42726470 #
3. gruez ◴[] No.42726470[source]
Even though 2^128 UUIDs is technically "finite", it is, for all intents and purposes, infinite to a scraper.
replies(1): >>42728528 #
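A back-of-envelope check makes the "effectively infinite" claim concrete. The 1,000,000 pages/second rate is an assumption chosen to be generous to the scraper:

```python
# How long would it take to enumerate every UUID page, even at an
# absurdly generous 1,000,000 pages per second?
pages = 2**128                        # total UUIDs
rate = 1_000_000                      # assumed pages/second
seconds_per_year = 60 * 60 * 24 * 365
years = pages / (rate * seconds_per_year)
print(f"{years:.3e} years")           # on the order of 1e25 years
```

For comparison, the universe is roughly 1.4e10 years old, so even this idealized scraper finishes a vanishingly small fraction of the list.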
4. johnisgood ◴[] No.42732710[source]
How is that infinite if the last one is always the same? Am I misunderstanding this? I assumed it was almost like an infinite scroll or something.
replies(1): >>42733963 #
5. gruez ◴[] No.42733963[source]
Here's another site that does something similar (iterating over bitcoin private keys rather than UUIDs), but it has separate pages and so could theoretically catch a crawler:

https://allprivatekeys.com/all-bitcoin-private-keys-list

replies(1): >>42734832 #
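The "separate pages" scheme above can be sketched as mapping each page number to a fixed slice of the private-key space, so every page is a distinct URL for a crawler to follow. This is an assumed layout for illustration, not allprivatekeys.com's actual implementation; only the secp256k1 group order is a real constant:

```python
# Assumed pagination over the secp256k1 private-key space: page n lists
# a fixed range of sequential keys, yielding an effectively endless set
# of distinct crawlable URLs. KEYS_PER_PAGE is an arbitrary choice.
KEYS_PER_PAGE = 128

# Order of the secp256k1 group; valid private keys are 1 .. N-1.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def page_keys(page: int) -> list[str]:
    """Return the hex private keys shown on 1-indexed page `page`."""
    start = (page - 1) * KEYS_PER_PAGE + 1
    end = min(start + KEYS_PER_PAGE, SECP256K1_N)
    return [f"{k:064x}" for k in range(start, end)]

total_pages = (SECP256K1_N - 1 + KEYS_PER_PAGE - 1) // KEYS_PER_PAGE
print(page_keys(1)[0])   # page 1 starts at key 0x...0001
print(f"{total_pages:e}")  # ~9e74 pages, far beyond any crawl budget
```

Because each page is its own URL with "next page" links, a naive crawler can keep requesting new pages indefinitely, unlike a single client-side infinite scroll.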
6. johnisgood ◴[] No.42734832{3}[source]
503 :D