
646 points by blendergeek | 1 comment
hartator No.42725964
There are already “infinite” websites like these on the Internet.

Crawlers (both AI and regular search) have a set number of pages they want to crawl per domain. This number is usually determined by the popularity of the domain.

Unknown websites will get very few crawls per day, whereas popular sites get millions.

Source: I am the CEO of SerpApi.
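For illustration, a minimal sketch in Python of what such a per-domain crawl budget could look like. The tier names, thresholds, and `popularity_tier` lookup are made-up assumptions for the example, not SerpApi's or any real crawler's logic:

```python
import urllib.parse
from collections import defaultdict

# Illustrative per-domain daily budgets keyed by a popularity tier.
# The tiers and counts are assumptions, not any crawler's real numbers.
BUDGET_BY_TIER = {"unknown": 10, "known": 1_000, "popular": 1_000_000}

crawled_today = defaultdict(int)  # domain -> pages fetched today

def popularity_tier(domain: str) -> str:
    # Stand-in for a real popularity signal (inbound links, traffic, history).
    if domain in {"example.com"}:
        return "popular"
    return "unknown"

def should_crawl(url: str) -> bool:
    domain = urllib.parse.urlsplit(url).netloc
    budget = BUDGET_BY_TIER[popularity_tier(domain)]
    if crawled_today[domain] >= budget:
        return False  # budget exhausted; an "infinite" site stops mattering here
    crawled_today[domain] += 1
    return True
```

Under a scheme like this, an infinite maze of pages on an unknown domain only ever costs the crawler its small daily budget for that domain.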

replies(9): >>42726093 #>>42726258 #>>42726572 #>>42727553 #>>42727737 #>>42727760 #>>42728210 #>>42728522 #>>42742537 #
diggan No.42726093
> There are already “infinite” websites like these on the Internet.

Cool. And how much of the software driving these websites is FOSS that I can download and run on my own (popular enough to be crawled more than daily by multiple scrapers) website?

replies(2): >>42726322 #>>42726514 #
1. hartator No.42726514
Every not-found page that doesn't return a 404 HTTP status is basically an infinite trap.

It's useless to do this, though, as all crawlers have a way to handle it. It's crawler 101.
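One common way to handle it (a sketch only; the function name and heuristics are assumptions, not any specific crawler's implementation) is a "soft 404" probe: request a path that almost certainly doesn't exist and see whether the server still answers 200. If it does, unknown URLs on that domain are treated as a potential infinite trap.

```python
import uuid
import urllib.error
import urllib.request

def serves_soft_404s(base_url: str) -> bool:
    """Fetch a random path that almost certainly doesn't exist.

    If the server answers 200 anyway (a "soft 404"), every unknown URL on
    the domain is a potential infinite trap and should be crawled cautiously.
    """
    probe_url = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}"
    try:
        with urllib.request.urlopen(probe_url, timeout=10) as resp:
            # urlopen follows redirects, so redirect-to-homepage-with-200
            # also gets flagged here.
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # a real 404 (or other error status) is the well-behaved case
    except urllib.error.URLError:
        return False  # unreachable; no soft-404 signal either way
```

A crawler could run a check like this once per domain and cap or skip link discovery on domains that fail it.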