You’ve got to remember that Google and Bing do not index the entire internet. Part of their magic is selectively indexing only a tiny sliver and still being effective.
Other kinds of search systems have to index everything, which simplifies things but has its own scaling challenges.
Easiest way to think about it: while the majority of webpages are never indexed, every blob of text (social media post, private message, email, document, etc.) in every major app in the world, including the ones with billions of users, is indexed in a search engine for that app:
- GSuite search (think of how many Gmail messages are searchable in the world right now… and they are all indexed)
- the enterprise search powering ChatGPT, Claude (these may be there by now; if not, they are likely well on the way)
- Microsoft 365 search (this is probably massive, with so many corporate email and Teams systems on it)
- Slack search
- X (Twitter) search
- TikTok search (this one idk, I’ve never used TikTok, but if every video and every comment is searchable then this is probably huge)
- Facebook search (especially since this is likely combined across its product suite)
These are probably all larger in effective size than Google or Bing.