←back to thread

279 points freediver | 1 comments | | HN request time: 0.205s | source
Show context
marginalia_nu ◴[] No.45952174[source]
The idea behind search itself is very simple, and it's a fun problem domain that I encourage anyone to explore[1].

The difficulties in search are almost entirely dealing with the large amounts of data, both logistically and in handling underspecified queries.

A DBMS-backed approach breaks down surprisingly fast. Probably perfectly fine if you're indexing your own website, but will likely choke on something the size of English wikipedia.

[1] The SeIRP e-book is a good (free) starting point https://ciir.cs.umass.edu/irbook/

replies(7): >>45952237 #>>45952734 #>>45952769 #>>45952991 #>>45953075 #>>45953286 #>>45954345 #
1. submeta ◴[] No.45952237[source]
Thank you very much for the recommendation. I am in the process of building knowledge base bots, and am confronted with the task of creating various crawlers for the different sources the company has. And this book comes in very handy.