279 points freediver | 4 comments
marginalia_nu ◴[] No.45952174[source]
The idea behind search itself is very simple, and it's a fun problem domain that I encourage anyone to explore[1].

The difficulties in search are almost entirely dealing with the large amounts of data, both logistically and in handling underspecified queries.

A DBMS-backed approach breaks down surprisingly fast. It's probably perfectly fine if you're indexing your own website, but it will likely choke on something the size of English Wikipedia.

[1] The SeIRP e-book (Search Engines: Information Retrieval in Practice) is a good (free) starting point: https://ciir.cs.umass.edu/irbook/
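
To illustrate how simple the core idea is, here is a minimal sketch of an in-memory inverted index with AND-style lookup. It is not taken from the comment or from any particular engine: the function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for this example, and it ignores everything that makes search hard at scale (on-disk layout, compression, ranking).

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split into alphanumeric runs (a deliberately naive tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Return ids of documents that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents.
docs = {
    1: "Gilligan's Island is an American sitcom",
    2: "A blog post about the island castaways",
    3: "Search engines and information retrieval",
}
index = build_index(docs)
print(search(index, "island"))         # documents 1 and 2
print(search(index, "island sitcom"))  # document 1 only
```

Everything here fits in a few dictionaries in RAM; the hard part begins once the posting lists no longer do.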

replies(7): >>45952237 #>>45952734 #>>45952769 #>>45952991 #>>45953075 #>>45953286 #>>45954345 #
1. gcanyon ◴[] No.45952991[source]
> The difficulties in search are almost entirely dealing with the large amounts of data, both logistically and in handling underspecified queries.

I would expect the difficulty to be deciding which item to return when there are multiple that contain the search term. Is Wikipedia's article on Gilligan's Island better than some guy's blog post? Or is that guy a fanatic who has spent his entire life pondering whether Wrongway Feldman was malicious or how Irving met Bingo Bango and Bongo?

Add in rank hacking, keyword stuffing, etc. and it seems like a very hard problem, while scaling... is scaling? ¯\_(ツ)_/¯
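
As a rough illustration of why ranking is hard, here is a sketch of BM25, a common term-statistics scoring function, applied to a tiny made-up corpus. The function name `bm25_rank`, the sample documents, and the parameter defaults (`k1=1.2`, `b=0.75`) are assumptions for this example, and the IDF uses the `log(1 + ...)` variant that keeps it non-negative. Nothing here reflects how any specific engine is configured.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(docs, query, k1=1.2, b=0.75):
    # Score every document against the query and return (doc_id, score)
    # pairs, best first. Documents with no matching term are omitted.
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    counts = {doc_id: Counter(toks) for doc_id, toks in tokens.items()}
    n_docs = len(tokens)
    avg_len = sum(len(toks) for toks in tokens.values()) / n_docs

    scores = {}
    for term in set(tokenize(query)):
        # Document frequency: how many documents contain the term.
        df = sum(1 for c in counts.values() if term in c)
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, c in counts.items():
            tf = c[term]
            if tf == 0:
                continue
            # Length normalization: longer documents need more hits.
            norm = k1 * (1 - b + b * len(tokens[doc_id]) / avg_len)
            scores[doc_id] = scores.get(doc_id, 0.0) + idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical corpus: an encyclopedic page, a keyword-stuffed page, and noise.
docs = {
    "wiki": "Gilligan's Island is a sitcom about castaways stranded on an island",
    "stuffed": "island island island island island island",
    "other": "a history of television comedy",
}
ranked = bm25_rank(docs, "island")
print([doc_id for doc_id, _ in ranked])
```

On this toy corpus the keyword-stuffed page outranks the encyclopedic one, which is the kind of failure that pushes real engines toward extra signals beyond term statistics.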

replies(2): >>45953018 #>>45953084 #
2. marginalia_nu ◴[] No.45953018[source]
That would be the "handling underspecified queries" thing I mentioned.
3. dumbfounder ◴[] No.45953084[source]
Elastic and many others fail to solve this problem too. There are many different strategies and many of them require ingenuity and development.
replies(1): >>45953256 #
4. jonstewart ◴[] No.45953256[source]
It’s not like Elasticsearch lacks ranking algorithms and control thereof. But it can require tuning and adjustment for various domains. Relevancy is, after all, subjective.