279 points freediver | 1 comment
eduction ◴[] No.45951334[source]
I completely agree with the insight that full text search has been complexified. People seem to want to jump straight to clustering or other enterprise level things.

I also appreciate the moxie of getting in there and building it yourself.

Myself, I reach for Lucene. Then you don’t need to build all this yourself if you don’t want. It lives in a dir on disk. True, it’s a separate database, but one optimized for this problem.

replies(1): >>45951365 #
aorloff ◴[] No.45951365[source]
This was the solution I was thinking about, but I thought, well that's the way someone would have done it 20 years ago
replies(1): >>45951846 #
shevy-java ◴[] No.45951846[source]
Alright but why do we not have more search engines that are actually good?

I'd love to cut myself off from Google, including Google Search, but any alternatives manage to be even worse. Consistently so. It's as if Google won the war by being just permanently slightly better - while everyone is actually really crap. That wasn't the case, say, 10 years ago or so.

replies(5): >>45952022 #>>45952024 #>>45952350 #>>45955823 #>>45958345 #
eduction ◴[] No.45955823[source]
Not all search is web-wide search. The best-known example of this is probably Amazon's search bar. No one really wants to search Amazon via Google. They have staffers contributing heavily to Lucene.

But also there are all kinds of other applications. Let's say you run a reviews site; you can build a bespoke power search form allowing people to sort on things like price or date of review, set a minimum star threshold, etc. You can also weight product names or review titles more heavily in the index scoring (a review /of/ the Pixel 10 should rank higher than a review that mentions the Pixel 10 prominently).
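To make the field-weighting idea concrete, here is a minimal sketch in plain Python. It is not Lucene and not anyone's production scoring: the `FIELD_WEIGHTS` values, the `REVIEWS` sample data, and the `score`/`search` helpers are all made up for illustration, and the scoring is a bare weighted term count rather than BM25.

```python
from collections import Counter

# Hypothetical per-field weights: a match in the product name or title
# counts for more than a match in the review body.
FIELD_WEIGHTS = {"product": 5.0, "title": 3.0, "body": 1.0}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def score(doc, query):
    """Sum of term counts per field, scaled by that field's weight."""
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(doc.get(field, "")))
        total += weight * sum(counts[t] for t in terms)
    return total


def search(docs, query, min_stars=0):
    """Filter by minimum star rating, drop non-matches, rank by score."""
    hits = [(score(d, query), d) for d in docs if d["stars"] >= min_stars]
    hits = [(s, d) for s, d in hits if s > 0]
    hits.sort(key=lambda h: h[0], reverse=True)
    return [d for _, d in hits]


# Illustrative data: review 1 is a review OF the Pixel 10,
# review 2 only mentions it (three times) in the body.
REVIEWS = [
    {"id": 1, "product": "Pixel 10", "title": "Pixel 10 review",
     "body": "Great camera.", "stars": 4},
    {"id": 2, "product": "Galaxy S25", "title": "Galaxy S25 review",
     "body": "Compared with the Pixel 10 and the Pixel 10 Pro, "
             "the Pixel 10 wins on price.", "stars": 5},
]
```

With these weights, `search(REVIEWS, "Pixel 10")` puts review 1 ahead of review 2 even though review 2 mentions the phone more often, and `min_stars=5` narrows the results to review 2 only.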

Even being able to sort results of searching blog posts or other dated content by date is powerful - Google can only guess at the actual dates of those posts. You can search with required tags, or weight tags more heavily in result scoring. You can put your finger on the scale and say, effectively, post A should always rank more highly than post B for term X.
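A rough sketch of those three knobs (exact dates, required tags, and a finger on the scale) in plain Python. Everything here is hypothetical: the `POSTS` data, the `PINNED` override table, and the `search` function are illustrations, not a Lucene API.

```python
from datetime import date

# Illustrative posts; the site owns the data, so dates and tags are exact.
POSTS = [
    {"id": "a", "text": "lucene scoring tips",
     "tags": {"search"}, "published": date(2024, 1, 5)},
    {"id": "b", "text": "lucene lucene lucene deep dive",
     "tags": {"search", "java"}, "published": date(2025, 3, 1)},
    {"id": "c", "text": "lucene in the kitchen",
     "tags": {"cooking"}, "published": date(2025, 6, 1)},
]

# Hypothetical editorial overrides: for a given term, these post ids
# always come first, in this order.
PINNED = {"lucene": ["a"]}


def search(posts, term, required_tags=frozenset(), sort_by_date=False):
    """Return matching post ids, newest first or by relevance with pins."""
    term = term.lower()
    hits = [
        p for p in posts
        if term in p["text"].lower().split()
        and set(required_tags) <= p["tags"]
    ]
    if sort_by_date:
        hits.sort(key=lambda p: p["published"], reverse=True)
    else:
        # Relevance here is just the term count in the text.
        hits.sort(key=lambda p: p["text"].lower().split().count(term),
                  reverse=True)
        # Stable sort: pinned ids move to the front, the rest keep
        # their relevance order.
        pinned = PINNED.get(term, [])
        hits.sort(key=lambda p: pinned.index(p["id"])
                  if p["id"] in pinned else len(pinned))
    return [p["id"] for p in hits]
```

Without the pin, post b would win on raw term count; with it, post a comes first for "lucene". Passing `required_tags={"search"}` drops the cooking post entirely.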

Also, site operators know their own traffic/popularity, which internet search engines can only guess at, and can use this to score/sort. Amazon clearly does this.
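One simple way to fold first-party popularity into ranking, sketched in plain Python. The formula, the `popularity_weight` default, and the function names are assumptions chosen for illustration; nothing here describes how Amazon or Lucene actually do it.

```python
import math


def blended_score(text_score, views, popularity_weight=0.5):
    """Scale the text relevance score by a dampened popularity factor.

    log1p keeps a page with huge traffic from swamping relevance, and a
    document with a text score of zero stays at zero however popular it is.
    """
    return text_score * (1.0 + popularity_weight * math.log1p(views))


def rank(hits):
    """hits: list of (doc_id, text_score, views) tuples, best first."""
    ordered = sorted(
        hits,
        key=lambda h: blended_score(h[1], h[2]),
        reverse=True,
    )
    return [doc_id for doc_id, _, _ in ordered]
```

The multiplicative form means popularity only reorders documents that already match the query; it never pulls in an irrelevant page just because it gets traffic.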

For some reason a lot of web devs seem to think search is this really hard problem. But once you learn the basics of how it works, and if you use a library like Lucene, it does not need to be hard at all. Mostly you just have to be strategic and consistent about where and when you index and deindex content; usually that's alongside your db persistence calls. Once it's running you optimize by sprinkling some minimum amount of magic on your scoring setup to make it worthwhile/differentiated from Google.
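The "index and deindex alongside your persistence calls" pattern can be sketched with a toy inverted index in plain Python. `InvertedIndex` and `PostStore` are hypothetical names, the dict stands in for a real database, and a real setup would call into Lucene (or similar) at the same points instead.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.doc_terms = {}  # doc id -> terms, so deindexing is cheap

    def index(self, doc_id, text):
        self.deindex(doc_id)  # re-indexing replaces any stale entry
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def deindex(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class PostStore:
    """Stand-in for the db layer: every write also updates the index."""

    def __init__(self, index):
        self.rows = {}  # pretend this is a database table
        self.index = index

    def save(self, post_id, text):
        self.rows[post_id] = text
        self.index.index(post_id, text)

    def delete(self, post_id):
        self.rows.pop(post_id, None)
        self.index.deindex(post_id)
```

For example, after `store.save(1, "Lucene basics")` and `store.save(2, "Lucene scoring")`, searching for "lucene" returns both ids; saving post 1 again with different text or deleting post 2 updates the search results in the same call, so the index never drifts from the stored rows.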

replies(1): >>45990004 #
1. aorloff ◴[] No.45990004[source]
Before Lucene, search was really hard