151 points modinfo | 8 comments
adeptima ◴[] No.43682013[source]
Meilisearch is great, used it for a quick demo

However, if you need full-text search similar to Apache Lucene, my go-to options are based on Tantivy

Tantivy https://github.com/quickwit-oss/tantivy

Asian language support, BM25 scoring, a natural query language, and JSON field indexing are all must-have features for me

Quickwit - https://github.com/quickwit-oss/quickwit - https://quickwit.io/docs/get-started/quickstart

ParadeDB - https://github.com/paradedb/paradedb

I'm still looking for a systematic approach to building hybrid search (full-text search combined with embedding vectors).

Any thoughts on up-to-date hybrid search experience are greatly appreciated

replies(6): >>43682354 #>>43682566 #>>43683120 #>>43683227 #>>43688339 #>>43704628 #
1. kk3 ◴[] No.43683227[source]
As far as combining full-text search with embedding vectors goes, Typesense has been building features around that - https://typesense.org/docs/28.0/api/vector-search.html

I haven't tried those features, but I did try Meilisearch a while back and found that Typesense indexed much faster (indexing was a bottleneck for my particular use case) and had many more features to control search and ranking. To be fair, my use case was not typical for search, and I'm sure Meilisearch has come a long way since then. This is not to speak poorly of Meilisearch, just to say that Typesense is another great option.

replies(3): >>43684199 #>>43684928 #>>43695500 #
2. Kerollmops ◴[] No.43684199[source]
Meilisearch recently improved indexing speed and simplified the upgrade path. We released v1.12, which greatly improved indexing speed [1], and we simplified upgrades with the dumpless upgrade feature [2].

The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach as the user doesn't fear reindexing when restarting the program.

That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and only slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid those costs and skip the re-indexing time.

[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1... [2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...

replies(1): >>43688924 #
3. irevoire ◴[] No.43684928[source]
I hate the way Typesense does its « hybrid search ». It’s called fusion search, and the idea is that you have no idea how well the semantic and full-text searches are doing, so you randomly mix them together without looking at all at the results either search returns.

I tried to explain to them in an issue that, in this state, it was pretty much useless because one or the other search strategy would always give you awful results, but they basically said « some other engines are doing that as well so we won’t try to improve it » plus a ton of justification instead of just admitting that this strategy is bad.

replies(1): >>43685833 #
4. jabo ◴[] No.43685833[source]
We generally tend to engage in in-depth conversations with our users.

But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.

For what it’s worth, the approach used in Typesense is called Reciprocal Rank Fusion (RRF) and it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.
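For readers who haven't met RRF before, here is a minimal sketch of the textbook formula: each document's fused score is the sum of 1 / (k + rank) over every result list it appears in. This is an illustration under assumptions, not Typesense's actual implementation. The function name, the document IDs, and the use of k = 60 (the constant commonly cited from the original RRF paper) are all choices made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs using textbook RRF.

    `rankings` is a list of lists, each ordered best-first.
    `k` dampens the influence of top ranks; 60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Note that only rank positions enter the formula. The underlying relevance scores are never compared, which is the tradeoff being argued about in this thread.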

replies(1): >>43685966 #
5. irevoire ◴[] No.43685966{3}[source]
> But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.

Well, in this case I was just trying to be a normal user who wants the best relevancy possible and couldn’t find a solution. But the reason I couldn’t find one was not that you didn’t want to spend more time on my case; it was that Typesense provides no solution to this problem.

> it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.

Yeah, cool, or in other words « it’s bad, we know it and we can’t help you, but it’s the state of the art, you should educate yourself ». But guess what, Meilisearch may need some fine-tuning around your model etc., but in the end it gives you the tools to build a proper hybrid search that knows the quality of the results before mixing them.
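To make « knows the quality of the results before mixing them » concrete, here is a rough sketch of a score-aware merge. It is not Meilisearch's actual algorithm. It assumes both searches return (document ID, score) pairs whose scores are already normalized to a comparable 0..1 range, and the function name and sample data are made up for the example.

```python
def merge_by_score(keyword_hits, vector_hits, limit=10):
    """Merge two result lists by comparing their relevance scores.

    Each input is a list of (doc_id, score) pairs. Scores are ASSUMED to be
    normalized to [0, 1] and comparable across the two searches.
    """
    best = {}
    for doc_id, score in list(keyword_hits) + list(vector_hits):
        # Keep the best score seen for each document across both searches.
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    # Highest score first; ties are broken by document ID.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical hits: the keyword search is confident about doc_a,
# the vector search is confident about doc_b.
keyword_hits = [("doc_a", 0.95), ("doc_b", 0.40)]
vector_hits = [("doc_b", 0.80), ("doc_c", 0.10)]

merged = merge_by_score(keyword_hits, vector_hits)
print(merged)
```

Unlike a rank-only fusion, a weak list cannot push its top hit to the front here: doc_c stays last because its score is low, whatever its rank in the vector results.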

If other people want to see the original issue: https://github.com/typesense/typesense/issues/1964

replies(1): >>43686094 #
6. spiderfarmer ◴[] No.43686094{4}[source]
I think this is a good example of why people should disclose their background when commenting on competing products/projects. Even if the intentions were sound, which seems to be the case here, upfront disclosure would have given the conversation more weight and meaning.
7. kk3 ◴[] No.43688924[source]
Thank you for the response here. Not being able to upgrade the machine without completely re-indexing has actually become a huge issue for me. My use case is that I need to upgrade the machine to perform a big indexing operation that happens all at once and then after that reduce the machine resources. Typesense has future plans to persist the index to disk but it's not on the road map yet. And with the indexing improvements, Meilisearch may be a viable option for my use case now. I'll be checking this out!
8. jimmydoe ◴[] No.43695500[source]
+1, Typesense is really fast. The only drawback is that startup gets slow as the index grows. The good thing is that full-text search (excluding vector search) is a relatively stable feature set, so if your use case is just FTS, you won't need to restart very often for version upgrades.