
548 points | tifa2up | 1 comment
mediaman:
The point about synthetic query generation is good. We found users wrote very poor queries, so we initially had the LLM generate a synthetic query. But then we found the results could vary widely depending on the specific synthetic query it generated, so we had it create three variants (all in one LLM call, so you can prompt it to generate a wide variety instead of getting three very similar ones back), ran the searches in parallel, and used reciprocal rank fusion to combine the lists into a set of broadly strong performers. For the searches we use hybrid dense + sparse BM25 retrieval, since dense alone doesn't work well for technical terms.
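A minimal sketch of that fan-out-and-fuse step, assuming a hypothetical search_fn(query, top_k) that wraps the hybrid dense + BM25 backend and returns document IDs in rank order; k=60 is the constant from the original RRF paper:

    from collections import defaultdict
    from concurrent.futures import ThreadPoolExecutor

    def reciprocal_rank_fusion(ranked_lists, k=60):
        # Each doc scores sum(1 / (k + rank)) over the lists it
        # appears in; higher is better.
        scores = defaultdict(float)
        for ranking in ranked_lists:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    def fused_search(query_variants, search_fn, top_k=20):
        # Run each LLM-generated variant against the index in
        # parallel, then fuse the per-variant rankings into one list.
        with ThreadPoolExecutor() as pool:
            ranked_lists = list(pool.map(
                lambda q: search_fn(q, top_k=top_k), query_variants))
        return reciprocal_rank_fusion(ranked_lists)[:top_k]

RRF only looks at ranks, not raw scores, which is why it works well here: the dense and BM25 scorers (and the different query variants) produce scores on incompatible scales.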

This, combined with a subsequent reranker, basically eliminated our issues with search.
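For the reranking step, one common pattern is a cross-encoder that scores each (query, passage) pair. A sketch using sentence-transformers; the checkpoint name is just a public example, not necessarily what's used in this pipeline:

    from sentence_transformers import CrossEncoder

    # A publicly available cross-encoder checkpoint; an assumption,
    # not necessarily the reranker used in the pipeline above.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def rerank(query, passages, top_k=10):
        # Score every (query, passage) pair and keep the best top_k.
        scores = reranker.predict([(query, p) for p in passages])
        ranked = sorted(zip(passages, scores),
                        key=lambda x: x[1], reverse=True)
        return [p for p, _ in ranked[:top_k]]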

avereveard (reply):
A final tip: also surface the interpretation of the user's search back to the user, so they can check whether the LLM's understanding was correct.