
548 points tifa2up | 1 comment
n_u ◴[] No.45646587[source]
> Reranking: the highest value 5 lines of code you'll add. The chunk ranking shifted a lot. More than you'd expect. Reranking can many times make up for a bad setup if you pass in enough chunks. We found the ideal reranker set-up to be 50 chunk input -> 15 output.

What is re-ranking in the context of RAG? Why not just show the code if it’s only 5 lines?

replies(1): >>45646678 #
tifa2up ◴[] No.45646678[source]
OP. Reranking uses a specialized model that takes the user query and a list of candidate results, then reorders the results by how relevant each one is to the query.

Here's sample code: https://docs.cohere.com/reference/rerank
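
Roughly, it looks like this with Cohere's Python SDK (the model name, top_n, and placeholder chunks below are illustrative; the linked docs are the authoritative reference):

    import cohere

    co = cohere.ClientV2(api_key="YOUR_API_KEY")  # placeholder key

    # Candidate chunks from the first-stage retrieval step.
    chunks = ["First candidate chunk...", "Second candidate chunk..."]

    # Pass in the candidates (e.g. 50) and keep only the best (e.g. 15).
    response = co.rerank(
        model="rerank-v3.5",  # illustrative model name; check the docs
        query="What is reranking in RAG?",
        documents=chunks,
        top_n=15,
    )

    for result in response.results:
        print(result.index, result.relevance_score)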

replies(1): >>45647377 #
yahoozoo ◴[] No.45647377[source]
What is the difference between reranking versus generating text embeddings and comparing with cosine similarity?
replies(5): >>45647756 #>>45648506 #>>45649932 #>>45652751 #>>45655281 #
PunchTornado ◴[] No.45655281[source]
The reranker is a cross-encoder that sees the docs and the query at the same time. What you normally do is generate embeddings ahead of time, independent of the query, calculate cosine similarity between them and the query, select the top-k chunks that match best, and only then use a reranker to sort them.

Embeddings are a lossy compression, so if you feed the chunks and the query to the model at the same time, the results are better. But you can't do that for your whole DB; that's why you filter with cosine similarity first.
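
To make the two stages concrete, here's a rough sketch of that pipeline with sentence-transformers (the model names are illustrative; any bi-encoder/cross-encoder pair works the same way):

    from sentence_transformers import SentenceTransformer, CrossEncoder, util

    docs = [
        "Reranking reorders retrieved chunks by relevance to the query.",
        "Cosine similarity compares two independently computed embeddings.",
        "Paris is the capital of France.",
    ]
    query = "How does reranking work?"

    # Stage 1: bi-encoder. Doc embeddings are computed independently of the
    # query, so in practice they're precomputed for the whole DB.
    bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model
    doc_emb = bi_encoder.encode(docs, convert_to_tensor=True)
    query_emb = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, doc_emb, top_k=2)[0]  # cheap cosine filter

    # Stage 2: cross-encoder. Sees query and doc together, so it's more
    # accurate, but too slow to run over every document in the DB.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative
    pairs = [(query, docs[hit["corpus_id"]]) for hit in hits]
    scores = reranker.predict(pairs)

    for score, (_, doc) in sorted(zip(scores, pairs), key=lambda x: x[0], reverse=True):
        print(f"{score:.3f}  {doc}")

The cross-encoder only ever scores the handful of survivors from the cosine filter, which is the whole trick: cheap lossy search over everything, expensive exact scoring over a shortlist.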