
548 points | tifa2up | 1 comment
jascha_eng No.45645905
I have a RAG setup that doesn't work on documents but on other data points that we use for generation (the original data is call recordings, but it is heavily processed down to just a few text chunks). Instead of a reranker model, we do vector search and then simply ask GPT-5 in an extra call which of the results is the most relevant to the input question. Is there an advantage to actual reranker models over using a generic LLM?
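A toy sketch of the pipeline described above (vector search, then one extra LLM call to pick the most relevant hit). The embedding and the "LLM" are deliberately simple stand-ins so the flow is runnable end to end; in the real setup they would be an embedding model and a GPT-5 chat call, and `VOCAB`, `llm_pick`, etc. are hypothetical names, not anything from the comment:

```python
import math

# Hypothetical fixed vocabulary for a toy bag-of-words "embedding".
VOCAB = ["refund", "billing", "annual", "onboarding",
         "dashboard", "customer", "invoice", "policy"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query: str, chunks: list[str], k: int) -> list[str]:
    # Step 1: keep only the top-k chunks by embedding similarity.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def llm_pick(query: str, candidates: list[str]) -> str:
    # Step 2: stand-in for the extra LLM call -- pick the candidate
    # sharing the most words with the question.
    qwords = set(query.lower().split())
    return max(candidates, key=lambda c: len(qwords & set(c.lower().split())))

chunks = [
    "customer asked about refund policy for annual plans",
    "agent explained onboarding steps for the new dashboard",
    "caller reported a billing error on the annual invoice",
]
query = "customer question about refund on annual plan"
top = vector_search(query, chunks, k=2)   # retrieval cutoff happens here
best = llm_pick(query, top)               # "LLM" picks from survivors only
```

Note that anything `vector_search` cuts off never reaches the second step, which is exactly the failure mode the reply below this comment is concerned with.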
replies(2): >>45645956 #>>45649058 #
1. alansaber No.45649058
I think you should do both in parallel, rather than sequentially. The main reason is that vector scoring could cut off something that an LLM would score as relevant.
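The parallel variant suggested here can be sketched as: score every candidate with both the vector metric and the LLM, take the top-k from each, and union the lists, so a chunk with a low vector score can still survive if the LLM rates it highly. The scores are made up and `merge_topk` is a hypothetical helper, not anything named in the thread:

```python
def merge_topk(vector_scores: dict[str, float],
               llm_scores: dict[str, float], k: int = 2) -> list[str]:
    # Rank candidates independently under each scorer.
    by_vec = sorted(vector_scores, key=vector_scores.get, reverse=True)[:k]
    by_llm = sorted(llm_scores, key=llm_scores.get, reverse=True)[:k]
    # Order-preserving union: keep anything either scorer ranked highly.
    return list(dict.fromkeys(by_vec + by_llm))

# Toy scores: the LLM rates chunk_c relevant even though its vector score
# would have cut it off in a sequential pipeline.
vector_scores = {"chunk_a": 0.91, "chunk_b": 0.40, "chunk_c": 0.12}
llm_scores    = {"chunk_a": 0.85, "chunk_b": 0.20, "chunk_c": 0.95}
kept = merge_topk(vector_scores, llm_scores, k=2)
```

Here `chunk_c` makes it into `kept` only because the two scorers run in parallel; a sequential vector-then-LLM pipeline with k=2 would have dropped it before the LLM ever saw it.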