
548 points tifa2up | 4 comments
1. manishsharan ◴[] No.45645772[source]
Thanks for sharing. TIL about rerankers.

Chunking strategy is a big issue. I got acceptable results by shoving large texts into Gemini Flash and having it summarize and extract chunks, instead of using any of the text splitters I tried. I follow the method Anthropic published, https://www.anthropic.com/engineering/contextual-retrieval, i.e. include the full-document summary along with each chunk when embedding.
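
Roughly, the pattern looks like this in Clojure (a sketch, not my production code: clj-http + cheshire, with the model name, prompt wording, and `embed` helper all placeholders):

    ;; Sketch of contextual retrieval: generate per-chunk context with a
    ;; cheap model, prepend it, then embed. Model/prompt are placeholders.
    (require '[clj-http.client :as http]
             '[cheshire.core :as json])

    (defn contextualize-chunk
      "Ask a cheap model to situate `chunk` within the full `doc`;
       the returned context gets prepended before embedding."
      [doc chunk]
      (-> (http/post "https://api.anthropic.com/v1/messages"
                     {:headers {"x-api-key" (System/getenv "ANTHROPIC_API_KEY")
                                "anthropic-version" "2023-06-01"}
                      :content-type :json
                      :as :json
                      :body (json/generate-string
                             {:model "claude-3-5-haiku-latest"
                              :max_tokens 200
                              :messages
                              [{:role "user"
                                :content (str "<document>\n" doc "\n</document>\n\n"
                                              "Here is a chunk from the document:\n" chunk
                                              "\n\nWrite a short context that situates this "
                                              "chunk within the overall document.")}]})}))
          (get-in [:body :content 0 :text])))

    (defn embed-chunks
      "Embed each chunk with its generated context prepended.
       `embed` is any fn of string -> float vector."
      [doc chunks embed]
      (for [c chunks]
        {:chunk c
         :embedding (embed (str (contextualize-chunk doc c) "\n\n" c))}))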

I also created a tool that lets the LLM run vector searches on its own.
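
The tool is just a schema the model can call plus a local handler, something like this (hypothetical sketch; `vector-search` stands in for whatever fn queries your embedding store, and the schema follows Anthropic's tool-use format):

    ;; Expose vector search as a tool the model can invoke.
    (require '[cheshire.core :as json])

    (def search-tool
      {:name "vector_search"
       :description "Search the document index for passages relevant to a query."
       :input_schema {:type "object"
                      :properties {:query {:type "string"}
                                   :top_k {:type "integer"}}
                      :required ["query"]}})

    (defn handle-tool-call
      "Run the requested search locally and return a tool_result block
       that goes back to the model on the next turn. Assumes the
       tool_use block was JSON-parsed with keyword keys."
      [tool-use vector-search]
      {:type "tool_result"
       :tool_use_id (:id tool-use)
       :content (json/generate-string
                 (vector-search (get-in tool-use [:input :query])
                                (get-in tool-use [:input :top_k] 5)))})

You pass `:tools [search-tool]` in the /v1/messages request and loop while the model keeps stopping with tool_use blocks.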

I do not use LangChain or Python; I use Clojure plus the LLMs' REST APIs.

replies(2): >>45645995 #>>45692691 #
2. esafak ◴[] No.45645995[source]
Have you measured your latency, and how sensitive are you to it?
replies(1): >>45646290 #
3. manishsharan ◴[] No.45646290[source]
>> Have you measured your latency, and how sensitive are you to it?

Not sensitive to latency at all. My users would rather wait for a well-researched answer than get a fast but poor one.

Also, I use batch-mode APIs for chunking; it is so much cheaper.
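
I use Gemini's batch endpoint, but for illustration here is the same idea against Anthropic's Message Batches API (sketch with the same requires as above; model and prompts are placeholders). Batch pricing runs roughly half the interactive rate on the big providers:

    ;; Queue all the chunking/summary prompts in one call, then poll the
    ;; batch for results later instead of paying interactive prices.
    (require '[clj-http.client :as http]
             '[cheshire.core :as json])

    (defn submit-chunking-batch [chunk-prompts]
      (http/post "https://api.anthropic.com/v1/messages/batches"
                 {:headers {"x-api-key" (System/getenv "ANTHROPIC_API_KEY")
                            "anthropic-version" "2023-06-01"}
                  :content-type :json
                  :as :json
                  :body (json/generate-string
                         {:requests
                          (map-indexed
                           (fn [i prompt]
                             {:custom_id (str "chunk-" i)
                              :params {:model "claude-3-5-haiku-latest"
                                       :max_tokens 500
                                       :messages [{:role "user" :content prompt}]}})
                           chunk-prompts)})}))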

4. crassT ◴[] No.45692691[source]
I made a startup, https://tokencrush.ai/, to do just this.

I've struggled to find a target market though. Would you mind sharing what your use case is? It would really help give me some direction.