
230 points | taikon | 1 comment
rastierastie ◴[] No.42546980[source]
What do other HNers make of this? Would you use it? I'm responsible for a legaltech startup.
1. ankit219 ◴[] No.42548418[source]
If you have to deal with domain-specific data, this would not work as well. It will get you an incremental improvement (from what I can see, it's just creating explicit relationships at index time instead of letting the model infer them at runtime before generating an output; effective, but incremental, and dependent on the type of data), though not enough to justify redoing your pipeline. You are likely better off keeping your current approach and developing robust evals.
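To make the distinction concrete, here is a minimal, hypothetical sketch of the index-time idea: relationships between chunks are extracted and stored when the corpus is indexed, so retrieval can follow them explicitly instead of hoping the model connects the chunks at runtime. The documents, entity list, and naive string matching are all toy stand-ins for real entity/relation extraction.

```python
# Toy sketch: explicit cross-chunk relationships built at index time.
# Entities and chunks are hypothetical; matching is naive by design.
from collections import defaultdict

docs = {
    "c1": "Clause 4 references Schedule B for payment terms.",
    "c2": "Schedule B lists payment terms: net 30 days.",
    "c3": "Clause 9 covers termination and notice periods.",
}

# Index time: record which chunks mention each named artifact
# (a stand-in for a real entity/relation extractor).
entities = ["Clause 4", "Clause 9", "Schedule B", "payment terms"]
index = defaultdict(set)
for cid, text in docs.items():
    for ent in entities:
        if ent.lower() in text.lower():
            index[ent].add(cid)

# Explicit relationship: chunks sharing an entity are linked.
links = defaultdict(set)
for ent, cids in index.items():
    for a in cids:
        links[a] |= cids - {a}

def retrieve(query):
    """Return chunks mentioning a query entity, expanded via links."""
    hits = set()
    for ent in entities:
        if ent.lower() in query.lower():
            hits |= index[ent]
    expanded = set(hits)
    for cid in hits:
        expanded |= links[cid]
    return sorted(expanded)

print(retrieve("What are the payment terms in Clause 4?"))
```

Here the link between c1 and c2 (both mention "Schedule B") is baked in at index time, so the retriever pulls both even when a query only matches one of them. That's the incremental shift: better recall across related chunks, but still the same underlying model and data.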

If you want a transformational shift in accuracy and reasoning, the answer is different. RAG accuracy often suffers because the text is out of distribution and in-context learning (ICL) does not work well on it. You can get away with this if all your data exists in the public domain in some form (i.e., the LLM was trained on it); otherwise you keep seeing gaps with no way to bridge them. I published a paper on this and on how to solve it efficiently, if you're interested. Here is a simplified blog post on the same: https://medium.com/@ankit_94177/expanding-knowledge-in-large...
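The "gaps with no way to bridge them" claim is the kind of thing robust evals should surface. Below is a hypothetical, stdlib-only sketch of such an eval: recall@1 measured separately on queries phrased in the corpus vocabulary versus queries using jargon the retriever never saw. Bag-of-words cosine stands in for a real embedding model; the corpus and query sets are made up.

```python
# Toy eval: compare recall@1 on in-distribution vs out-of-distribution
# query phrasings. Bag-of-words cosine is a stand-in for embeddings.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

corpus = {
    "d1": "the indemnity clause limits liability to direct damages",
    "d2": "net 30 payment terms apply to all invoices",
}

def recall_at_1(queries):
    """Fraction of queries whose gold chunk is the top-1 retrieval."""
    hits = 0
    for query, gold in queries:
        q = embed(query)
        best = max(corpus, key=lambda d: cosine(q, embed(corpus[d])))
        hits += best == gold
    return hits / len(queries)

# Same intent, different vocabulary: the OOD phrasing shares no
# tokens with the gold chunk, so lexical retrieval silently fails.
in_dist = [("payment terms for invoices", "d2")]
ood = [("when are remittances due", "d2")]

print(recall_at_1(in_dist), recall_at_1(ood))  # 1.0 0.0
```

An eval split like this won't fix the out-of-distribution problem, but it turns "accuracy feels off on our domain" into a number you can track, and tells you whether the fix needs to happen at the model level rather than in the retrieval pipeline.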

Edit: Please reach out here or by email if you would like further details. I may have skipped too much in the comment above.