I wonder how this compares to KBLaM [1], which also has a preprocessing step that turns a large amount of reference material into something the LLM can attend to directly. One obvious difference is the modified attention mechanism they call "rectangular attention" (rough sketch of my understanding below). The paper has been posted on HN a few times, but it hasn't generated any discussion yet.
[1]: Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs | https://www.microsoft.com/en-us/research/blog/introducing-kb...
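For concreteness, here is how I read the "rectangular" part from the blog post: the knowledge base is preprocessed offline into key/value vectors ("knowledge tokens"), and at inference time only the prompt tokens act as queries, attending over the knowledge tokens plus the prompt tokens. The score matrix is therefore N x (M + N) rather than square, so cost grows linearly with the knowledge base size M. This is just a toy sketch of my understanding, not their actual code; the function name and shapes are made up for illustration.

    import torch
    import torch.nn.functional as F

    def rectangular_attention(prompt_q, prompt_k, prompt_v, kb_k, kb_v):
        # prompt_q/k/v: (N, d) -- queries/keys/values for the N prompt tokens
        # kb_k, kb_v:   (M, d) -- precomputed "knowledge token" key/value pairs
        N, d = prompt_q.shape
        M = kb_k.shape[0]

        keys = torch.cat([kb_k, prompt_k], dim=0)    # (M + N, d)
        values = torch.cat([kb_v, prompt_v], dim=0)  # (M + N, d)

        # Rectangular score matrix: N queries vs (M + N) keys.
        scores = prompt_q @ keys.T / d ** 0.5        # (N, M + N)

        # Causal mask applies only to the prompt-token block; every prompt
        # token may look at every knowledge token.
        causal = torch.tril(torch.ones(N, N)).bool()
        mask = torch.cat([torch.ones(N, M).bool(), causal], dim=1)
        scores = scores.masked_fill(~mask, float("-inf"))

        return F.softmax(scores, dim=-1) @ values    # (N, d)

Since the knowledge tokens never act as queries and don't attend to each other, adding more of them only adds columns to the score matrix, which is presumably where the claimed linear scaling in KB size comes from.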