
147 points | fzliu
gdiamos:
Their idea is that the capacity of even 4096-dimensional vectors limits retrieval performance.

Sparse models like BM25 have huge dimensionality and thus don't suffer from this limit, but they don't capture semantics and can't follow instructions.

It seems like the holy grail is a sparse semantic model. I wonder how SPLADE would do?
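
For what it's worth, SPLADE produces exactly that kind of vector: it max-pools log-saturated MLM logits over the sequence, so the embedding lives in vocabulary space (~30k dimensions) but has only a few hundred nonzero entries. A rough sketch of inference, assuming the naver/splade-cocondenser-ensembledistil checkpoint on Hugging Face:

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    name = "naver/splade-cocondenser-ensembledistil"  # assumed checkpoint
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name).eval()

    def splade_embed(text: str) -> torch.Tensor:
        batch = tok(text, return_tensors="pt")
        with torch.no_grad():
            logits = model(**batch).logits          # (1, seq_len, vocab_size)
        # SPLADE activation: log-saturate, mask padding, max-pool over tokens.
        w = torch.log1p(torch.relu(logits))
        w = w * batch["attention_mask"].unsqueeze(-1)
        return w.max(dim=1).values.squeeze(0)       # (vocab_size,), mostly zeros

    vec = splade_embed("sparse semantic retrieval")
    print(vec.shape, (vec > 0).sum().item())        # ~30k dims, few hundred nonzero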

CuriouslyC:
We already have "sparse" embeddings of a sort. Google's Matryoshka embedding scheme can scale the same embedding from ~150 dimensions to >3k, with nested layers of representational meaning. Imagine decomposing an embedding along principal components, then streaming the components in order of their eigenvalues; that's roughly the idea.
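
A toy numpy sketch of that analogy (synthetic low-rank data standing in for real embeddings; this is not Google's actual training recipe): rotate into the principal-component basis, then read off progressively longer prefixes ordered by eigenvalue.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic embeddings with low-rank structure, as a stand-in corpus.
    X = rng.normal(size=(10_000, 64)) @ rng.normal(size=(64, 768))
    X += 0.01 * rng.normal(size=X.shape)

    # PCA rotation: eigendecomposition of the covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    Z = X @ eigvecs[:, order]                  # same vectors, rotated basis

    total = Z.var(axis=0).sum()
    for k in (32, 64, 256, 768):               # "stream" prefixes of components
        kept = Z[:, :k].var(axis=0).sum() / total
        print(f"first {k:>3} components retain {kept:.1%} of variance")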
3abiton:
Doesn't PCA compress the embeddings in this case, i.e., reduce the accuracy? It's similar to quantization.
CuriouslyC:
Component analysis doesn't fundamentally reduce information; it just rotates the data into a more informative basis. People usually drop the components with the smallest eigenvalues to do dimensionality reduction, but you don't have to do that.
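
A minimal numpy check of that claim, with synthetic low-rank data standing in for real embeddings: the full rotation is orthogonal and exactly invertible, and information only goes away at the truncation step.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1_000, 16)) @ rng.normal(size=(16, 128))
    X += 0.01 * rng.normal(size=X.shape)       # mostly low-rank data

    eigvals, R = np.linalg.eigh(np.cov(X, rowvar=False))  # ascending eigenvalues
    Z = X @ R                                  # rotate into the eigenbasis
    print(np.allclose(X, Z @ R.T))             # True: the rotation is lossless

    Z_trunc = Z.copy()
    Z_trunc[:, :112] = 0                       # drop 112 smallest-eigenvalue dims
    err = np.linalg.norm(X - Z_trunc @ R.T) / np.linalg.norm(X)
    print(f"relative error after truncation: {err:.4f}")  # the lossy step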