←back to thread

147 points fzliu | 1 comments | | HN request time: 0.21s | source
Show context
gdiamos ◴[] No.45069705[source]
Their idea is that capacity of even 4096-wide vectors limits their performance.

Sparse models like BM25 have a huge dimension and thus don’t suffer from this limit, but they don’t capture semantics and can’t follow instructions.

It seems like the holy grail is a sparse semantic model. I wonder how splade would do?

replies(3): >>45070552 #>>45070624 #>>45088848 #
CuriouslyC ◴[] No.45070552[source]
We already have "sparse" embeddings. Google's Matryoshka embedding schema can scale embeddings from ~150 dimensions to >3k, and it's the same embedding with layers of representational meaning. Imagine decomposing an embedding along principle components, then streaming the embedding vectors in order of their eigenvalue, kind of the idea.
replies(3): >>45070777 #>>45071166 #>>45075319 #
jxmorris12 ◴[] No.45070777[source]
Matryoshka embeddings are not sparse. And SPLADE can scale to tens or hundreds of thousands of dimensions.
replies(2): >>45070869 #>>45088885 #
CuriouslyC ◴[] No.45070869[source]
If you consider the actual latent space the full higher dimensional representation, and you take the first principle component, the other vectors are zero. Pretty sparse. No it's not a linked list sparse matrix. Don't be a pedant.
replies(2): >>45072427 #>>45072485 #
1. yorwba ◴[] No.45072485[source]
When you truncate Matryoshka embeddings, you get the storage benefits of low-dimensional vectors with the limited expressiveness of low-dimensional vectors. Usually, what people look for in sparse vectors is to combine the storage benefits of low-dimensional vectors with the expressiveness of high-dimensional vectors. For that, you need the non-zero dimensions to be different for different vectors.