
113 points | alexmolas | 1 comment
numlocked (No.45141473):
I don’t quite understand. The article says things like:

“With the constant upward pressure on embedding sizes not limited by having to train models in-house, it’s not clear where we’ll slow down: Qwen-3, along with many others is already at 4096”

But aren’t embedding models separate from the LLMs? The size of an LLM's attention heads and so on isn’t inherently connected to how a lab might train and release an embedding model. I don’t really understand why growth in LLM size fundamentally puts upward pressure on embedding size when the two aren't intrinsically connected.

replies(3): >>45141991 >>45142050 >>45142874
1. svachalek (No.45141991):
All LLMs use embeddings; it's just that embedding models stop there, while for a full chat/completion model that's only the first step of the process. Embeddings are coordinates in the latent space of the transformer.
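
A rough sketch of that distinction in Python, assuming the Hugging Face transformers API (the checkpoint name is a placeholder and mean pooling is just one common pooling choice; real embedding models differ in the details):

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Placeholder checkpoint; any transformer encoder works for this sketch.
    model_name = "Qwen/Qwen3-Embedding-0.6B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    text = "Embeddings are coordinates in the latent space of the transformer."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)

    # An embedding model stops here: pool the per-token states into one vector.
    embedding = hidden.mean(dim=1).squeeze(0)       # (hidden_dim,)
    print(embedding.shape)

    # A chat/completion model keeps going instead: it projects these same
    # hidden states through an LM head to get next-token logits and generates.

The hidden_dim that comes back is the embedding size being discussed; it's a property of the transformer itself, which is why it tends to track the architecture of the underlying model rather than being chosen independently.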