I don’t quite understand. The article says things like:
“With the constant upward pressure on embedding sizes not limited by having to train models in-house, it’s not clear where we’ll slow down: Qwen-3, along with many others is already at 4096”
But aren’t embedding models separate from the LLMs? The hidden size, attention head count, etc. of an LLM aren’t inherently tied to how a lab might train and release an embedding model. I don’t really understand why growth in LLM size fundamentally puts upward pressure on embedding size, since the two dimensions aren’t intrinsically connected.
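To make my confusion concrete, here’s a rough sketch (assuming the Hugging Face transformers and sentence-transformers libraries, with these particular model names picked purely as examples): the LLM’s hidden size and an embedding model’s output dimension come from two independent configs, so one doesn’t obviously constrain the other.

```python
from transformers import AutoConfig
from sentence_transformers import SentenceTransformer

# The LLM's hidden size is read from the LLM's own config...
llm_cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B")  # example LLM, chosen arbitrarily
print("LLM hidden size:", llm_cfg.hidden_size)

# ...while the embedding dimension is whatever the embedding model was trained with.
emb = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # example embedding model
print("Embedding dim:", emb.get_sentence_embedding_dimension())  # 384 for this model
```

Nothing in the first number forces the second one upward, as far as I can tell, which is why the article’s framing confuses me.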
replies(3):