
113 points by alexmolas | 1 comment
minimaxir:
It's the same Jevons paradox reasoning as why LLMs are so big despite massive diminishing returns: if we can output 4096 dimensions, why not use all of them?

As with LLMs, the bottleneck is still training data and the training regimen, but there's still demand for smaller embedding models due to both storage and compute concerns. EmbeddingGemma (https://huggingface.co/google/embeddinggemma-300m), released just yesterday, beats the 4096D Qwen-3 benchmarks at 768D, and its 128D MRL truncation beats many 768D embedding models.
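
(A minimal sketch of the MRL trick being described, assuming EmbeddingGemma loads through the sentence-transformers library and returns 768D vectors; the model behavior and API usage here are assumptions, not a verified recipe. The idea is just to keep the leading 128 dimensions and re-normalize.)

    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Assumed setup: EmbeddingGemma served via sentence-transformers, 768D output.
    model = SentenceTransformer("google/embeddinggemma-300m")

    docs = ["why are embeddings so large?", "smaller vectors are cheaper to store"]
    full = model.encode(docs)                    # assumed shape: (2, 768)

    # Matryoshka-style truncation: keep the first 128 dims, then re-normalize
    # so cosine similarity still behaves sensibly on the shortened vectors.
    small = full[:, :128]
    small = small / np.linalg.norm(small, axis=1, keepdims=True)

    print(full.shape, small.shape)               # (2, 768) (2, 128)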

fredophile:
I'm not an expert on LLMs, but my guess would be that this is a result of the curse of dimensionality. As a general rule, more dimensions != better.
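
A toy illustration of that point (random vectors, not tied to any particular embedding model): as dimensionality grows, pairwise distances concentrate, so the contrast between the nearest and farthest point from a query shrinks.

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 16, 128, 1024, 4096):
        points = rng.normal(size=(1000, d))            # 1000 random points in R^d
        query = rng.normal(size=d)                     # one random query point
        dists = np.linalg.norm(points - query, axis=1)
        contrast = (dists.max() - dists.min()) / dists.min()
        print(f"d={d:5d}  relative distance contrast: {contrast:.3f}")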