←back to thread

77 points kaycebasques | 2 comments | | HN request time: 0.001s | source
Show context
jdthedisciple ◴[] No.45785827[source]
Intriguing! This inspired me to run the example "calculation" ("king" - "man" + "woman") against several well-known embedding models and order them by L2 distance between the actual output and the embedding for "queen". Result:

    voyage-3-large:             0.54
    voyage-code-3:              0.62
    qwen3-embedding:4b:         0.71
    embeddinggemma:             0.84
    voyage-3.5-lite:            0.94
    text-embedding-3-small:     0.97
    voyage-3.5:                 1.01
    text-embedding-3-large:     1.13
Shocked by the apparently bad performance of OpenAI's SOTA model. Also always had a gut feeling that `voyage-3-large` secretly may be the best embedding model out there. Have I been vindicated? Make of it what you will ...

Also `qwen3-embedding:4b` is my current favorite for local RAG for good reason...

replies(2): >>45786704 #>>45786951 #
1. smallerize ◴[] No.45786704[source]
That never exactly worked for word2vec either. https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...
replies(1): >>45786774 #
2. kaycebasques ◴[] No.45786774[source]
From the linked article:

> The widely known example only works because the implementation of the algorithm will exclude the original vector from the possible results!

I saw this issue in the "same topic, different domain" experiment when using EmbeddingGemma with the default task types. But when using custom task types, the vector arithmetic worked as expected. I didn't have to remove the original vector from the results or control for that in any way. So while the criticism is valid for word2vec I'm skeptical that modern embedding models still have this issue.

Very curious to learn whether modern models are still better at some analogies (e.g. male/female) and worse at others, though. Is there any more recent research on that topic? The linked article is from 2019.