←back to thread

176 points nxa | 1 comments | | HN request time: 0.213s | source

I've been playing with embeddings and wanted to try out what results the embedding layer will produce based on just word-by-word input and addition / subtraction, beyond what many videos / papers mention (like the obvious king-man+woman=queen). So I built something that doesn't just give the first answer, but ranks the matches based on distance / cosine symmetry. I polished it a bit so that others can try it out, too.

For now, I only have nouns (and some proper nouns) in the dataset, and pick the most common interpretation among the homographs. Also, it's case sensitive.

Show context
montebicyclelo ◴[] No.43989512[source]
> king-man+woman=queen

Is the famous example everyone uses when talking about word vectors, but is it actually just very cherry picked?

I.e. are there a great number of other "meaningful" examples like this, or actually the majority of the time you end up with some kind of vaguely tangentially related word when adding and subtracting word vectors.

(Which seems to be what this tool is helping to illustrate, having briefly played with it, and looked at the other comments here.)

(Btw, not saying wordvecs / embeddings aren't extremely useful, just talking about this simplistic arithmetic)

replies(7): >>43989576 #>>43989687 #>>43989933 #>>43989963 #>>43990416 #>>43990646 #>>43995277 #
jbjbjbjb ◴[] No.43990416[source]
Well when it works out it is quite satisfying

India - Asia + Europe = Italy

Japan - Asia + Europe = Netherlands

China - Asia + Europe = Soviet-Union

Russia - Asia + Europe = European Russia

calculation + machine = computer

replies(2): >>43991280 #>>43993483 #
kgeist ◴[] No.43993483[source]
Interesting:

  Russia - Europe = Putin
  Ukraine + Putin = Russia
  Putin - Stalin = Bush
  Stalin - purge = Lenin
That means Bush = Ukraine+Putin-Europe-Lenin-purge.

However, the site gives Bush -4%, second best option (best is -2%, "fleet ballistic missile submarine", not sure what negative numbers mean).

replies(1): >>43997888 #
1. nxa ◴[] No.43997888[source]
My interpretation of negative numbers is that no "synonym" was found (no vector pointing in the same direction), and that the closest expression on record is something with an opposite meaning (pointing in reverse direction), so I'd say that's an antonym.