    176 points by nxa | 19 comments

    I've been playing with embeddings and wanted to see what results the embedding layer produces from just word-by-word input and addition / subtraction, beyond what many videos / papers mention (like the obvious king-man+woman=queen). So I built something that doesn't just give the first answer, but ranks the matches based on distance / cosine similarity (roughly the idea in the sketch below). I polished it a bit so that others can try it out, too.

    For now, I only have nouns (and some proper nouns) in the dataset, and pick the most common interpretation among the homographs. Also, it's case sensitive.
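
    Under the hood it boils down to something like this sketch (plain numpy; the vector dict and names are made up for illustration, not the site's actual code):

      # Sketch: apply +/- arithmetic on word vectors, then rank the vocabulary
      # by cosine similarity. "vectors" is a hypothetical {word: np.ndarray} dict.
      import numpy as np

      def rank(terms, vectors, top_n=10):
          # terms: list of (sign, word), e.g. [(+1, "king"), (-1, "man"), (+1, "woman")]
          query = sum(sign * vectors[word] for sign, word in terms)
          query = query / np.linalg.norm(query)
          inputs = {word for _, word in terms}
          scores = {}
          for word, vec in vectors.items():
              if word in inputs:        # don't just hand the inputs back
                  continue
              scores[word] = float(query @ (vec / np.linalg.norm(vec)))
          return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]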

    1. montebicyclelo ◴[] No.43989512[source]
    > king-man+woman=queen

    Is the famous example everyone uses when talking about word vectors, but is it actually just very cherry picked?

    I.e. are there a great number of other "meaningful" examples like this, or do you actually end up, the majority of the time, with some kind of vaguely / tangentially related word when adding and subtracting word vectors?

    (Which seems to be what this tool is helping to illustrate, having briefly played with it, and looked at the other comments here.)

    (Btw, not saying wordvecs / embeddings aren't extremely useful, just talking about this simplistic arithmetic)

    replies(7): >>43989576 #>>43989687 #>>43989933 #>>43989963 #>>43990416 #>>43990646 #>>43995277 #
    2. raddan ◴[] No.43989576[source]
    > is it actually just very cherry picked?

    100%

    3. gregschlom ◴[] No.43989687[source]
    Also, as I just learned the other day, the result was never equal, just close to "queen" in the vector space.
    replies(2): >>43990990 #>>43992066 #
    4. Retr0id ◴[] No.43989933[source]
    I think it's slightly uncommon for the vectors to "line up" just right, but here are a few I tried:

    actor - man + woman = actress

    garden + person = gardener

    rat - sewer + tree = squirrel

    toe - leg + arm = digit

    5. groby_b ◴[] No.43989963[source]
    I think it's worth keeping in mind that word2vec was specifically trained on semantic similarity. Most embedding APIs don't really give a lick about the semantic space.

    And, worse, most latent spaces are decidedly non-linear. And so arithmetic loses a lot of its meaning. (IIRC word2vec mostly avoided nonlinearity except for the loss function). Yes, the distance metric sort-of survives, but addition/multiplication are meaningless.

    (This is also the reason choosing your embedding model is a hard-to-reverse technical decision - you can't just transform existing embeddings into a different latent space. A change means "reembed all")
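
    If you want to see how much of the arithmetic survives for a particular model, a quick check with gensim and the pretrained Google News word2vec vectors looks roughly like this (just a sketch; swap in any other KeyedVectors model to compare):

      # Sketch: test analogy arithmetic against the classic word2vec vectors.
      import gensim.downloader as api

      model = api.load("word2vec-google-news-300")  # pretrained KeyedVectors (~1.6 GB)

      # most_similar() normalizes, adds/subtracts, and ranks by cosine similarity,
      # excluding the input words from the results.
      print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=5))
      print(model.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=5))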

    6. jbjbjbjb ◴[] No.43990416[source]
    Well when it works out it is quite satisfying

    India - Asia + Europe = Italy

    Japan - Asia + Europe = Netherlands

    China - Asia + Europe = Soviet-Union

    Russia - Asia + Europe = European Russia

    calculation + machine = computer

    replies(2): >>43991280 #>>43993483 #
    7. bee_rider ◴[] No.43990646[source]
    Hmm, well I got

        cherry - picker = blackwood
    
    if that helps.
    8. charcircuit ◴[] No.43990990[source]
    And queen isn't even the closest.
    replies(1): >>43991033 #
    9. mcswell ◴[] No.43991033{3}[source]
    What is the closest?
    replies(2): >>43991436 #>>43991529 #
    10. trhway ◴[] No.43991280[source]
    democracy - vote = progressivism

    I'll have to meditate on that.

    replies(1): >>43992018 #
    11. ◴[] No.43991436{4}[source]
    12. charcircuit ◴[] No.43991529{4}[source]
    Usually king is.
    replies(2): >>43992891 #>>43992920 #
    13. blipvert ◴[] No.43992018{3}[source]
    person + man + woman + camera + television = user
    14. chis ◴[] No.43992066[source]
    I mean they are floating point vectors so
    15. KeplerBoy ◴[] No.43992891{5}[source]
    That would be hilariously disappointing.
    16. Narew ◴[] No.43992920{5}[source]
    Yes, and it only works because we prevent the output from being one of the inputs.
    17. kgeist ◴[] No.43993483[source]
    Interesting:

      Russia - Europe = Putin
      Ukraine + Putin = Russia
      Putin - Stalin = Bush
      Stalin - purge = Lenin
    
    That means Bush = Ukraine+Putin-Europe-Lenin-purge.

    However, the site gives Bush at -4%, the second-best option (the best, at -2%, is "fleet ballistic missile submarine"; not sure what the negative numbers mean).

    replies(1): >>43997888 #
    18. loganmhb ◴[] No.43995277[source]
    I once saw an explanation (which I can no longer find) that what's really happening here is partly that "man" and "woman" are very similar vectors which nearly cancel each other out, and that "king" is excluded from the result set to avoid returning identities, leaving "queen" as the closest remaining result. That's why you have to subtract and then add; just doing single operations doesn't work very well. There's some semantic information preserved that might nudge it in the right direction, but not as much as the naive algebra suggests, and you can't really add up a bunch of these high-dimensional vectors in a sensible way.

    E.g. in this calculator "man - king + princess = woman", which doesn't make much sense, and "airplane - engine", which has a potentially sensible answer of "glider", instead gives "= Czechoslovakia". Go figure.
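
    One way to see the exclusion effect directly, assuming the pretrained Google News vectors via gensim (a sketch, not whatever this site does internally):

      import gensim.downloader as api

      model = api.load("word2vec-google-news-300")

      # most_similar() drops the input words, so "queen" tends to come out on top...
      print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

      # ...but ranking the raw result vector against the full vocabulary excludes
      # nothing, and "king" itself typically ranks first.
      vec = model["king"] - model["man"] + model["woman"]
      print(model.similar_by_vector(vec, topn=3))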

    19. nxa ◴[] No.43997888{3}[source]
    My interpretation of negative numbers is that no "synonym" was found (no vector pointing in the same direction), and that the closest expression on record is something with an opposite meaning (pointing in reverse direction), so I'd say that's an antonym.
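
    In other words, the score is just cosine similarity, which goes negative when the vectors point in roughly opposite directions, e.g.:

      import numpy as np

      def cos(a, b):
          return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

      a = np.array([1.0, 2.0, 3.0])
      print(cos(a, a))    #  1.0 -> same direction ("synonym")
      print(cos(a, -a))   # -1.0 -> opposite direction ("antonym")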