
176 points by nxa | 1 comment

I've been playing with embeddings and wanted to see what results an embedding layer produces from simple word-by-word input plus addition/subtraction, beyond the obvious examples that many videos and papers mention (like king - man + woman = queen). So I built something that doesn't just give the first answer, but ranks the matches by distance / cosine similarity. I polished it a bit so that others can try it out, too.

For now, I only have nouns (and some proper nouns) in the dataset, and I pick the most common interpretation among homographs. Also, it's case-sensitive.
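For anyone curious what "arithmetic, then rank by cosine similarity" looks like in code, here's a minimal sketch. The vocabulary, dimensions, and random vectors are stand-ins for real pretrained embeddings (e.g. GloVe rows), not the actual dataset behind the demo:

    import numpy as np

    # Sketch of "add/subtract word vectors, then rank matches by cosine
    # similarity". Random vectors stand in for real pretrained embeddings.
    rng = np.random.default_rng(0)
    vocab = ["king", "queen", "man", "woman", "crown", "royalty", "emperor"]
    emb = {w: rng.standard_normal(50) for w in vocab}

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def rank(query_vec, exclude=()):
        """Rank every vocabulary word by cosine similarity to the query."""
        scores = [(w, cosine(query_vec, v))
                  for w, v in emb.items() if w not in exclude]
        return sorted(scores, key=lambda t: -t[1])

    # king - man + woman: raw vector arithmetic, then ranked matches
    query = emb["king"] - emb["man"] + emb["woman"]
    for word, score in rank(query, exclude=("king", "man", "woman"))[:3]:
        print(f"{word:10s} {score:+.3f}")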

godelski ◴[] No.43989245[source]

  data + plural = number
  data - plural = research
  king - crown = (didn't work... crown gets circled in red)
  king - princess = emperor
  king - queen = kingdom
  queen - king = worker
  king + queen = queen + king = kingdom
  boy + age = (didn't work... boy gets circled in red)
  man - age = woman
  woman - age = newswoman
  woman + age = adult female body (tied with man)
  girl + age = female child
  girl + old = female child
The other suggestions are pretty similar to the results I got in most cases. But I think this helps illustrate the curse of dimensionality (i.e. distances are ill-defined in high-dimensional spaces). This is still largely an unsolved problem, and a pretty critical one that doesn't get enough attention.
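For what it's worth, the distance-concentration effect is easy to see numerically. This is a small illustration with random points, not anything specific to the demo's embeddings:

    import numpy as np

    # As dimension grows, the nearest and farthest neighbours of a random
    # query become relatively indistinguishable -- one face of the curse
    # of dimensionality.
    rng = np.random.default_rng(0)
    for dim in (2, 10, 100, 1000):
        points = rng.standard_normal((10_000, dim))
        query = rng.standard_normal(dim)
        dists = np.linalg.norm(points - query, axis=1)
        contrast = (dists.max() - dists.min()) / dists.min()
        print(f"dim={dim:5d}  relative contrast (max-min)/min = {contrast:.3f}")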
replies(9): >>43989480 #>>43989843 #>>43989994 #>>43990000 #>>43990270 #>>43992122 #>>43994931 #>>43996398 #>>44000804 #
n2d4 ◴[] No.43989843[source]
For fun, I pasted these into ChatGPT o4-mini-high and asked it for an opinion:

   data + plural    = datasets
   data - plural    = datum
   king - crown     = ruler
   king - princess  = man
   king - queen     = prince
   queen - king     = woman
   king + queen     = royalty
   boy + age        = man
   man - age        = boy
   woman - age      = girl
   woman + age      = elderly woman
   girl + age       = woman
   girl + old       = grandmother

The results are surprisingly good; I don't think I could've done better as a human. But keep in mind that this doesn't do embedding math like OP's tool! It does show, though, how generic LLMs can solve some of these tasks better than traditional NLP.

The prompt I used:

> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:
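If you wanted to script the same experiment instead of using the ChatGPT UI, something like this would do it. The model name is a stand-in ("o4-mini-high" is a ChatGPT UI label, not necessarily an API model id), and the expressions fed in are just a sample:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # The same prompt as above, plus a few expressions to evaluate.
    PROMPT = (
        'Remember those "semantic calculators" with AI embeddings? '
        'Like "king - man + woman = queen"? Pretend you\'re a semantic '
        "calculator, and give me the results for the following:\n"
        "data + plural\nking - crown\nwoman + age"
    )

    resp = client.chat.completions.create(
        model="o4-mini",  # stand-in; API model ids may differ
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(resp.choices[0].message.content)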

replies(5): >>43989988 #>>43990061 #>>43990761 #>>43991235 #>>43998165 #
refulgentis ◴[] No.43990761[source]
...welcome to ChatGPT, everyone! If you've been asleep since...2022?

(some might say all an LLM does is embeddings :)