
176 points by nxa | 1 comment

I've been playing with embeddings and wanted to see what results an embedding layer produces from simple word-by-word input plus addition/subtraction, beyond the obvious examples that many videos and papers mention (like king - man + woman = queen). So I built something that doesn't just give the first answer, but ranks the matches by distance / cosine similarity. I polished it a bit so that others can try it out, too.

For now, I only have nouns (and some proper nouns) in the dataset, and I pick the most common interpretation among homographs. Also, it's case-sensitive.
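For anyone curious what "arithmetic, then rank by cosine similarity" looks like in code, here's a minimal sketch. The vocabulary, dimensions, and random vectors are stand-ins for real pretrained embeddings (e.g. GloVe rows), not the actual dataset behind the demo:

    import numpy as np

    # Sketch of "add/subtract word vectors, then rank matches by cosine
    # similarity". Random vectors stand in for real pretrained embeddings.
    rng = np.random.default_rng(0)
    vocab = ["king", "queen", "man", "woman", "crown", "royalty", "emperor"]
    emb = {w: rng.standard_normal(50) for w in vocab}

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def rank(query_vec, exclude=()):
        """Rank every vocabulary word by cosine similarity to the query."""
        scores = [(w, cosine(query_vec, v))
                  for w, v in emb.items() if w not in exclude]
        return sorted(scores, key=lambda t: -t[1])

    # king - man + woman: raw vector arithmetic, then ranked matches
    query = emb["king"] - emb["man"] + emb["woman"]
    for word, score in rank(query, exclude=("king", "man", "woman"))[:3]:
        print(f"{word:10s} {score:+.3f}")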

godelski ◴[] No.43989245[source]

  data + plural = number
  data - plural = research
  king - crown = (didn't work... crown gets circled in red)
  king - princess = emperor
  king - queen = kingdom
  queen - king = worker
  king + queen = queen + king = kingdom
  boy + age = (didn't work... boy gets circled in red)
  man - age = woman
  woman - age = newswoman
  woman + age = adult female body (tied with man)
  girl + age = female child
  girl + old = female child
The other suggestions are pretty similar to the results I got in most cases. But I think this helps illustrate the curse of dimensionality (i.e. distances are ill-defined in high-dimensional spaces). This is still largely an unsolved problem, and a pretty critical one that doesn't get enough attention.
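For what it's worth, the distance-concentration effect is easy to see numerically. This is a small illustration with random points, not anything specific to the demo's embeddings:

    import numpy as np

    # As dimension grows, the nearest and farthest neighbours of a random
    # query become relatively indistinguishable -- one face of the curse
    # of dimensionality.
    rng = np.random.default_rng(0)
    for dim in (2, 10, 100, 1000):
        points = rng.standard_normal((10_000, dim))
        query = rng.standard_normal(dim)
        dists = np.linalg.norm(points - query, axis=1)
        contrast = (dists.max() - dists.min()) / dists.min()
        print(f"dim={dim:5d}  relative contrast (max-min)/min = {contrast:.3f}")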
replies(9): >>43989480 #>>43989843 #>>43989994 #>>43990000 #>>43990270 #>>43992122 #>>43994931 #>>43996398 #>>44000804 #
n2d4 ◴[] No.43989843[source]
For fun, I pasted these into ChatGPT o4-mini-high and asked it for an opinion:

   data + plural    = datasets
   data - plural    = datum
   king - crown     = ruler
   king - princess  = man
   king - queen     = prince
   queen - king     = woman
   king + queen     = royalty
   boy + age        = man
   man - age        = boy
   woman - age      = girl
   woman + age      = elderly woman
   girl + age       = woman
   girl + old       = grandmother

The results are surprisingly good; I don't think I could've done better as a human. But keep in mind that this doesn't do embedding math like OP's tool! It does show, though, how generic LLMs can solve some of these tasks better than traditional NLP.

The prompt I used:

> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:
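If you wanted to script the same experiment instead of using the ChatGPT UI, something like this would do it. The model name is a stand-in ("o4-mini-high" is a ChatGPT UI label, not necessarily an API model id), and the expressions fed in are just a sample:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # The same prompt as above, plus a few expressions to evaluate.
    PROMPT = (
        'Remember those "semantic calculators" with AI embeddings? '
        'Like "king - man + woman = queen"? Pretend you\'re a semantic '
        "calculator, and give me the results for the following:\n"
        "data + plural\nking - crown\nwoman + age"
    )

    resp = client.chat.completions.create(
        model="o4-mini",  # stand-in; API model ids may differ
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(resp.choices[0].message.content)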

replies(5): >>43989988 #>>43990061 #>>43990761 #>>43991235 #>>43998165 #
refulgentis ◴[] No.43990761[source]
...welcome to ChatGPT, everyone! If you've been asleep since...2022?

(some might say all an LLM does is embeddings :)