
176 points by nxa | 2 comments

I've been playing with embeddings and wanted to see what results the embedding layer produces from plain word-by-word input plus addition / subtraction, beyond the examples most videos / papers mention (like the obvious king - man + woman = queen). So I built something that doesn't just return the top answer, but ranks the matches by distance / cosine similarity. I polished it a bit so that others can try it out, too.

For now the dataset only has nouns (and some proper nouns), and I pick the most common interpretation among homographs. Also, it's case-sensitive.
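
A minimal sketch of the ranking idea (not the actual implementation; it assumes you already have a dict mapping each noun to a numpy vector):

  import numpy as np

  def cosine(a, b):
      return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

  def rank_matches(embeddings, positive, negative=(), top_k=10):
      # Add the 'positive' vectors, subtract the 'negative' ones, then rank
      # every word in the vocabulary by cosine similarity to the result.
      query = sum(embeddings[w] for w in positive) - sum(embeddings[w] for w in negative)
      skip = set(positive) | set(negative)
      scored = [(w, cosine(query, v)) for w, v in embeddings.items() if w not in skip]
      return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

  # e.g. rank_matches(vocab, positive=["king", "woman"], negative=["man"])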

lcnPylGDnU4H9OF (No.43989603)
Some of these make more sense than others (and bookshop is hilarious even if it's only the best answer by a small margin; no shade to bookshop owners).

  map - legend = Mercator projection
  noodle - wheat = egg noodle
  noodle - gluten = tagliatelle
  architecture - calculus = architectural style
  answer - question = comment
  shop - income = bookshop
  curry - curry powder = cuisine
  rice - grain = chicken and rice
  rice + chicken = poultry
  milk + cereal = grain
  blue - yellow = Fiji
  blue - Fiji = orange
  blue - Arkansas + Bahamas + Florida - Pluto = Grenada
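
If you want to reproduce queries like the ones above locally, gensim's pretrained vectors and most_similar() get close; this is a rough sketch (the site uses its own noun-only vocabulary, so results will differ):

  import gensim.downloader as api

  model = api.load("glove-wiki-gigaword-100")  # pretrained GloVe vectors

  # most_similar() adds the 'positive' vectors, subtracts the 'negative' ones,
  # and ranks the rest of the vocabulary by cosine similarity.
  print(model.most_similar(positive=["blue"], negative=["yellow"], topn=5))
  print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=5))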
1. C-x_C-f (No.43992397)
I don't want to dump too many but I found

   chess - checkers = wormseed mustard (63%)
pretty funny and very hard to understand. All the other options are hyperspecific grasslike plants like meadow salsify.
2. ccppurcell (No.43992724)
My philosophical take is that natural language has many, many more dimensions than we could hope to represent. Whenever you do dimensionality reduction, you lose information.