Most active commenters

nxa(9)
godelski(7)
n2d4(5)
(4)
ericdiao(3)
bee_rider(3)
neom(3)
downboots(3)

Popular/hot comments

>>43989245 #
>>43989961 #
>>43989512 #
>>43988786 #
>>43989843 #
>>43988904 #
>>43989925 #
>>43991235 #

Show HN: Semantic Calculator (king-man+woman=?)

(calc.datova.ai)

I've been playing with embeddings and wanted to try out what results the embedding layer will produce based on just word-by-word input and addition / subtraction, beyond what many videos / papers mention (like the obvious king-man+woman=queen). So I built something that doesn't just give the first answer, but ranks the matches based on distance / cosine symmetry. I polished it a bit so that others can try it out, too.

For now, I only have nouns (and some proper nouns) in the dataset, and pick the most common interpretation among the homographs. Also, it's case sensitive.

1. antidnan ◴[14 May 25 20:00 UTC] No.43988592[source]▶

>>43988533 (OP) #

Neat! Reminds me of infinite craft

https://neal.fun/infinite-craft/

replies(1): >>43989904 #

2. firejake308 ◴[14 May 25 20:02 UTC] No.43988601[source]▶

>>43988533 (OP) #

King-man+woman=Navratilova, who is apparently a Czech tennis player. Apparently, it's very case-sensitive. Cool idea!

replies(1): >>43988861 #

3. nikolay ◴[14 May 25 20:23 UTC] No.43988786[source]▶

>>43988533 (OP) #

Really?!

  man - brain = woman
  woman - brain = businesswoman

replies(6): >>43988818 #>>43988887 #>>43988910 #>>43988964 #>>43988972 #>>43989276 #

4. 2muchcoffeeman ◴[14 May 25 20:27 UTC] No.43988818[source]▶

>>43988786 #

Man - brain = Irish sea

replies(1): >>43988825 #

5. nikolay ◴[14 May 25 20:28 UTC] No.43988825{3}[source]▶

>>43988818 #

Case matters, obviously! Try "man" with a lower-case "M"!

replies(1): >>43988974 #

6. adzm ◴[14 May 25 20:31 UTC] No.43988842[source]▶

>>43988533 (OP) #

noodle+tomato=pasta

this is pretty fun

replies(1): >>43988909 #

7. fph ◴[14 May 25 20:33 UTC] No.43988861[source]▶

>>43988601 #

"King" (capital) probably was interpreted as https://en.wikipedia.org/wiki/Billie_Jean_King , that's why a tennis player showed up.

replies(2): >>43988944 #>>43989099 #

8. karel-3d ◴[14 May 25 20:35 UTC] No.43988887[source]▶

>>43988786 #

woman+penis=newswoman (businesswoman is second)

man+vagina=woman (ok that is boring)

9. cabalamat ◴[14 May 25 20:37 UTC] No.43988904[source]▶

>>43988533 (OP) #

What does it mean when it surrounds a word in red? Is this signalling an error?

replies(3): >>43988929 #>>43988992 #>>43989055 #

10. growlNark ◴[14 May 25 20:37 UTC] No.43988909[source]▶

>>43988842 #

Surely the correct answer would be `pasta-in-tomato-sauce`? Pasta exists outside of tomato sauce.

11. sapphicsnail ◴[14 May 25 20:37 UTC] No.43988910[source]▶

>>43988786 #

Telling that Jewess, feminist, and spinster were near matches as well.

12. nxa ◴[14 May 25 20:39 UTC] No.43988929[source]▶

>>43988904 #

Yes, word in red = word not found mostly the case when you try plurals or non-nouns (for now)

replies(1): >>43989067 #

13. nxa ◴[14 May 25 20:40 UTC] No.43988944{3}[source]▶

>>43988861 #

when I first tried it, king was referring to the instrument and I was getting a result king-man+woman=flute ... :-D

14. zerof1l ◴[14 May 25 20:40 UTC] No.43988948[source]▶

>>43988533 (OP) #

male + age = female

female + age = male

15. nxa ◴[14 May 25 20:41 UTC] No.43988964[source]▶

>>43988786 #

I probably should have prefaced this with "try at your own risk, results don't reflect the author's opinions"

replies(1): >>43989943 #

16. dalmo3 ◴[14 May 25 20:42 UTC] No.43988972[source]▶

>>43988786 #

I think subtraction is broken. None of what I tried made any sense. Water - oxygen = gin and tonic.

17. Alifatisk ◴[14 May 25 20:42 UTC] No.43988974{4}[source]▶

>>43988825 #

Why does case matter? How does it affect the meaning?

replies(2): >>43989004 #>>43989028 #

18. iambateman ◴[14 May 25 20:44 UTC] No.43988992[source]▶

>>43988904 #

Try Lower casing, my phone tried to capitalize and it was a problem.

19. G1N ◴[14 May 25 20:45 UTC] No.43988999[source]▶

>>43988533 (OP) #

twelve-ten+five=

six (84%)

Close enough I suppose

20. bfLives ◴[14 May 25 20:45 UTC] No.43989004{5}[source]▶

>>43988974 #

“Man” is probably being interpreted as the Isle of Man.

https://en.m.wikipedia.org/wiki/Isle_of_Man

21. lightyrs ◴[14 May 25 20:45 UTC] No.43989014[source]▶

>>43988533 (OP) #

I don't get it but I'm not sure I'm supposed to.

    life + death = mortality
    life - death = lifestyle

    drug + time = occasion
    drug - time = narcotic

    art + artist + money = creativity
    art + artist - money = muse

    happiness + politics = contentment
    happiness + art      = gladness
    happiness + money    = joy
    happiness + love     = joy

replies(2): >>43989096 #>>43989577 #

22. ◴[14 May 25 20:46 UTC] No.43989020[source]▶

>>43988533 (OP) #

23. woodruffw ◴[14 May 25 20:46 UTC] No.43989021[source]▶

>>43988533 (OP) #

colorless+green+ideas doesn't produce anything of interest, which is disappointing.

replies(1): >>43989949 #

24. G1N ◴[14 May 25 20:47 UTC] No.43989028{5}[source]▶

>>43988974 #

Man (capital M) is probably being interpreted as some proper noun, maybe Isle of Man in this case?

25. skeptrune ◴[14 May 25 20:48 UTC] No.43989048[source]▶

>>43988533 (OP) #

This is super fun. Offering the ranked matches makes it significantly more engaging than just showing the final result.

26. fallinghawks ◴[14 May 25 20:49 UTC] No.43989055[source]▶

>>43988904 #

Seems to be a word not in its dictionary. Seems to not have any country or language names.

Edit: these must be capitalized to be recognized.

27. spindump8930 ◴[14 May 25 20:49 UTC] No.43989060[source]▶

>>43988533 (OP) #

First off, this interface is very nice and a pleasure to use, congrats!

Are you using word2vec for these, or embeddings from another model?

I also wanted to add some flavor since it looks like many folks in this thread haven't seen something like this - it's been known since 2013 that we can do this (but it's great to remind folks especially with all the "modern" interest in NLP).

It's also known (in some circles!) that a lot of these vector arithmetic things need some tricks to really shine. For example, excluding the words already present in the query[1]. Others in this thread seem surprised at some of the biases present - there's also a long history of work on that [2,3].

[1] https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...

[2] https://arxiv.org/abs/1905.09866

[3] https://arxiv.org/abs/1903.03862

replies(1): >>43989121 #

28. rpastuszak ◴[14 May 25 20:50 UTC] No.43989067{3}[source]▶

>>43988929 #

This is neat!

I think you need to disable auto-capitalisation because on mobile the first word becomes uppercase and triggers a validation error.

29. 7373737373 ◴[14 May 25 20:50 UTC] No.43989070[source]▶

>>43988533 (OP) #

it doesn't know the word human

30. grey-area ◴[14 May 25 20:51 UTC] No.43989077[source]▶

>>43988533 (OP) #

As you might expect from a system with knowledge of word relations but without understanding or a model of the world, this generates gibberish which occasionally sounds interesting.

31. fallinghawks ◴[14 May 25 20:51 UTC] No.43989084[source]▶

>>43988533 (OP) #

goshawk-cocaine = gyrfalcon , which is funny if you know anything about goshawks and gyrfalcons

(Goshawks are very intense, gyrs tend to be leisurely in flight.)

32. grey-area ◴[14 May 25 20:53 UTC] No.43989096[source]▶

>>43989014 #

Does the system you’re querying ‘get it’? From the answers it doesn’t seem to understand these words or their relations. Once in a while it’ll hit on something that seems to make sense.

33. BeetleB ◴[14 May 25 20:53 UTC] No.43989099{3}[source]▶

>>43988861 #

Heh. This is fun:

Navratilova - woman + man = Lendl

34. nxa ◴[14 May 25 20:56 UTC] No.43989121[source]▶

>>43989060 #

Thank you! I actually had a hard time finding prior work on this, so I appreciate the references.

The dictionary is based on https://wordnet.princeton.edu/, no word2vec. It's just a plain lookup among precomputed embeddings (with mxbai-embed-large). And yes, I'm excluding words that are present in the query because.

It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).

replies(1): >>43989426 #

35. kataqatsi ◴[14 May 25 21:00 UTC] No.43989162[source]▶

>>43988533 (OP) #

garden + sin = gardening

hmm...

36. MYEUHD ◴[14 May 25 21:05 UTC] No.43989205[source]▶

>>43988533 (OP) #

king - man + woman = queen

queen - woman + man = drone

replies(1): >>43989604 #

37. blobbers ◴[14 May 25 21:06 UTC] No.43989216[source]▶

>>43988533 (OP) #

rice + fish = fish meat

rice + fish + raw = meat

hahaha... I JUST WANT SUSHI!

38. godelski ◴[14 May 25 21:08 UTC] No.43989245[source]▶

>>43988533 (OP) #

  data + plural = number
  data - plural = research
  king - crown = (didn't work... crown gets circled in red)
  king - princess = emperor
  king - queen = kingdom
  queen - king = worker
  king + queen = queen + king = kingdom
  boy + age = (didn't work... boy gets circled in red)
  man - age = woman
  woman - age = newswoman
  woman + age = adult female body (tied with man)
  girl + age = female child
  girl + old = female child

The other suggestions are pretty similar to the results I got in most cases. But I think this helps illustrate the curse of dimensionality (i.e. distances are ill-defined in high dimensional spaces). This is still quite an unsolved problem and seems a pretty critical one to resolve that doesn't get enough attention.

replies(9): >>43989480 #>>43989843 #>>43989994 #>>43990000 #>>43990270 #>>43992122 #>>43994931 #>>43996398 #>>44000804 #

39. ericdiao ◴[14 May 25 21:09 UTC] No.43989254[source]▶

>>43988533 (OP) #

Interesting: parent + male = female (83%)

Can not personally find the connection here, was expecting father or something.

replies(1): >>43989271 #

40. ericdiao ◴[14 May 25 21:11 UTC] No.43989271[source]▶

>>43989254 #

Though dad is in the list with lower confidence (77%).

High dimension vector is always hard to explain. This is an example.

41. ◴[14 May 25 21:12 UTC] No.43989276[source]▶

>>43988786 #

42. TZubiri ◴[14 May 25 21:20 UTC] No.43989348[source]▶

>>43988533 (OP) #

I'm getting Navralitova instead of queen. And can't get other words to work, I get red circles or no answer at all.

replies(1): >>43989566 #

43. nxa ◴[14 May 25 21:24 UTC] No.43989378[source]▶

>>43988533 (OP) #

This might be helpful: I haven't implemented it in the UI, but from the API response you can see what the word definitions are, both for the input and the output. If the output has homographs, likeliness is split per definition, but the UI only shows the best one.

Also, if it gets buried in comments, proper nouns need to be capitalized (Paris-France+Germany).

I am planning on patching up the UI based on your feedback.

44. kaycebasques ◴[14 May 25 21:29 UTC] No.43989426{3}[source]▶

>>43989121 #

(Question for anyone) how could I go about replicating this with Gemini Embedding? Generate and store an embedding for every word in the dictionary?

replies(1): >>43989502 #

45. Affric ◴[14 May 25 21:36 UTC] No.43989480[source]▶

>>43989245 #

Yeah I did similar tests and got similar results.

Curious tool but not what I would call accurate.

46. ericdiao ◴[14 May 25 21:38 UTC] No.43989486[source]▶

>>43988533 (OP) #

wine - alcohol = grape juice (32%)

Accurate.

47. nxa ◴[14 May 25 21:40 UTC] No.43989502{4}[source]▶

>>43989426 #

Yes, that's pretty much what it is. Watch out for homographs.

48. afandian ◴[14 May 25 21:41 UTC] No.43989509[source]▶

>>43988533 (OP) #

There was a site like this a few years ago (before all the LLM stuff kicked off) that had this and other NLP functionality. Styling was grey and basic. That’s all I remember.

I’ve been unable to find it since. Does anyone know which site I’m thinking of?

replies(1): >>43989697 #

49. montebicyclelo ◴[14 May 25 21:41 UTC] No.43989512[source]▶

>>43988533 (OP) #

> king-man+woman=queen

Is the famous example everyone uses when talking about word vectors, but is it actually just very cherry picked?

I.e. are there a great number of other "meaningful" examples like this, or actually the majority of the time you end up with some kind of vaguely tangentially related word when adding and subtracting word vectors.

(Which seems to be what this tool is helping to illustrate, having briefly played with it, and looked at the other comments here.)

(Btw, not saying wordvecs / embeddings aren't extremely useful, just talking about this simplistic arithmetic)

replies(7): >>43989576 #>>43989687 #>>43989933 #>>43989963 #>>43990416 #>>43990646 #>>43995277 #

50. jumploops ◴[14 May 25 21:46 UTC] No.43989543[source]▶

>>43988533 (OP) #

This is super neat.

I built a game[0] along similar lines, inspired by infinite craft[1].

The idea is that you combine (or subtract) “elements” until you find the goal element.

I’ve had a lot of fun with it, but it often hits the same generated element. Maybe I should update it to use the second (third, etc.) choice, similar to your tool.

[0] https://alchemy.magicloops.app/

[1] https://neal.fun/infinite-craft/

51. gus_massa ◴[14 May 25 21:49 UTC] No.43989566[source]▶

>>43989348 #

From another comment, https://news.ycombinator.com/item?id=43988861 King (with capital K) was a top 1 male tenis player.

52. ezbie ◴[14 May 25 21:50 UTC] No.43989569[source]▶

>>43988533 (OP) #

Can someone explain me what the fuck this is supposed to be!?

replies(2): >>43989917 #>>43989957 #

53. raddan ◴[14 May 25 21:51 UTC] No.43989576[source]▶

>>43989512 #

> is it actually just very cherry picked?

100%

54. bee_rider ◴[14 May 25 21:51 UTC] No.43989577[source]▶

>>43989014 #

    Life + death = mortality

is pretty good IMO, it is a nice blend of the concepts in an intuitive manner. I don’t really get

   drug + time = occasion

But

   drug - time = narcotic

Is kind of interesting; one definition of narcotic is

> a drug (such as opium or morphine) that in moderate doses dulls the senses, relieves pain, and induces profound sleep but in excessive doses causes stupor, coma, or convulsions

https://www.merriam-webster.com/dictionary/narcotic

So we can see some element of losing time in that type of drug. I guess? Maybe I’m anthropomorphizing a bit.

55. matallo ◴[14 May 25 21:51 UTC] No.43989584[source]▶

>>43988533 (OP) #

uncle + aunt = great-uncle (91%)

great idea, but I find the results unamusing

replies(1): >>43989784 #

56. lcnPylGDnU4H9OF ◴[14 May 25 21:53 UTC] No.43989603[source]▶

>>43988533 (OP) #

Some of these make more sense than others (and bookshop is hilarious even if it's only the best answer by a small margin; no shade to bookshop owners).

  map - legend = Mercator projection
  noodle - wheat = egg noodle
  noodle - gluten = tagliatelle
  architecture - calculus = architectural style
  answer - question = comment
  shop - income = bookshop
  curry - curry powder = cuisine
  rice - grain = chicken and rice
  rice + chicken = poultry
  milk + cereal = grain
  blue - yellow = Fiji
  blue - Fiji = orange
  blue - Arkansas + Bahamas + Florida - Pluto = Grenada

replies(2): >>43992397 #>>43995321 #

57. bee_rider ◴[14 May 25 21:53 UTC] No.43989604[source]▶

>>43989205 #

The second makes sense, I think, if you are a bee.

replies(1): >>43992811 #

58. kylecazar ◴[14 May 25 22:00 UTC] No.43989647[source]▶

>>43988533 (OP) #

Woman + president = man

59. gregschlom ◴[14 May 25 22:05 UTC] No.43989687[source]▶

>>43989512 #

Also, as I just learned the other day, the result was never equal, just close to "queen" in the vector space.

replies(2): >>43990990 #>>43992066 #

60. halter73 ◴[14 May 25 22:06 UTC] No.43989697[source]▶

>>43989509 #

I'm not sure this is old enough, but could you be referencing https://neal.fun/infinite-craft/ from https://news.ycombinator.com/item?id=39205020?

replies(1): >>43992996 #

61. HWR_14 ◴[14 May 25 22:20 UTC] No.43989784[source]▶

>>43989584 #

Your aunt's uncle is your great-uncle. It's more correct than your intuition.

replies(1): >>43989911 #

62. n2d4 ◴[14 May 25 22:28 UTC] No.43989843[source]▶

>>43989245 #

For fun, I pasted these into ChatGPT o4-mini-high and asked it for an opinion:

   data + plural    = datasets
   data - plural    = datum
   king - crown     = ruler
   king - princess  = man
   king - queen     = prince
   queen - king     = woman
   king + queen     = royalty
   boy + age        = man
   man - age        = boy
   woman - age      = girl
   woman + age      = elderly woman
   girl + age       = woman
   girl + old       = grandmother

The results are surprisingly good, I don't think I could've done better as a human. But keep in mind that this doesn't do embedding math like OP! Although it does show how generic LLMs can solve some tasks better than traditional NLP.

The prompt I used:

> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:

replies(5): >>43989988 #>>43990061 #>>43990761 #>>43991235 #>>43998165 #

63. tlhunter ◴[14 May 25 22:36 UTC] No.43989897[source]▶

>>43988533 (OP) #

man + woman = adult female body

64. thaumasiotes ◴[14 May 25 22:36 UTC] No.43989904[source]▶

>>43988592 #

I went to look at infinite craft.

It provides a panel filled with slowly moving dots. Right of the panel, there are objects labeled "water", "fire", "wind", and "earth" that you can instantiate on the panel and drag around. As you drag them, the background dots, if nearby, will grow lines connecting to them. These lines are not persistent.

And that's it. Nothing ever happens, there are no interactions except for the lines that appear while you're holding the mouse down, and while there is notionally a help window listing the controls, the only controls are "select item", "delete item", and "duplicate item". There is also an "about" panel, which contains no information.

replies(1): >>43989914 #

65. matallo ◴[14 May 25 22:38 UTC] No.43989911{3}[source]▶

>>43989784 #

I asked ChatGPT (after posting my comment) and this is the response. "Uncle + Aunt = Great-Uncle is incorrect. A great-uncle is the brother of your grandparent."

66. n2d4 ◴[14 May 25 22:38 UTC] No.43989914{3}[source]▶

>>43989904 #

In the panel, you can drag one of the items (eg. Water) onto another one (eg. Earth), and it will create a new word (eg. Plant). It uses AI, so it goes very deep

replies(1): >>43989925 #

67. mhitza ◴[14 May 25 22:38 UTC] No.43989917[source]▶

>>43989569 #

Semantical subtraction within embeddings representation of text ("meaning")

68. thaumasiotes ◴[14 May 25 22:40 UTC] No.43989925{4}[source]▶

>>43989914 #

No, that was the first thing I tried. The only thing that happens is that the two objects will now share their location. There are no interactions.

replies(3): >>43989944 #>>43993230 #>>43993307 #

69. Retr0id ◴[14 May 25 22:42 UTC] No.43989933[source]▶

>>43989512 #

I think it's slightly uncommon for the vectors to "line up" just right, but here are a few I tried:

actor - man + woman = actress

garden + person = gardener

rat - sewer + tree = squirrel

toe - leg + arm = digit

70. dmonitor ◴[14 May 25 22:43 UTC] No.43989943{3}[source]▶

>>43988964 #

I'm sure it would be trivial to get it to say something incredibly racist, so that's probably a worthwhile disclaimer to put on the website

71. n2d4 ◴[14 May 25 22:43 UTC] No.43989944{5}[source]▶

>>43989925 #

Probably a bug then, you can check YouTube to find videos of people playing it (eg. [0])

[0] https://youtu.be/8-ytx84lUK8

72. dmonitor ◴[14 May 25 22:43 UTC] No.43989949[source]▶

>>43989021 #

well green is not a creative color, so that's to be expected

73. ◴[14 May 25 22:45 UTC] No.43989957[source]▶

>>43989569 #

74. __MatrixMan__ ◴[14 May 25 22:45 UTC] No.43989961[source]▶

>>43988533 (OP) #

Here's a challenge: find something to subtract from "hammer" which does not result in a word that has "gun" as a substring. I've been unsuccessful so far.

replies(8): >>43990011 #>>43990015 #>>43990016 #>>43990019 #>>43990027 #>>43990335 #>>43990495 #>>43995910 #

75. groby_b ◴[14 May 25 22:45 UTC] No.43989963[source]▶

>>43989512 #

I think it's worth keeping in mind that word2vec was specifically trained on semantic similarity. Most embedding APIs don't really give a lick about the semantic space

And, worse, most latent spaces are decidedly non-linear. And so arithmetic loses a lot of its meaning. (IIRC word2vec mostly avoided nonlinearity except for the loss function). Yes, the distance metric sort-of survives, but addition/multiplication are meaningless.

(This is also the reason choosing your embedding model is a hard-to-reverse technical decision - you can't just transform existing embeddings into a different latent space. A change means "reembed all")

76. neom ◴[14 May 25 22:46 UTC] No.43989967[source]▶

>>43988533 (OP) #

cool but not enough data to be useful yet I guess. Most of mine either didn't have the words or were a few % off the answer, vehicle - road + ocean gave me hydrosphere, but the other options below were boat, ship, etc. Klimt almost made it from Mozart - music + painting. doctor - hospital + school = teacher, nailed it.

Getting to cornbread elegantly has been challenging.

77. nbardy ◴[14 May 25 22:49 UTC] No.43989988{3}[source]▶

>>43989843 #

I hate to be pedantic, but the llm is definitely doing embedding math. In fact that’s all it does.

replies(1): >>43991466 #

78. gweinberg ◴[14 May 25 22:50 UTC] No.43989994[source]▶

>>43989245 #

I got a bunch of red stuff also. I imagine the author cached embeddings for some words but not really all that many to save on credits. I gave it mermaid - woman and got merman, but when I tried to give it boar + woman - man or ram + woman - man, it turns out it has never heard of rams or boars.

79. downboots ◴[14 May 25 22:50 UTC] No.43989996[source]▶

>>43988533 (OP) #

three + two = four (90%)

replies(1): >>43990026 #

80. thatguysaguy ◴[14 May 25 22:51 UTC] No.43990000[source]▶

>>43989245 #

Can you elaborate on what the unsolved problem you're referring to is?

replies(1): >>44000107 #

81. neom ◴[14 May 25 22:52 UTC] No.43990011[source]▶

>>43989961 #

if I'm allowed only 1 something, I can't find anything either, if I'm allowed a few somethings, "hammer - wine - beer - red - child" will get you there. Guessing given that a gun has a hammer and is also a tool, it's too heavily linked in the small dataset.

82. tough ◴[14 May 25 22:52 UTC] No.43990015[source]▶

>>43989961 #

hammer + man = adult male body (75%)

replies(1): >>43990410 #

83. Retr0id ◴[14 May 25 22:52 UTC] No.43990016[source]▶

>>43989961 #

Well that's easy, subtract "gun" :P

84. mrastro ◴[14 May 25 22:53 UTC] No.43990019[source]▶

>>43989961 #

The word "gun" itself seems to work. Package this as a game and you've got a pretty fun game on your hands :)

replies(1): >>43991561 #

85. LadyCailin ◴[14 May 25 22:54 UTC] No.43990026[source]▶

>>43989996 #

Haha, yes, this was my first thought too. It seems it’s quite bad at actual math!

86. downboots ◴[14 May 25 22:54 UTC] No.43990027[source]▶

>>43989961 #

Bullet

87. yigitkonur35 ◴[14 May 25 22:57 UTC] No.43990052[source]▶

>>43988533 (OP) #

shows how bad embeddings are in a practical way

88. franga2000 ◴[14 May 25 22:58 UTC] No.43990061{3}[source]▶

>>43989843 #

This is an LLM approximating a semantic calculator, based solely on trained-in knowledge of what that is and probably a good amount of sample output, yet somehow beating the results of a "real" semantic calculator. That's crazy!

The more I think about it the less surprised I am, but my initial thoughts were quite simply "now way" - surely an approximation of an NLP model made by another NLP model can't beat the original, but the LLM training process (and data volume) is just so much more powerful I guess...

replies(1): >>43990301 #

89. rdlw ◴[14 May 25 23:01 UTC] No.43990084[source]▶

>>43988533 (OP) #

I've always wondered if there's s way to find which vectors are most important in a model like this. The gender vector man-woman or woman-man is the one always used in examples, since English has many gendered terms, but I wonder if it's possible to generate these pairs given the data. Maybe to list all differences of pairs of vectors, and see if there are any clusters. I imagine some grammatical features would show up, like the plurality vector people-person, or the past tense vector walked-walk, but maybe there would be some that are surprisingly common but don't seem to map cleanly to an obvious concept.

Or maybe they would all be completely inscrutable and man-woman would be like the 50th strongest result.

90. Jimmc414 ◴[14 May 25 23:18 UTC] No.43990190[source]▶

>>43988533 (OP) #

dog - cat = paleolith

paleolith + cat = Paleolithic Age

paleolith + dog = Paleolithic Age

paleolith - cat = neolith

paleolith - dog = hand ax

cat - dog = meow

Wonder if some of the math is off or I am not using this properly

replies(1): >>43999589 #

91. mathgradthrow ◴[14 May 25 23:29 UTC] No.43990270[source]▶

>>43989245 #

Distance is extremely well defined in high dimensional spaces. That isn't the problem.

replies(1): >>43991575 #

92. CamperBob2 ◴[14 May 25 23:34 UTC] No.43990301{4}[source]▶

>>43990061 #

This is basically the whole idea behind the transformer. Attention is much more powerful than embedding alone.

replies(1): >>43991277 #

93. downboots ◴[14 May 25 23:35 UTC] No.43990304[source]▶

>>43988533 (OP) #

mathematics - Santa Claus = applied mathematics

hacker - code = professional golf

94. aniviacat ◴[14 May 25 23:39 UTC] No.43990335[source]▶

>>43989961 #

Gun related stuff works: bullet, holster, barrel

Other stuff that works: key, door, lock, smooth

Some words that result in "flintlock": violence, anger, swing, hit, impact

95. quantum_state ◴[14 May 25 23:48 UTC] No.43990384[source]▶

>>43988533 (OP) #

The app produces nonsense ... such as quantum - superposition = quantum theory !!!

96. rdlw ◴[14 May 25 23:52 UTC] No.43990410{3}[source]▶

>>43990015 #

Close, that's addition

97. jbjbjbjb ◴[14 May 25 23:52 UTC] No.43990416[source]▶

>>43989512 #

Well when it works out it is quite satisfying

India - Asia + Europe = Italy

Japan - Asia + Europe = Netherlands

China - Asia + Europe = Soviet-Union

Russia - Asia + Europe = European Russia

calculation + machine = computer

replies(2): >>43991280 #>>43993483 #

98. soxfox42 ◴[15 May 25 00:06 UTC] No.43990495[source]▶

>>43989961 #

hammer - red = lock

99. nxa ◴[15 May 25 00:11 UTC] No.43990517[source]▶

>>43988533 (OP) #

artificial intelligence - bullsh*t = computer science (34%)

replies(1): >>43990545 #

100. behnamoh ◴[15 May 25 00:17 UTC] No.43990545[source]▶

>>43990517 #

This. I'm tired of so many "it's over, shocking, game changer, it's so over, we're so back" announcements that turn out to be just gpt-wrappers or resume-builder projects.

Very few papers that actually say something meaningful are left unnoticed, but as soon as you say something generic like "language models can do this", it gets featured in "AI influencer" posts.

101. galaxyLogic ◴[15 May 25 00:20 UTC] No.43990566[source]▶

>>43988533 (OP) #

What about starting with the result and finding set of words that when summed together give that result?

That could be seen as trying to find the true "meaning" of a word.

102. GrantMoyer ◴[15 May 25 00:32 UTC] No.43990625[source]▶

>>43988533 (OP) #

These are pretty good results. I messed around with a dumber and more naive version of this a few years ago[1], and it wasn't easy to get sensinble output most of the time.

[1]: https://github.com/GrantMoyer/word_alignment

103. bee_rider ◴[15 May 25 00:37 UTC] No.43990646[source]▶

>>43989512 #

Hmm, well I got

    cherry - picker = blackwood

if that helps.

104. refulgentis ◴[15 May 25 00:54 UTC] No.43990761{3}[source]▶

>>43989843 #

...welcome to ChatGPT, everyone! If you've been asleep since...2022?

(some might say all an LLM does is embeddings :)

105. charcircuit ◴[15 May 25 01:37 UTC] No.43990990{3}[source]▶

>>43989687 #

And queen isn't even the closest.

replies(1): >>43991033 #

106. mcswell ◴[15 May 25 01:42 UTC] No.43991033{4}[source]▶

>>43990990 #

What is the closest?

replies(2): >>43991436 #>>43991529 #

107. e____g ◴[15 May 25 01:46 UTC] No.43991058[source]▶

>>43988533 (OP) #

man - intelligence = woman (36%)

woman + intelligence = man (77%)

Oof.

108. maxcomperatore ◴[15 May 25 02:03 UTC] No.43991160[source]▶

>>43988533 (OP) #

Just use a LLM api to generate results, it will be far better and more accurate than a weird home cooked algorithm

109. godelski ◴[15 May 25 02:21 UTC] No.43991235{3}[source]▶

>>43989843 #

  > The results are surprisingly good, I don't think I could've done better as a human

I'm actually surprised that the performance is so poor and would expect a human to do much better. The GPT model has embedding PLUS a whole transformer model that can untangle the embedded structure.

To clarify some of the issues:

  data is both singular and plural, being a mass noun[0,1]. Datum is something you'll find in the dictionary, but not common in use[2]. The dictionary lags actual definitions. I mean words only mean what we collectively agree they mean (dictionary definitely helps with that but we also invent words all the time -- i.e. slang). I see how this one could trick up a human, feeling the need to change the output and would likely consult a dictionary but I don't think that's a fair comparison here as LLMs don't have these same biases.

  King - crown really seems like it should be something like "man" or "person". The crown is the manifestation of the ruling power. We still use phrases like "heavy is the head that wears the crown" in reference to general leaders, not just monarchs.

  king - princess I honestly don't know what to expect. Man is technically gender neutral so I'll take this one.

  king - queen I would expect similar outputs to the previous one. Don't quite agree here.

  queen - king I get why is removing royalty but given the previous (two) results I think is showing a weird gender bias. Remember that queen is something like (woman + crown) and king is akin to (man + crown). So subtracting should be woman - man. 

  The others I agree with. These were actually done because I was quite surprised at the results and was thinking about the aforementioned gender bias.

  > But keep in mind that this doesn't do embedding math like OP!

I think you are misunderstanding the architecture of these models. The embedding sub-network is the translation of text to numeric tokens. You'll find mention of the embedding sub-networks in both the GPT3[3] and GPT4 papers. Though they are given lower importance than other works. While much smaller than the main network, don't forget that embedding networks are still quite large. For the smaller models they constitute a significant part of the total parameter count[4]

After the embedding sub-network is your main transformer network. The purpose of this network is to perform embedding math! It is just that the goal is to do significantly more complicated math. Remember, these are learnable mappings (see Optimal Transport). We're just breaking it down into their two main intermediate mappings. But the embeddings still end up being a bottleneck. It is your literal gateway from words to numbers.

[0] https://en.wikipedia.org/wiki/Mass_noun

[1] https://www.merriam-webster.com/dictionary/data

[2] https://www.sciotoanalysis.com/news/2023/1/18/this-data-or-t...

[3] https://arxiv.org/abs/2005.14165

[4] https://arxiv.org/abs/2303.08774

[4] https://www.lesswrong.com/posts/3duR8CrvcHywrnhLo/how-does-g...

replies(3): >>43991448 #>>43992037 #>>43995084 #

110. godelski ◴[15 May 25 02:33 UTC] No.43991277{5}[source]▶

>>43990301 #

The transformers are initialized by embedding models...

Your embedding model is literally the translation layer converting the text to numbers. The transformers are the main processing unit of the embeddings. You can even see some self-reflection in the model as the transformer is composed of attention and a MLP sub-network. The attention mechanism generates the interrelational dependence of the data and the MLP projects up into a higher dimension before coming down so that this can untangle these relationships. But the idea is that you just repeat this process over and over. The attention mechanism has the benefit over CNN models because it has a larger receptive field, so can better process long range relationships (long range being across the input data) where CNNs bias for local relationships.

111. trhway ◴[15 May 25 02:34 UTC] No.43991280{3}[source]▶

>>43990416 #

democracy - vote = progressivism

I'll have to mediate on that.

replies(1): >>43992018 #

112. ◴[15 May 25 03:07 UTC] No.43991436{5}[source]▶

>>43991033 #

113. n2d4 ◴[15 May 25 03:10 UTC] No.43991448{4}[source]▶

>>43991235 #

You are being unnecessarily cynical. These are all subjective. I thought "datum" and "datasets" was quite clever, and while I would've chosen "man" for "king - crown" myself, I actually find "ruler" a better solution after seeing it. But each to their own.

The rant about network architecture misses my point, which is that an LLM does not just do a linear transformation and a similarity search. Sure, in the most abstract sense it still just computes an output embedding from two input embeddings, but only in a very distant, pedantic way. (Actually, to be VERY pedantic, that would not even be true, because ChatGPT's tokenizer embeds tokens, not words. The in- and output of the model is more than just the semantic embedding of words; using two different but semantically equivalent words may result in different outputs with a transformer LLM, but not in a word semantics model.)

I just thought it was cool that ChatGPT is so good at it.

replies(1): >>43991689 #

114. n2d4 ◴[15 May 25 03:14 UTC] No.43991466{4}[source]▶

>>43989988 #

Sure! Although I think we both agree that the way those embeddings are transformed is significantly different ;)

(what I meant to say is that it doesn't do embedding math "LIKE" the OP — not that it doesn't do embedding math at all.)

replies(1): >>43993755 #

115. charcircuit ◴[15 May 25 03:23 UTC] No.43991529{5}[source]▶

>>43991033 #

Usually king is.

replies(2): >>43992891 #>>43992920 #

116. __MatrixMan__ ◴[15 May 25 03:29 UTC] No.43991561{3}[source]▶

>>43990019 #

Doh why didn't I think of that

117. godelski ◴[15 May 25 03:32 UTC] No.43991575{3}[source]▶

>>43990270 #

Would you care to elaborate? To clarify, I mean that variance reduces as dimensionality increases

118. godelski ◴[15 May 25 03:54 UTC] No.43991689{5}[source]▶

>>43991448 #

I'm an engineer and researcher, it is my job to find problems, so that they can be resolved. I'd say this is different from being cynical as that tends to be dismissive. I understand how my comment can come off that way, though it wasn't my intention, so I'm clarifying.

You're right that there's subjectivity but not infinitely so. There is a bound to this and that's both required for language to work and for us to build these models. I did agree that the data one was tricky so not really going to argue, I was just pointing out a critical detail given that the models learn through pattern matching rather than a dictionary. It's why I made the comment about humans. As for ruler minus crown, I gave my explication, would you care to share yours? I'd like to understand your point of view so I can better my interpretation of the results, because frankly I don't understand. What is the semantic relationship being changed if not the attribute of ruler?

The architecture part was a miscommunication. I hope you understand how I misunderstood you when you said "this doesn't do embedding math like OP!". It is clear I'm not alone either.

  > Actually, to be VERY pedantic, that would not even be true, because ChatGPT's tokenizer embeds tokens, not words.

To be pedantic, people generally refer to the tokenization and embedding simply as embedding. It's the common verbiage. This is because with BPE you are performing these steps simultaneously and the term is appropriate given the longer usage in math.

I was just trying to help you understand a different viewpoint.

119. hagen_dogs ◴[15 May 25 03:56 UTC] No.43991701[source]▶

>>43988533 (OP) #

fluid + liquid = solid (85%) -- didn't expect that

blue + red = yellow (87%) -- rgb, neat

black + {red,blue,yellow,green} = white 83% -- weird

replies(1): >>43992563 #

120. doubtfuluser ◴[15 May 25 03:57 UTC] No.43991710[source]▶

>>43988533 (OP) #

doctor - man + woman = medical practitioner

Good to understand this bias before blindly applying these models (Yes- doctor is gender neutral - even women can be doctors!!)

replies(1): >>43991724 #

121. heyitsguay ◴[15 May 25 03:59 UTC] No.43991724[source]▶

>>43991710 #

Fwiw, doctor - woman + man = medical practitioner too

122. erulabs ◴[15 May 25 04:43 UTC] No.43991921[source]▶

>>43988533 (OP) #

dog - fur = Aegean civilization (22%)

huh

123. havkom ◴[15 May 25 04:49 UTC] No.43991944[source]▶

>>43988533 (OP) #

I tried:

-red

and:

red-red-red

But it did not work and did not get any response. Maybe I am stupid but should this not work?

124. blipvert ◴[15 May 25 05:08 UTC] No.43992018{4}[source]▶

>>43991280 #

person + man + woman + camera + television = user

125. Sharlin ◴[15 May 25 05:11 UTC] No.43992037{4}[source]▶

>>43991235 #

"King-crown=ruler" is IMO absolutely apt. Arguing that "crown" can be used metaphorically is a bit disingenuous because first, it's very rarely applied to non-monarchs, and is a very physical, concrete symbol of power that separates monarchs from other rulers.

"King-princess=man" can be thought to subtract the "royalty" part of "king"; "man" is just as good an answer as any else.

"King-queen=prince" I'd think of as subtracting "ruler" from "king", leaving a male non-ruling member of royalty. "gender-unspecified non-ruling royal" would be even better, but there's no word for that in English.

replies(2): >>43992614 #>>43996501 #

126. chis ◴[15 May 25 05:17 UTC] No.43992066{3}[source]▶

>>43989687 #

I mean they are floating point vectors so

127. sdeframond ◴[15 May 25 05:27 UTC] No.43992122[source]▶

>>43989245 #

Such results are inherently limited because a same word can have different meanings depending on context.

The role of the Attention Layer in LLMs is to give each token a better embedding by accounting for context.

128. C-x_C-f ◴[15 May 25 06:33 UTC] No.43992397[source]▶

>>43989603 #

I don't want to dump too many but I found

   chess - checkers = wormseed mustard (63%)

pretty funny and very hard to understand. All the other options are hyperspecific grasslike plants like meadow salsify.

replies(1): >>43992724 #

129. Finbel ◴[15 May 25 06:38 UTC] No.43992422[source]▶

>>43988533 (OP) #

London-England+France=Maupassant

130. atum47 ◴[15 May 25 06:58 UTC] No.43992518[source]▶

>>43988533 (OP) #

horse+man

78% male horse 72% horseman

131. darepublic ◴[15 May 25 07:07 UTC] No.43992555[source]▶

>>43988533 (OP) #

man - courage = husband

132. moefh ◴[15 May 25 07:09 UTC] No.43992563[source]▶

>>43991701 #

> blue + red = yellow (87%) -- rgb, neat

Blue + red is magenta. Yellow would be red + green.

None of these results make much sense to me.

133. godelski ◴[15 May 25 07:18 UTC] No.43992614{5}[source]▶

>>43992037 #

  > it's very rarely applied to non-monarchs

I take your point but highly disagree that it's disingenuous to view this metaphorically. The crown has always been a symbol of the seat of power and that usage dates back centuries. I've seen it commonly used to refer to leadership in general. Actually more often.

  - https://en.wikipedia.org/wiki/Heavy_Lies_the_Crown
  - https://en.wikipedia.org/wiki/Heavy_Is_the_Head

Notably even in the usage of Henry IV that the idiom draws from is using it in the metaphorical sense, despite also talking about a ruler so would wear a literal crown. There's similar frequent usage in widely popular shows like Game of Thrones. So I hope you can see why I really do not think it's fair to call me disingenuous. The metaphorical usage is extremely common.

I'll buy the king price relationship. That's fair. But it also seems to be in disagreement from the king queen one.

134. dtj1123 ◴[15 May 25 07:28 UTC] No.43992670[source]▶

>>43988533 (OP) #

"man-intelligence=woman" is a particularly interesting result.

135. ainiriand ◴[15 May 25 07:36 UTC] No.43992710[source]▶

>>43988533 (OP) #

dog+woman = man

That's weird.

136. ccppurcell ◴[15 May 25 07:40 UTC] No.43992724{3}[source]▶

>>43992397 #

My philosophical take on it is that natural language has many many more dimensions than we could hope to represent. Whenever you do dimension reduction you lose information.

137. neom ◴[15 May 25 07:52 UTC] No.43992811{3}[source]▶

>>43989604 #

So, are you a bee keeper then?

138. KeplerBoy ◴[15 May 25 08:09 UTC] No.43992891{6}[source]▶

>>43991529 #

That would be hilariously disappointing.

139. Narew ◴[15 May 25 08:15 UTC] No.43992920{6}[source]▶

>>43991529 #

yes and it's only work because we prevent the output to be in the input.

140. afandian ◴[15 May 25 08:28 UTC] No.43992996{3}[source]▶

>>43989697 #

Thanks, no it wasn't that, it was a basic HTML form.

141. mkl ◴[15 May 25 09:19 UTC] No.43993230{5}[source]▶

>>43989925 #

There are definitely interactions. https://news.ycombinator.com/item?id=39205020

142. bluelightning2k ◴[15 May 25 09:29 UTC] No.43993284[source]▶

>>43988533 (OP) #

potato + microwave = potato tree

143. tiborsaas ◴[15 May 25 09:31 UTC] No.43993288[source]▶

>>43988533 (OP) #

I've tried to get to "garage", but failed at a few attempts, ChatGPT's ideas also seemed reasonable, but failed. Any takers? :)

replies(1): >>43996633 #

144. gaoryrt ◴[15 May 25 09:34 UTC] No.43993307{5}[source]▶

>>43989925 #

After turning off adblock everything goes well.

145. anonu ◴[15 May 25 10:08 UTC] No.43993481[source]▶

>>43988533 (OP) #

Reminds me of the very annoying word game https://contexto.me/en/

146. kgeist ◴[15 May 25 10:08 UTC] No.43993483{3}[source]▶

>>43990416 #

Interesting:

  Russia - Europe = Putin
  Ukraine + Putin = Russia
  Putin - Stalin = Bush
  Stalin - purge = Lenin

That means Bush = Ukraine+Putin-Europe-Lenin-purge.

However, the site gives Bush -4%, second best option (best is -2%, "fleet ballistic missile submarine", not sure what negative numbers mean).

replies(1): >>43997888 #

147. hello_computer ◴[15 May 25 10:28 UTC] No.43993571[source]▶

>>43988533 (OP) #

doesn’t do anything on my iphone

148. ignat_244639 ◴[15 May 25 10:46 UTC] No.43993647[source]▶

>>43988533 (OP) #

Huh, that's strange, I wanted to check whether your embeddings have biases, but I cannot use "white" word at all. So I cannot get answer to "man - white + black = ?".

But if I assume the biased answer and rearrange the operands, I get "man - criminal + black = white". Which clearly shows, how biased your embeddings are!

Funny thing, fixing biases and ways to circumvent the fixes (while keeping good UX) might be much challenging task :)

149. coolcase ◴[15 May 25 11:02 UTC] No.43993734[source]▶

>>43988533 (OP) #

Oh you have all the damn words. Even the Ricky Gervais ones.

150. clbrmbr ◴[15 May 25 11:04 UTC] No.43993740[source]▶

>>43988533 (OP) #

A few favorites:

wine - beer = grape juice

beer - wine = bowling

astrology - astronomy + mathematics = arithmancy

151. coolcase ◴[15 May 25 11:06 UTC] No.43993755{5}[source]▶

>>43991466 #

Yeah we'd be impressed if an LLM calculated the product of a couple of 1000x1000 matrices.

152. krishna-vakx ◴[15 May 25 11:27 UTC] No.43993885[source]▶

>>43988533 (OP) #

for founders :

love + time = commitment

boredom + curiosity = exploration

vision + execution = innovation

resilience - fear = courage

ambition + humility = leadership

failure + reflection = learning

knowledge + application = wisdom

feedback + openness = improvement

experience - ego = mastery

idea + validation = product-market fit

153. ale42 ◴[15 May 25 12:15 UTC] No.43994225[source]▶

>>43988533 (OP) #

Not what it's meant for, I guess, but it's not very strong at chemistry ;-)

  salt - chlorine + potassium = sodium
  chlorine + sodium = rubidium
  water - hydrogen = tap water

It also has some other interesting outputs:

  woman + man = adult female body (already reported by someone else)
  man - hand = woman
  woman - hand = businesswoman
  businessman - male + female = industrialist
  telephone + antenna = television equipment
  olive oil - oil = hearth money

154. wdutch ◴[15 May 25 12:54 UTC] No.43994524[source]▶

>>43988533 (OP) #

It's interesting that I find loops. For example

car + stupid = idiot, car + idiot = stupid

155. virgilp ◴[15 May 25 13:34 UTC] No.43994931[source]▶

>>43989245 #

hacker+news-startup = golfer

156. drabbiticus ◴[15 May 25 13:50 UTC] No.43995084{4}[source]▶

>>43991235 #

The specific cherry-picked examples from GP make sense to me.

   data + plural    = datasets 
   data - plural    = datum

If +/- plural can be taken to mean "make explicitly plural or singular", then this roughly works.

   king - crown     = ruler

Rearrange (because embeddings are just vector math), and you get "king = ruler + crown". Yes, a king is a ruler who has a crown.

   king - princess  = man

This isn't great, I'll grant, but there are many YA novels where someone becomes king (eventually) through marriage to a princess, or there is intrigue for the princess's hand for reasons of kingly succession, so "king = man + princess" roughly works.

   king - queen     = prince
   queen - king     = woman

I agree it's hard to make sense of "king - queen = prince". "A queen is a woman king" is often how queens are described to young children. In Chinese, it's actually the literal breakdown of 女王. I also agree there's a gender bias, but also literally everything about LLMs and various AI trained on large human-generated data encodes the bias of how we actually use language and thought patterns. It's one of the big concerns of those in the civil liberties space. Search "llm discrimination" or similar for more on this.

Playing around with age/time related gives a lot of interesting results:

    adult + age = adulthood
    child + age = female child
    year + age = chronological age
    time + year = day
    child + old = today
    adult - old = adult body
    adult - age = powerhouse
    adult - year = man

I think a lot of words are hard to distill into a single embedding. A word may embed a number of conceptually distinct definitions, but my (incomplete) understanding of embeddings is that they are not context-sensitive, right? So averaging those distinct definitions through 1 label is probably fraught with problems when trying to do meaningful vector math with them that context/attention are able to help with.

[EDIT:formatting is hard without preview]

157. mannykannot ◴[15 May 25 13:59 UTC] No.43995170[source]▶

>>43988533 (OP) #

Now I'm wondering if this could be helpful in doing the NY Times Connections puzzle.

158. loganmhb ◴[15 May 25 14:09 UTC] No.43995277[source]▶

>>43989512 #

I once saw an explanation which I can no longer find that what's really happening here is also partly "man" and "woman" are very similar vectors which nearly cancel each other out, and "king" is excluded from the result set to avoid returning identities, leaving "queen" as the closest next result. That's why you have to subtract and then add, and just doing single operations doesn't work very well. There's some semantic information preserved that might nudge it in the right direction but not as much as the naive algebra suggests, and you can't really add up a bunch of these high-dimensional vectors in a sensible way.

E.g. in this calculator "man - king + princess = woman", which doesn't make much sense. "airplane - engine", which has a potential sensible answer of "glider", instead "= Czechoslovakia". Go figure.

159. ActionHank ◴[15 May 25 14:15 UTC] No.43995321[source]▶

>>43989603 #

dog - fur = Aegean civilization

160. cosmicgadget ◴[15 May 25 14:33 UTC] No.43995511[source]▶

>>43988533 (OP) #

  car + dragon = panzer

161. jryb ◴[15 May 25 14:39 UTC] No.43995568[source]▶

>>43988533 (OP) #

Just inverting the canonical example fails: queen - woman + man = drone

replies(1): >>43995946 #

162. andrelaszlo ◴[15 May 25 15:10 UTC] No.43995853[source]▶

>>43988533 (OP) #

    hand - arm + leg = vertebrate foot
    snowman - man =  snowflake
    snowman - snow = snowbank

163. ttctciyf ◴[15 May 25 15:15 UTC] No.43995910[source]▶

>>43989961 #

hammer - keyboard = hammerhead

Makes no sense, admittedly!

- dulcimer and - zither are both in firmly in .*gun.* territory it seems..

164. x3y1 ◴[15 May 25 15:18 UTC] No.43995946[source]▶

>>43995568 #

This kind of makes sense for bees.

165. spinarrets ◴[15 May 25 15:42 UTC] No.43996210[source]▶

>>43988533 (OP) #

cheeseburger-giraffe+space-kidney-monkey = cheesecake

166. pjc50 ◴[15 May 25 16:01 UTC] No.43996398[source]▶

>>43989245 #

Ah yes, 女 + 子 = girl but if combined in a kanji you get 好 = like.

167. Glyptodon ◴[15 May 25 16:03 UTC] No.43996421[source]▶

>>43988533 (OP) #

Car - Wheel(s) doesn't really have results I'd guess at (boat, sled, etc.). Just specific four wheeled vehicles.

168. FabHK ◴[15 May 25 16:14 UTC] No.43996501{5}[source]▶

>>43992037 #

“King - queen = male” strikes me as logical, if we take king = (+human, +male, +royal), and queen = (+human, -male, +royal), then the difference is (0human, 2male, 0royal).

169. mynameajeff ◴[15 May 25 16:26 UTC] No.43996633[source]▶

>>43993288 #

"car + house + door" worked for me (interestingly "car + home + door" did not)

replies(1): >>44000933 #

170. insane_dreamer ◴[15 May 25 16:50 UTC] No.43996877[source]▶

>>43988533 (OP) #

carbon + oxygen = nitrogen

LOL

171. nxa ◴[15 May 25 18:31 UTC] No.43997888{4}[source]▶

>>43993483 #

My interpretation of negative numbers is that no "synonym" was found (no vector pointing in the same direction), and that the closest expression on record is something with an opposite meaning (pointing in reverse direction), so I'd say that's an antonym.

172. amdivia ◴[15 May 25 18:58 UTC] No.43998165{3}[source]▶

>>43989843 #

Can you do the same but each line is done in a seperate context?

173. Glyptodon ◴[15 May 25 21:40 UTC] No.43999589[source]▶

>>43990190 #

I figure the mathematically highest value must defer from the semantically most accurate relatively frequently. (Because Car - Wheel = Touring Car doesn't make a lot of sense to me.)

174. godelski ◴[15 May 25 22:44 UTC] No.44000107{3}[source]▶

>>43990000 #

Dealing with metrics in high dimensions. As you increase dimensionality the variance decreases, leading to indistinguishablity.

You can get some help in high dimensions when you're more concerned with (clearly disjoint) clusters. But this is akin to doing a dimensional reduction, treating independent clusters as individual points. (Say we have set S which has disjoint subsets {S_0,...,S_n}, your new set is now {a_0,...,a_n}, where each a_i is an element representing all elements in S_i. Think like "set of sets") But you do not get help with interrelationships (i.e. d(s_x,s_y) \in S_i \forall x≠y) and I think you can gather that when clusters are not clearly disjoint then we're in the same situation as trying to differentiate inter-cluster.

Understanding this can help you understand why these models (including LLMs) are good in broader concepts like differentiating between obvious things but struggle more in nuance. A good litmus test is to ask them about any subject you have good deep knowledge in. Essentially test yourself for Murray-Gelmann Amnesia. The things are designed for human preference. When they fail they're likely to fail without warning (i.e. in ways that are not so obvious)

175. charlieyu1 ◴[16 May 25 00:50 UTC] No.44000804[source]▶

>>43989245 #

I think you need to do A-B+C types? A+B or A-B wouldn’t make much sense when the magnitude changes

176. tiborsaas ◴[16 May 25 01:14 UTC] No.44000933{3}[source]▶

>>43996633 #

Thanks, nice :) House sounds more general, I guess.

I've had some fun finding this:

    car - move + shape = car wheel

↑