Show HN: Semantic Calculator (king-man+woman=?)

1. spindump8930 ◴[14 May 25 20:49 UTC] No.43989060[source]▶

First off, this interface is very nice and a pleasure to use, congrats!

Are you using word2vec for these, or embeddings from another model?

I also wanted to add some flavor since it looks like many folks in this thread haven't seen something like this - it's been known since 2013 that we can do this (but it's great to remind folks especially with all the "modern" interest in NLP).

It's also known (in some circles!) that a lot of these vector arithmetic things need some tricks to really shine. For example, excluding the words already present in the query[1]. Others in this thread seem surprised at some of the biases present - there's also a long history of work on that [2,3].

[1] https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...

[2] https://arxiv.org/abs/1905.09866

[3] https://arxiv.org/abs/1903.03862
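
For anyone who hasn't seen the exclusion trick in action, here is a minimal sketch with made-up 4-d vectors. The `TOY` values and the `analogy` helper are hypothetical (nothing here comes from the site or from a real model); they are hand-picked so that the raw nearest neighbour of king - man + woman is "king" itself until the query words are filtered out.

```python
import numpy as np

# Hand-picked toy vectors, NOT real embeddings (illustrative assumption).
TOY = {
    "king":  np.array([3.0, 1.0, 0.0, 0.0]),
    "queen": np.array([2.0, 0.0, 1.0, 1.5]),
    "man":   np.array([0.0, 1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0, 0.0]),
    "apple": np.array([0.0, 0.0, 0.0, 1.0]),
}


def analogy(vectors, positive, negative, exclude_query=True, topn=1):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    target = (np.sum([vectors[w] for w in positive], axis=0)
              - np.sum([vectors[w] for w in negative], axis=0))
    target = target / np.linalg.norm(target)

    query = set(positive) | set(negative)
    scored = []
    for word, vec in vectors.items():
        # The "trick": optionally skip words that appear in the query.
        if exclude_query and word in query:
            continue
        cosine = float(vec @ target / np.linalg.norm(vec))
        scored.append((cosine, word))

    scored.sort(reverse=True)
    return [word for _, word in scored[:topn]]


with_filter = analogy(TOY, ["king", "woman"], ["man"])
without_filter = analogy(TOY, ["king", "woman"], ["man"], exclude_query=False)
```

In this toy setup `with_filter` comes out as `["queen"]` and `without_filter` as `["king"]`, which is the same effect the post in [1] describes for real embeddings.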

replies(1): >>43989121 #

2. nxa ◴[14 May 25 20:56 UTC] No.43989121[source]▶

>>43989060 (TP) #

Thank you! I actually had a hard time finding prior work on this, so I appreciate the references.

The dictionary is based on https://wordnet.princeton.edu/, no word2vec. It's just a plain lookup among precomputed embeddings (with mxbai-embed-large). And yes, I'm excluding words that are present in the query.

It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).

replies(1): >>43989426 #

3. kaycebasques ◴[14 May 25 21:29 UTC] No.43989426[source]▶

>>43989121 #

(Question for anyone) How could I go about replicating this with Gemini Embedding? Would I generate and store an embedding for every word in the dictionary?

replies(1): >>43989502 #

4. nxa ◴[14 May 25 21:40 UTC] No.43989502{3}[source]▶

>>43989426 #

Yes, that's pretty much what it is. Watch out for homographs.
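
A rough sketch of that precompute step, under loud assumptions: `fake_embed` is a placeholder that returns deterministic pseudo-random vectors so the example runs offline, and you would swap in a call to whatever embedding model you use. `build_table` and the `SENSES` list are also hypothetical names. One possible way to deal with homographs (not necessarily what the site does) is to embed each sense together with its gloss, so that "bass" the fish and "bass" the voice get separate rows.

```python
import hashlib

import numpy as np


def fake_embed(texts, dim=8):
    """Placeholder for a real embedding call; returns deterministic pseudo-vectors."""
    rows = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        seed = int.from_bytes(digest[:4], "big")
        rows.append(np.random.default_rng(seed).standard_normal(dim))
    return np.stack(rows)


def build_table(senses, embed_fn, batch_size=2):
    """Embed (word, gloss) pairs in batches; return keys and unit-length rows."""
    keys, chunks = [], []
    for start in range(0, len(senses), batch_size):
        batch = senses[start:start + batch_size]
        # Embedding "word: gloss" keeps homographs apart, one row per sense.
        texts = [f"{word}: {gloss}" for word, gloss in batch]
        chunks.append(embed_fn(texts))
        keys.extend(batch)
    matrix = np.vstack(chunks)
    # Normalise once so later lookups are a plain dot product.
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    return keys, matrix


# Hypothetical sense inventory; a real one could come from WordNet glosses.
SENSES = [
    ("bass", "a low-pitched voice or instrument"),
    ("bass", "a kind of freshwater fish"),
    ("queen", "a female monarch"),
]

keys, matrix = build_table(SENSES, fake_embed)
```

With the rows normalised up front, a query is just `matrix @ target` followed by a sort, and the `keys` list maps each row back to its word and sense.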