
Embeddings are underrated (2024)

(technicalwriting.dev)
484 points | jxmorris12
tyho No.43964392
> The 2D map analogy was a nice stepping stone for building intuition but now we need to cast it aside, because embeddings operate in hundreds or thousands of dimensions. It’s impossible for us lowly 3-dimensional creatures to visualize what “distance” looks like in 1000 dimensions. Also, we don’t know what each dimension represents, hence the section heading “Very weird multi-dimensional space”. One dimension might represent something close to color. The king - man + woman ≈ queen anecdote suggests that these models contain a dimension with some notion of gender. And so on. Well Dude, we just don’t know.

nit: this suggests that the model contains a direction with some notion of gender, not a dimension. Direction and dimension sound inextricably linked by definition, but with some handwavy maths you find that the number of nearly orthogonal directions in n-dimensional space grows exponentially with n. This helps explain why spaces on the order of 1k dimensions can "fit" billions of concepts.
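A minimal numerical sketch of that handwavy maths (assuming numpy; the demo itself is hypothetical, not from the linked post): sample random unit vectors and watch how close to orthogonal they get as n grows.

    import numpy as np

    rng = np.random.default_rng(0)

    for n in (3, 100, 1000):
        # Sample 1,000 random directions on the unit sphere in R^n.
        v = rng.standard_normal((1000, n))
        v /= np.linalg.norm(v, axis=1, keepdims=True)

        # Cosine similarity between every pair of distinct directions.
        cos = v @ v.T
        off_diag = cos[~np.eye(len(cos), dtype=bool)]

        # In R^3 some pair is nearly parallel (|cos| near 1); in R^1000
        # every pair is nearly orthogonal (|cos| near 0).
        print(f"n={n:5d}  max |cos| over all pairs: {np.abs(off_diag).max():.3f}")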

replies(12): >>43964509 #>>43964649 #>>43964659 #>>43964705 #>>43964934 #>>43965081 #>>43965183 #>>43965258 #>>43965725 #>>43965971 #>>43966531 #>>43967165 #
gweinberg No.43965183
It's not at all a nit. If one of the dimensions did indeed correspond to gender, you might find that "king" and "queen" differ in pretty much only one dimension. More generally, if these dimensions individually refer to human-meaningful concepts, you can find out what those concepts are just by looking at pairs of words that differ along only one dimension.
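One way to sanity-check this, sketched below with gensim's downloadable GloVe vectors (an assumption on my part; any pretrained word vectors would do): compute king - queen and see how much of the difference lives in a single coordinate.

    import numpy as np
    import gensim.downloader as api

    # Hypothetical check, assuming gensim and a network connection;
    # the first call downloads the 50-dim GloVe vectors.
    wv = api.load("glove-wiki-gigaword-50")

    diff = wv["king"] - wv["queen"]
    share = (diff ** 2) / np.sum(diff ** 2)

    # Fraction of the squared difference explained by the single
    # largest coordinate; ~1.0 would mean a literal "gender dimension".
    print(f"largest single-coordinate share: {share.max():.2f}")

If the share comes out well below 1, the king/queen difference is spread across many coordinates, i.e. a direction rather than a dimension.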
replies(2): >>43965734 #>>43975861 #
otabdeveloper4 No.43965734
That's the layman's intuition, but actual models can give surprising results.

You can test this hypothesis with some clever LLM prompting. When I did this I got "male monarch" for "king" but "British ruler" for "queen".

Oops!

replies(1): >>43966827 #
gweinberg No.43966827
I'm sorry, I don't get your point at all, and I have no idea what you mean by "did this". If you asked for an embedding, you would have gotten a 768-dimensional (or whatever) array, right?
replies(1): >>43968957 #
kaycebasques No.43968957
For word2vec I know there are a bunch of demos that let you do the king - man + woman computation, but I don't know how you'd do this with modern embeddings. https://turbomaze.github.io/word2vecjson/
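For what it's worth, here is a minimal sketch of how you might try the same arithmetic with a modern embedding model (assuming the sentence-transformers package; the model name and the tiny vocabulary are arbitrary choices of mine):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    words = ["king", "man", "woman", "queen", "prince", "ruler", "monarch"]
    vecs = dict(zip(words, model.encode(words)))  # one vector per word

    # king - man + woman, then rank the small vocabulary by cosine similarity.
    target = vecs["king"] - vecs["man"] + vecs["woman"]
    target /= np.linalg.norm(target)

    for w in words:
        v = vecs[w] / np.linalg.norm(vecs[w])
        print(f"{w:8s} cos = {float(v @ target):.3f}")

Whether "queen" actually comes out on top depends on the model; sentence embeddings of single words aren't trained for this kind of arithmetic the way word2vec was.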