
Embeddings are underrated (2024)

(technicalwriting.dev)
484 points | jxmorris12
tyho ◴[] No.43964392[source]
> The 2D map analogy was a nice stepping stone for building intuition but now we need to cast it aside, because embeddings operate in hundreds or thousands of dimensions. It’s impossible for us lowly 3-dimensional creatures to visualize what “distance” looks like in 1000 dimensions. Also, we don’t know what each dimension represents, hence the section heading “Very weird multi-dimensional space”. One dimension might represent something close to color. The king - man + woman ≈ queen anecdote suggests that these models contain a dimension with some notion of gender. And so on. Well Dude, we just don’t know.

nit. This suggests that the model contains a direction with some notion of gender, not a dimension. Direction and dimension appear to be inextricably linked by definition, but with some handwavy maths you find that the number of nearly orthogonal directions within an n-dimensional space grows exponentially with n. This helps explain why spaces on the order of 1k dimensions can "fit" billions of concepts.

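A quick way to see the "nearly orthogonal" claim is a minimal numpy sketch (toy random vectors, not actual embeddings): sample a few thousand random unit vectors in 1000 dimensions and check that every pair is close to orthogonal.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 1000, 2000                      # embedding dimension, number of random directions
    vecs = rng.standard_normal((n, d))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # unit length

    cos = vecs @ vecs.T                    # all pairwise cosine similarities
    np.fill_diagonal(cos, 0.0)
    print(np.abs(cos).max())               # typically < 0.2: everything is nearly orthogonal

Run the same experiment with d = 3 and the largest cosine comes out close to 1, which is why intuition from low-dimensional pictures breaks down here.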
replies(12): >>43964509 #>>43964649 #>>43964659 #>>43964705 #>>43964934 #>>43965081 #>>43965183 #>>43965258 #>>43965725 #>>43965971 #>>43966531 #>>43967165 #
PaulHoule ◴[] No.43964659[source]
Note you don't see arXiv papers where somebody feeds 1000 male-gendered words into a word embedding and gets back 950 correct female-gendered counterparts. Statistically it does better than chance, but word embeddings don't do very well.

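The experiment being described would look roughly like the sketch below, using e.g. the GloVe vectors mentioned just after it (the word-pair list is a hand-made illustration and the file path assumes a local download; this is a sketch of the test, not code from the comment):

    import numpy as np

    def load_glove(path):
        vecs = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *nums = line.rstrip().split(" ")
                vecs[word] = np.array(nums, dtype=np.float32)
        return vecs

    glove = load_glove("glove.6B.300d.txt")          # local copy of the GloVe vectors
    pairs = [("king", "queen"), ("actor", "actress"), ("uncle", "aunt"),
             ("brother", "sister"), ("nephew", "niece")]   # extend to ~1000 pairs

    words = list(glove)
    mat = np.stack([glove[w] for w in words])
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)
    offset = glove["woman"] - glove["man"]

    hits = 0
    for masc, fem in pairs:
        query = glove[masc] + offset
        query /= np.linalg.norm(query)
        sims = mat @ query
        top = [words[i] for i in sims.argsort()[-5:][::-1] if words[i] != masc]
        hits += top[0] == fem                        # count exact nearest-neighbour matches
    print(hits, "/", len(pairs))

The hit rate over hundreds of pairs, rather than a cherry-picked handful, is the number that would matter.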
In

https://nlp.stanford.edu/projects/glove/

there are a number of graphs where they have about N=20 points that seem to fall in "the right place", but there are a lot of dimensions involved, and with 50 dimensions to play with you can always find a projection that makes the 20 points fall exactly where you want them to fall. If you try experiments with N>100 words you go endlessly in circles and produce the kind of inconclusively negative results that people don't publish.

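That "you can always find such a projection" claim is easy to verify numerically; here is a minimal sketch with made-up data (random 50-d points and random 2-D targets, nothing to do with the actual GloVe vectors):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 50))    # 20 arbitrary "word vectors" in 50 dimensions
    Y = rng.standard_normal((20, 2))     # 20 arbitrary target positions on a 2-D plot

    # 20 equations, 50 unknowns per output column: the system is underdetermined,
    # so a linear projection hitting every target exactly almost always exists.
    P, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(np.abs(X @ P - Y).max())       # ~1e-13, i.e. a perfect fit

With more points than dimensions the system becomes overdetermined and the trick stops working, which is roughly the N>100 regime mentioned above.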
BERT-like and other transformer embeddings far outperform word vectors because they can take into account the context of the word. For instance, from static vectors alone you can't really build a part-of-speech classifier that tells you "red" is an adjective, because it is also a noun; give it the context and you can.

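A hedged sketch of that contextual-vector idea, using an off-the-shelf BERT model through the Hugging Face transformers library (the model name and sentences are just examples):

    import torch
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def red_vector(sentence):
        # Contextual embedding of the token "red" in the given sentence.
        inputs = tok(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        idx = inputs["input_ids"][0].tolist().index(tok.convert_tokens_to_ids("red"))
        return hidden[idx]

    adj  = red_vector("she wore a red dress")        # "red" used as an adjective
    noun = red_vector("red is my favorite color")    # "red" used as a noun
    # Unlike a static word vector, these two vectors differ, so a per-token
    # classifier trained on them has something to distinguish the usages by.
    print(torch.nn.functional.cosine_similarity(adj, noun, dim=0))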
In the context of full-text search, bringing in synonyms is a mixed bag because a word might have 2 or 3 meanings, and the irrelevant synonyms are... irrelevant, and will bring in irrelevant documents. Modern embeddings that recognize context not only bring in synonyms but will also suppress usages of the word with different meanings, something the IR community has tried to figure out for about 50 years.

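As a small illustration of that retrieval point (assuming the sentence-transformers package; the model name and toy documents are my own, not from the comment):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")
    docs = [
        "the bank raised interest rates on savings accounts",   # financial sense of "bank"
        "we had a picnic on the bank of the river",              # geographic sense
    ]
    query = "where can I get a better savings rate"

    doc_emb = model.encode(docs, normalize_embeddings=True)
    q_emb = model.encode(query, normalize_embeddings=True)
    # The financial document should score noticeably higher, even though the
    # query shares almost no literal terms with it.
    print(util.cos_sim(q_emb, doc_emb))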
replies(4): >>43965006 #>>43965085 #>>43965683 #>>43965720 #
yorwba ◴[] No.43965720[source]
> there are a lot of dimensions involved and with 50 dimensions to play with you can always find a projection that makes the 20 points fall exactly where you want them fall.

While it would certainly have been possible to choose a projection where the two groups of words are linearly separable, that isn't even the case for https://nlp.stanford.edu/projects/glove/images/man_woman.jpg : "woman" is inside the "nephew"-"man"-"earl" triangle, so there is no way to draw a line neatly dividing the masculine from the feminine words. But I think the graph wasn't intended to show individual words classified by gender, but rather to demonstrate that in pairs of related words, the difference between the feminine and masculine word vectors points in a consistent direction.

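That "consistent direction" reading is easy to probe directly: take a handful of (masculine, feminine) pairs, compute the difference vectors, and look at how well they align (this reuses the load_glove helper and GloVe file from the earlier sketch; the pairs are illustrative):

    import numpy as np

    glove = load_glove("glove.6B.300d.txt")
    pairs = [("man", "woman"), ("king", "queen"), ("uncle", "aunt"),
             ("nephew", "niece"), ("brother", "sister")]

    diffs = np.stack([glove[f] - glove[m] for m, f in pairs])
    diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)
    # Off-diagonal cosines well above zero would mean the feminine-minus-masculine
    # offsets really do point in roughly the same direction.
    print(np.round(diffs @ diffs.T, 2))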
Of course that is hardly useful for anything (if you could compare unrelated words, at least you would've been able to use it to sort lists...) but I don't think the GloVe authors can be accused of having created unrealistic graphs when their graph actually very realistically shows a situation where the kind of simple linear classifier that people would've wanted doesn't exist.

replies(1): >>43967021 #
avidiax ◴[] No.43967021[source]
> the two groups of words are linearly separable

This is missing the point. What we have is two dimensions* out of hundreds, but the two dimensions chosen show that the vector between a masculine word and its feminine counterpart is very nearly constant, at least across these words and ignoring the other dimensions.

What you're describing, a line/plane/hyperplane that separates a dimension of gender into male and female, might also exist. But since gender-neutral terms also exist, we would expect those terms to have a 50/50 chance of falling on either side of that plane, and ideally to land close to it.

* Possibly a pseudo dimension that's a composite of multiple dimensions; IDK, I didn't read the paper.

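A toy version of that hyperplane test could be fit with scikit-learn on the same GloVe vectors loaded above (the word lists are illustrative, and this is only a sketch of the proposed check, not something from the thread):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    masc = ["man", "king", "uncle", "nephew", "brother", "father", "sir"]
    fem  = ["woman", "queen", "aunt", "niece", "sister", "mother", "madam"]
    neutral = ["person", "teacher", "doctor", "cousin", "parent"]

    X = np.stack([glove[w] for w in masc + fem])
    y = [0] * len(masc) + [1] * len(fem)
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Signed distance from the separating hyperplane; if the picture above is
    # right, the gender-neutral words should sit near 0 and split roughly evenly.
    scores = clf.decision_function(np.stack([glove[w] for w in neutral]))
    print(dict(zip(neutral, np.round(scores, 2))))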
replies(1): >>43967090 #
tomrod ◴[] No.43967090[source]
Just needs to be a separating manifold if we use the kernel trick ;)