
86 points | alexop | 1 comment
dvt · No.43328174
Great post, but what struck me (again, like every time I look at cosine similarity) is how unreasonably well it works. It's just one of those things that's so "weird" about our world: why would cosine similarity work in n-dimensional semantic spaces? It's so stupid simple, it intuitively makes sense, and it works really well. Crazy cool.

I'm reminded of that old Einstein quote: "The most incomprehensible thing about the universe is that it is comprehensible."

replies(2): >>43329590, >>43331752
hansvm · No.43329590
That cosine distance works at all as a concept isn't terribly shocking, especially given our habit of norming everything. On unit vectors, cosine similarity is monotonic in Euclidean distance (for unit u and v, ||u − v||² = 2 − 2·cos(u, v)), and we're using this stuff in "select the K most relevant vectors" sorts of queries, so cosine similarity behaves identically to Euclidean distance.

Tack on the fact that every finite set of vectors, with your favorite metric, can be embedded in Euclidean space with at most ~41% relative error (errors that high require somewhat special circumstances, so you'd expect most real-world data to come in lower; the bound also isn't hit by every pair of points, and many pairs will have much lower error), and you can use normed cosine similarity somewhat reasonably on every finite set of stuff you care about, so long as you choose an appropriate embedding. All sets of things you care about in ML are finite, and the sub-metric induced by whichever infinite set you're considering works just fine for everything discussed here, so cosine similarity is reasonable for all practical ML purposes.
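
A minimal numpy sketch of that identity and its ranking consequence (the vector counts and dimensions here are arbitrary demo choices, not anything from the thread):

    import numpy as np

    rng = np.random.default_rng(0)

    # Random vectors, projected onto the unit sphere (the "norming" above).
    X = rng.normal(size=(1000, 128))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    q = rng.normal(size=128)
    q /= np.linalg.norm(q)

    cos_sim = X @ q                       # cosine similarity, since everything is unit-norm
    eucl = np.linalg.norm(X - q, axis=1)  # Euclidean distance

    # The identity ||u - v||^2 = 2 - 2*cos(u, v) holds up to float error...
    assert np.allclose(eucl**2, 2 - 2 * cos_sim)

    # ...so the top-K by cosine similarity is the top-K by Euclidean distance.
    k = 10
    assert np.array_equal(np.argsort(-cos_sim)[:k], np.argsort(eucl)[:k])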

It's much more interesting that almost any set of ML-adjacent vectors can be compared somewhat reasonably via cosine distance _even without_ explicitly constructing an optimal embedding. It's not at all intuitive to me that an autoencoder's interior layer should behave well with respect to cosine similarity, with no knots or other distortions warping the usefulness of the associated metric.

replies(2): >>43330042, >>43336152
dvt · No.43330042
> behaves identically to Euclidean distance

Tbh, I would argue that's also pretty surprising, as Euclidean distance is notoriously unintuitive[1] (and noisy) in higher dimensions; a quick sketch of that effect follows the reference below. (I guess norming does help, so that's likely a good point.)

[1] https://bib.dbvis.de/uploadedFiles/155.pdf
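
A quick numpy sketch of the distance-concentration effect that [1] describes (dimensions and sample count are arbitrary demo choices): as d grows, the gap between the nearest and farthest neighbor collapses relative to the nearest distance, so "closest" carries less information.

    import numpy as np

    rng = np.random.default_rng(0)

    for d in (2, 10, 100, 1000):
        X = rng.uniform(size=(1000, d))  # random points in the unit cube
        q = rng.uniform(size=d)          # a random query point
        dist = np.linalg.norm(X - q, axis=1)
        # Relative contrast (max - min) / min shrinks toward 0 as d grows,
        # which is why "nearest" becomes less meaningful in high dimensions.
        contrast = (dist.max() - dist.min()) / dist.min()
        print(f"d={d:4d}  relative contrast = {contrast:.3f}")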