The biggest performance impact came from moving to dot products.
Regrettably, the sheer size of the embeddings index made it impractical to achieve the desired results.
1. https://gist.github.com/schappim/d4a6f0b29c1ef4279543f6b6740...
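For anyone curious, the dot-product trick is roughly this (a simplified sketch, not the gist's exact code): L2-normalize every embedding once when building the index, and each comparison then collapses to a single dot product.

```ts
// Simplified sketch (not the gist's exact code): normalize once at index
// time, then cosine similarity per comparison is just a dot product.
function normalize(v: number[]): number[] {
  const mag = Math.hypot(...v); // sqrt of the sum of squares
  return v.map((x) => x / mag);
}

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// For unit-length vectors, the dot product equals the cosine similarity.
const index = [[1, 2, 3], [2, 3, 4]].map(normalize);
const query = normalize([3, 2, 1]);
const scores = index.map((vec) => dot(query, vec));
console.log(scores);
```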
This whole thing looks like unreviewed AI. Stylish but fundamentally broken. I haven't had a chance to dig into the meat of the article yet, but unfortunately this is distracting enough that I'm not sure I will.
Edit: I'm digging into the meat, and it's better! Fortunately, it appears accurate. Unfortunately, it's rather repetitive. There are two paragraphs discussing the meaning of -1, 0, and +1 interleaved with multiple paragraphs explaining how cosine similarity allows vectors to be compared regardless of magnitude. The motivation is spread throughout the whole thing and gets repetitive, and the real-world examples seem similar, though formatted just differently enough that it's hard to tell at a glance.
To try to offer suggestions instead of just complaining... here are my recommended edits:

* Move the simple English explanation to the top, right after the intro, then remove everything but the diagrams until you reach the example.
* Completely remove the explanation of vectors unless you're going to include an explanation of dot products.
* I really like the functional approach, but I feel like you could combine it with the `Math.hypot` example (leave the full formula as a comment, the rest is identical) and with the full example (which is missing the `Math.hypot` optimization); there's a sketch of what I mean below.
* I feel like you could get away with just one real web programming example, though I don't know which one I'd choose.
* The last section about using OpenAI for embeddings and its disclaimer is already great!
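Roughly what I have in mind for that combined `Math.hypot` version (my own sketch, not code from the article):

```ts
// My own sketch of the combined functional version (not the article's actual code).
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

// Magnitude: Math.sqrt(a[0] ** 2 + a[1] ** 2 + ... + a[n - 1] ** 2)
const magnitude = (a: number[]): number => Math.hypot(...a);

const cosineSimilarity = (a: number[], b: number[]): number =>
  dot(a, b) / (magnitude(a) * magnitude(b));
```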
Similarly, if you plan to query those vectors in search, you should consider contiguous `TypedArray` storage and smaller scalars than the double-precision `number`.
I know very little about JS, but some of the amazing HackerNews community members have previously helped port SimSIMD to JavaScript (https://github.com/ashvardanian/SimSIMD), and I wrote a blog post covering some of those JS/TS-specifics, NumJS, and MathJS in 2023 (https://ashvardanian.com/posts/javascript-ai-vector-search/).
Hopefully, it should help unlock another 10-100x in performance.
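Even before reaching for SimSIMD, just switching the storage to a contiguous typed array looks something like this (a plain-TS sketch, not SimSIMD's accelerated kernels):

```ts
// Sketch: contiguous Float32Array storage instead of number[] halves the
// memory of float64 embeddings and keeps the data cache-friendly to scan.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / Math.sqrt(magA * magB);
}

const a = Float32Array.from([0.1, 0.2, 0.3]);
const b = Float32Array.from([0.2, 0.1, 0.4]);
console.log(cosineSimilarity(a, b));
```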
It looks super great now. What you have here leaves an entirely different impression, and a stylish one!
Two last suggestions:
* Now I'm thinking the "Why Cosine Similarity Matters for Modern Web Development" section belongs at the top, right after your intro.
* The angle indicator is still a bit wonky in the diagram. I might even take the direction-only mode out entirely since, as you point out, cosine similarity is invariant to changes in magnitude.
While doing that, the next logical step is precalculating the squared magnitude of each item; in my small test case of <1000 items that sped the calculation up almost 300 times. The gains are not asymptotic, but shaving that constant off every pair considered is not insignificant, especially on small data sets (of course, with large sets a table won't cut it due to memory limitations, and more sophisticated techniques are required).
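Roughly what I mean, as a toy sketch (the `Item` shape here is made up):

```ts
// Toy sketch: with all-pairs comparisons, compute each item's squared
// magnitude once (n values) instead of recomputing it for every pair (n^2).
type Item = { id: string; vec: number[] };

const squaredMagnitude = (v: number[]): number =>
  v.reduce((sum, x) => sum + x * x, 0);

const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

function allPairsSimilarity(items: Item[]) {
  const magSq = items.map((it) => squaredMagnitude(it.vec)); // the lookup table
  const results: { a: string; b: string; score: number }[] = [];
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      const score =
        dot(items[i].vec, items[j].vec) / Math.sqrt(magSq[i] * magSq[j]);
      results.push({ a: items[i].id, b: items[j].id, score });
    }
  }
  return results;
}
```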
I went to the typescript tag and tried to read a few other articles and got 404 errors. Just wanted to let ya know.
Nice blog and good work!
I'm reminded of that old Einstein quote: "The most incomprehensible thing about the universe is that it is comprehensible."
It's much more interesting that almost any set of ML-adjacent vectors can be somewhat reasonably compared via cosine distance _even without_ explicitly constructing an optimal embedding. It's not at all intuitive to me that an autoencoder's interior layer should behave well with respect to cosine similarity and not have any knots or anything warping that (associated) metric's usefulness.
(a @ b.T)/(np.linalg.norm(a)*np.linalg.norm(b))
If you really wanted to be pretentious you could invent a Dimension Insensitive Euclidean Metric and call it DIEM to make it sound like you are putting some Latin into your papers.
I wonder, though: isn't classic TF-IDF scoring also a type of basic vector comparison? What is the actual difference between a "vector database" and something like Elasticsearch?
Tbh, I would argue that's also pretty surprising, as Euclidean distance is notoriously unintuitive[1] (and noisy) in higher dimensions. (I guess norming does help, so that's likely a good point.)
And I think today it is preferred to call it "similarity" instead of "distance" because it is not a true distance function (15 years ago we called it "cosine distance").
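Concretely, the usual "cosine distance" (one minus the similarity) breaks the triangle inequality, which is why it isn't a metric; a quick counterexample in the plane:

```latex
% "Cosine distance" d(x, y) = 1 - \cos\theta_{xy} fails the triangle inequality:
a = (1, 0), \qquad b = \tfrac{1}{\sqrt{2}}(1, 1), \qquad c = (0, 1) \\
d(a, b) = d(b, c) = 1 - \tfrac{1}{\sqrt{2}} \approx 0.293, \qquad d(a, c) = 1 - 0 = 1 \\
d(a, c) = 1 > 0.586 \approx d(a, b) + d(b, c)
```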
> in ML people renamed 'the angle between vectors' into something as fancy-feeling as 'cosine similarity'.
Since cosine similarity is a name derived from linear algebra, it's hardly an ML rebranding.
- There's a well-known result that says you can embed any finite, undirected, unweighted graph (with shortest-path as the metric of choice) in Euclidean space with at most sqrt(2) multiplicative error (~41% relative error). The proof of that, as I recall, basically shows that the worst case is any graph containing a 4-vertex cycle, from which the result immediately follows (a worked 4-cycle computation is sketched at the end of this comment).
- Construct the complete graph where each vertex is one of your (finitely many) vectors.
- Let G(0) denote that graph.
- Greedily construct a sequence of graphs G(1), G(2), ... by adding one extra vertex that splits some edge. The edge you split is the one that minimizes the skew between your metric's distances on the original set of vectors and the new distances defined using shortest paths. When more than one equivalent split exists, choose one arbitrarily.
- Note that G(N) consists of better and better rational approximations of the desired distances, approaching 0 skew as N approaches infinity.
- Using that well-known result, we can embed G(N) with at most ~41% relative error, and as N grows the extra relative error approaches 0, so multiplying those together you get ~41%.
- Delete the extra vertices from G(N) to obtain the embedding for your original vectors.
- This isn't strictly necessary, but you can (I'm 90% sure -- I'd have to work through these details very carefully to be positive) obtain sqrt(2) exactly (instead of just less than sqrt(2)+epsilon for every positive epsilon) in a few ways. The one I'm most familiar with falls back on logical compactness, basically taking all the math that exists for the hyperreals and using it to create a nonstandard space holding all of these graphs. The limit approaching 0 can be phrased as a first-order statement whose consequences allow you to conclude that there exists an embedding which actually attains the limit. Topological arguments of some sort would probably be a more common way to go about it.
- That last point (attaining sqrt(2)) also follows from Bolzano-Weierstrass. Just unroll each embedding into a single point in an n^2-dimensional space if you have n points; that sequence of points has a convergent subsequence (as a technical detail, center the embeddings on the origin to squeeze them all into a bounded space). The skew of an embedding is continuous, so the skew of the limit equals the limit of the skews, and there is some embedding attaining that bound.
I'm still not exactly sure how the result I'm basing this all on is proven.
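To at least make the ~41% figure from the first bullet concrete, here's the 4-cycle computation (just the example that attains sqrt(2), not a proof of the general result):

```latex
% C_4 with unit edges and the shortest-path metric:
d_G(v_i, v_{i+1}) = 1, \qquad d_G(v_1, v_3) = d_G(v_2, v_4) = 2 \\
% Embed the four vertices as the corners of a unit square in \mathbb{R}^2:
d_E(v_i, v_{i+1}) = 1, \qquad d_E(v_1, v_3) = d_E(v_2, v_4) = \sqrt{2} \\
% The edges are preserved exactly; each diagonal is off by a factor of
\frac{d_G(v_1, v_3)}{d_E(v_1, v_3)} = \frac{2}{\sqrt{2}} = \sqrt{2} \approx 1.41
```

As far as I can tell, the short-diagonals inequality in Euclidean space (the two squared diagonals sum to at most the four squared edge lengths) is what rules out doing better than sqrt(2) on the 4-cycle.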