(joseprupi.github.io)

66 points melenaboija | 2 comments | 14 Oct 24 21:43 UTC

1. encypruon [15 Oct 24 21:53 UTC] No.41853508

> dot += A[i] * B[i];

Isn't it pretty bad for accuracy to accumulate large numbers of floats in this fashion? o.O In the example it's 640,000 numbers. log2(640,000) is ~19.3, but the significand of a float has only 23 bits plus an implicit one, so in the worst case the accumulated rounding error leaves only about 5 of those 24 bits trustworthy.
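
To make the concern concrete, here is a minimal sketch contrasting a float accumulator with a double one. The function names `dot_float_acc` and `dot_double_acc` are made up for illustration and are not taken from the article; the first loop mirrors the quoted `dot += A[i] * B[i];` line, the second only widens the accumulator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive dot product (hypothetical name): the running total lives in a
// 32-bit float, so every addition rounds to a 24-bit significand.
float dot_float_acc(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float dot = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
    }
    return dot;
}

// Same loop (hypothetical name), but each product and the running total
// are widened to double, which has a 53-bit significand.
double dot_double_acc(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return dot;
}
```

A contrived input shows the effect quickly: with `a = {16777216.0f, 1.0f, 1.0f, 1.0f, 1.0f}` and `b` all ones, the float accumulator stays stuck at 16777216 (2^24 + 1 is not representable in a float and rounds back down), while the double accumulator returns 16777220. This assumes float arithmetic is actually evaluated in single precision, as on x86-64 with SSE.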

replies(1): >>41859152 #

2. hansvm [16 Oct 24 14:03 UTC] No.41859152

>>41853508 (TP) #

Python's floats are usually doubles by default, so it's mostly fine.

That said, yeah, that implementation isn't ideal. At a minimum, Kahan summation is usually free on large vectors, since you're bottlenecked on memory bandwidth anyway. The catch is that you need to disable floating-point re-ordering to keep the compiler from screwing it up, and therefore you have to order the operations correctly yourself to make it efficient (see some other top-level comments about data dependencies as an example).
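
For reference, a minimal scalar sketch of a Kahan-compensated dot product. The name `dot_kahan` is hypothetical and this is not the article's code. It compensates the additions only; the rounding of each individual product is left as is.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Kahan (compensated) summation applied to a dot product (hypothetical name).
// Must be built WITHOUT -ffast-math / -fassociative-math: under those flags
// the compiler may simplify (t - sum) - y to zero and drop the compensation.
float dot_kahan(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    float c = 0.0f;  // running compensation for low-order bits lost so far
    for (std::size_t i = 0; i < a.size(); ++i) {
        float y = a[i] * b[i] - c;  // apply the correction from the last step
        float t = sum + y;          // low-order bits of y may be lost here
        c = (t - sum) - y;          // recover what was lost (with flipped sign)
        sum = t;
    }
    return sum;
}
```

With `a = {16777216.0f, 1.0f, 1.0f, 1.0f, 1.0f}` and `b` all ones, this returns 16777220, where a plain float accumulator would stay at 16777216. Note that the single `(sum, c)` pair forms one serial dependency chain; a version meant to be fast would presumably keep several independent pairs (one per SIMD lane or unrolled iteration) and combine them at the end.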

A not so fast implementation of cosine similarity in C++ and SIMD