
302 points | sebg
cl3misch ◴[] No.45051286[source]
In the entropy implementation:

    return -np.sum(p * np.log(p, where=p > 0))
Using `where` in ufuncs like `log` (without an `out` argument) leaves the output uninitialized (undefined) at the locations where the condition is not met. Summing over that array can then silently produce garbage.

Better would be e.g.

    return -np.sum((p * np.log(p))[p > 0])
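A minimal sketch of the difference, with made-up values (my example, not code from the post):

    import numpy as np

    p = np.array([0.5, 0.5, 0.0])

    # with where=, the entries of the log output at p == 0 are whatever
    # happened to be in the freshly allocated memory, so the sum is undefined
    undefined = -np.sum(p * np.log(p, where=p > 0))

    # masking after the fact keeps only the well-defined terms
    # (log(0) still warns, but those terms never reach the sum)
    correct = -np.sum((p * np.log(p))[p > 0])  # ln 2 ≈ 0.693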
Also, the cross-entropy code doesn't match the equation. And, as explained in the comment below the post, Ax + b is not a linear operation but an affine one (because of the +b; a linear map must satisfy f(0) = 0).
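
For reference, a sketch of the textbook definition H(p, q) = -sum_i p_i log q_i in the same masked style (my code, with p as the true distribution and q as the predicted one; not the post's version):

    import numpy as np

    def cross_entropy(p, q):
        # H(p, q) = -sum_i p_i * log(q_i); terms with p_i == 0 contribute nothing
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        mask = p > 0
        return -np.sum(p[mask] * np.log(q[mask]))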

Overall it seems like an imprecise post to me. Not bad, but not rigorous enough to serve as a reference.

replies(1): >>45051423 #
jpcompartir ◴[] No.45051423[source]
I would echo some caution about using it as a reference, as in another blog post the same writer states:

"Backpropagation, often referred to as “backward propagation of errors,” is the cornerstone of training deep neural networks. It is a supervised learning algorithm that optimizes the weights and biases of a neural network to minimize the error between predicted and actual outputs.."

https://chizkidd.github.io/2025/05/30/backpropagation/

backpropagation is a supervised machine learning algorithm, pardon?

replies(1): >>45051573 #
cl3misch ◴[] No.45051573[source]
I actually see this a lot: confusing backpropagation with gradient descent (or any other optimizer). Backprop is just a way to compute the gradients of the cost function with respect to the weights, not an algorithm for minimizing the cost function wrt. the weights.
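
A toy sketch to make the split concrete (one linear layer, squared error; all names and numbers are mine, not from either post):

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
    W, b = rng.normal(size=(3, 1)), np.zeros(1)

    for _ in range(100):
        y_hat = x @ W + b                  # forward pass
        # backpropagation: chain rule gives dLoss/dW and dLoss/db
        d_yhat = 2 * (y_hat - y) / len(y)  # dLoss/dy_hat for mean squared error
        dW = x.T @ d_yhat
        db = d_yhat.sum(axis=0)
        # gradient descent: a separate decision about how to *use* those gradients
        W -= 0.1 * dW
        b -= 0.1 * db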

I guess the fancy name "backpropagation" for the (mathematically) simple principle of computing gradients via the chain rule comes from the early days of AI, when computers were much less powerful and this seemed less obvious?

replies(2): >>45052206 #>>45052222 #
1. imtringued ◴[] No.45052222[source]
The German Wikipedia article makes the same mistake and it is quite infuriating.