
302 points | sebg
cl3misch ◴[] No.45051286[source]
In the entropy implementation:

    return -np.sum(p * np.log(p, where=p > 0))
Using `where` in ufuncs like `log` (without an `out` argument) leaves the output uninitialized (undefined) at the locations where the condition is not met. Summing over that array can then silently produce garbage.

Better would be e.g.

    return -np.sum((p * np.log(p))[p > 0])
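A minimal sketch of the difference, with made-up values (my example, not code from the post):

    import numpy as np

    p = np.array([0.5, 0.5, 0.0])

    # with where=, the entries of the log output at p == 0 are whatever
    # happened to be in the freshly allocated memory, so the sum is undefined
    undefined = -np.sum(p * np.log(p, where=p > 0))

    # masking after the fact keeps only the well-defined terms
    # (log(0) still warns, but those terms never reach the sum)
    correct = -np.sum((p * np.log(p))[p > 0])  # ln 2 ≈ 0.693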
Also, the cross-entropy code doesn't match the equation. And, as explained in the comment below the post, Ax + b is not a linear operation but an affine one (because of the +b; a linear map must satisfy f(0) = 0).
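
For reference, a sketch of the textbook definition H(p, q) = -sum_i p_i log q_i in the same masked style (my code, with p as the true distribution and q as the predicted one; not the post's version):

    import numpy as np

    def cross_entropy(p, q):
        # H(p, q) = -sum_i p_i * log(q_i); terms with p_i == 0 contribute nothing
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        mask = p > 0
        return -np.sum(p[mask] * np.log(q[mask]))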

Overall it seems like an imprecise post to me. Not bad, but not rigorous enough to serve as a reference.

replies(1): >>45051423 #
jpcompartir ◴[] No.45051423[source]
I would echo some caution about using it as a reference, as in another blog post the same writer states:

"Backpropagation, often referred to as “backward propagation of errors,” is the cornerstone of training deep neural networks. It is a supervised learning algorithm that optimizes the weights and biases of a neural network to minimize the error between predicted and actual outputs.."

https://chizkidd.github.io/2025/05/30/backpropagation/

backpropagation is a supervised machine learning algorithm, pardon?

replies(1): >>45051573 #
cl3misch ◴[] No.45051573[source]
I actually see this a lot: confusing backpropagation with gradient descent (or any other optimizer). Backprop is just a way to compute the gradients of the cost function with respect to the weights, not an algorithm for minimizing the cost function wrt. the weights.
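
A toy sketch to make the split concrete (one linear layer, squared error; all names and numbers are mine, not from either post):

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
    W, b = rng.normal(size=(3, 1)), np.zeros(1)

    for _ in range(100):
        y_hat = x @ W + b                  # forward pass
        # backpropagation: chain rule gives dLoss/dW and dLoss/db
        d_yhat = 2 * (y_hat - y) / len(y)  # dLoss/dy_hat for mean squared error
        dW = x.T @ d_yhat
        db = d_yhat.sum(axis=0)
        # gradient descent: a separate decision about how to *use* those gradients
        W -= 0.1 * dW
        b -= 0.1 * db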

I guess the fancy name "backpropagation" for the (mathematically) simple principle of computing gradients via the chain rule comes from the early days of AI, when computers were much less powerful and this seemed less obvious?

replies(2): >>45052206 #>>45052222 #
1. imtringued ◴[] No.45052222[source]
The German Wikipedia article makes the same mistake and it is quite infuriating.