346 points by swatson741 | 2 comments

WithinReason:
Karpathy suggests the following clipped error function:

  def clipped_error(x):
    return tf.select(tf.abs(x) < 1.0,
                     0.5 * tf.square(x),
                     tf.abs(x) - 0.5)  # condition, true, false
Following the same principles that he outlines in this post, the "- 0.5" part is unnecessary: the gradient of the constant 0.5 is zero, so subtracting it doesn't change the backpropagated gradient. In addition, a nicer formula that achieves the same goal is √(x²+1).
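
A quick way to sanity-check both claims (a minimal sketch using TF 2's eager GradientTape, with tf.where standing in for the older tf.select; the helper names are made up for illustration):

  import tensorflow as tf

  def clipped_error(x):
      # Huber loss as quoted: quadratic near zero, linear in the tails.
      return tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5)

  def clipped_error_no_offset(x):
      # Same loss with the "- 0.5" dropped from the linear branch.
      return tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x))

  def pseudo_huber(x):
      # The smooth alternative suggested above.
      return tf.sqrt(tf.square(x) + 1.0)

  x = tf.constant([-3.0, -0.5, 0.5, 2.0])
  with tf.GradientTape(persistent=True) as tape:
      tape.watch(x)
      y1, y2, y3 = clipped_error(x), clipped_error_no_offset(x), pseudo_huber(x)

  # The constant offset never reaches the backward pass: both clipped
  # versions give gradient x inside (-1, 1) and sign(x) outside.
  print(tape.gradient(y1, x).numpy())  # -> [-1., -0.5, 0.5, 1.]
  print(tape.gradient(y2, x).numpy())  # -> [-1., -0.5, 0.5, 1.]
  # The pseudo-Huber gradient x / sqrt(x^2 + 1) approaches the same limits smoothly.
  print(tape.gradient(y3, x).numpy())  # -> approx. [-0.95, -0.45, 0.45, 0.89]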

macleginn:
If we don't subtract 0.5 from the second branch, there will be a discontinuity at x = 1, so the derivative will not be well-defined there. The value of the loss will also jump at that point, which, for one thing, makes the errors harder to inspect.
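
For concreteness (a small illustration, not from the original comment), plug in the boundary point x = 1:

  # At x = 1 the quadratic branch gives 0.5, but the unshifted linear
  # branch gives 1.0, so the plotted loss jumps by 0.5.
  quadratic        = 0.5 * 1.0**2      # 0.5
  linear_unshifted = abs(1.0)          # 1.0 -> jump of 0.5
  linear_shifted   = abs(1.0) - 0.5    # 0.5 -> the curve stays continuous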

WithinReason:
No, that's not how backprop works. There will be no discontinuity in a backpropagated gradient.
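
The gradient that backprop actually computes for the unshifted version does stay continuous across the boundary; a minimal check (a sketch with TF 2's GradientTape, evaluation points chosen for illustration):

  import tensorflow as tf

  x = tf.constant([0.999, 1.001])  # just inside and just outside the cutoff
  with tf.GradientTape() as tape:
      tape.watch(x)
      y = tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x))  # no "- 0.5"

  # The per-element gradient is x below the cutoff and sign(x) above it,
  # so it passes smoothly through 1 even though the loss value itself jumps.
  print(tape.gradient(y, x).numpy())  # -> approx. [0.999, 1.0]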

macleginn:
I did not say there would be a discontinuity in the gradient; I said that the modified loss function will not have a mathematically well-defined derivative at x = 1, because of the jump discontinuity in the function itself.
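
To make the distinction concrete (a small numerical sketch, not from the thread): the left-hand difference quotient at x = 1 diverges for the unshifted loss, while for the original Huber form it converges to 1, matching the slope on the right:

  def loss_unshifted(x):
      return 0.5 * x**2 if abs(x) < 1.0 else abs(x)

  def loss_huber(x):
      return 0.5 * x**2 if abs(x) < 1.0 else abs(x) - 0.5

  for h in (1e-1, 1e-3, 1e-5):
      left_unshifted = (loss_unshifted(1.0) - loss_unshifted(1.0 - h)) / h
      left_huber     = (loss_huber(1.0) - loss_huber(1.0 - h)) / h
      print(h, left_unshifted, left_huber)
  # The first quotient grows like 0.5 / h (no derivative exists at x = 1);
  # the second tends to 1.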