346 points by swatson741 | 2 comments

WithinReason:
Karpathy suggests the following clipped error function:

  def clipped_error(x):
    return tf.select(tf.abs(x) < 1.0,
                     0.5 * tf.square(x),
                     tf.abs(x) - 0.5)  # condition, true, false
Following the same principles that he outlines in this post, the "- 0.5" part is unnecessary: the gradient of the constant 0.5 is zero, so subtracting it doesn't change the backpropagated gradient. In addition, a nicer formula that achieves the same goal is √(x²+1).
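
A quick way to sanity-check both claims (a minimal sketch using TF 2's eager GradientTape, with tf.where standing in for the older tf.select; the helper names are made up for illustration):

  import tensorflow as tf

  def clipped_error(x):
      # Huber loss as quoted: quadratic near zero, linear in the tails.
      return tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5)

  def clipped_error_no_offset(x):
      # Same loss with the "- 0.5" dropped from the linear branch.
      return tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x))

  def pseudo_huber(x):
      # The smooth alternative suggested above.
      return tf.sqrt(tf.square(x) + 1.0)

  x = tf.constant([-3.0, -0.5, 0.5, 2.0])
  with tf.GradientTape(persistent=True) as tape:
      tape.watch(x)
      y1, y2, y3 = clipped_error(x), clipped_error_no_offset(x), pseudo_huber(x)

  # The constant offset never reaches the backward pass: both clipped
  # versions give gradient x inside (-1, 1) and sign(x) outside.
  print(tape.gradient(y1, x).numpy())  # -> [-1., -0.5, 0.5, 1.]
  print(tape.gradient(y2, x).numpy())  # -> [-1., -0.5, 0.5, 1.]
  # The pseudo-Huber gradient x / sqrt(x^2 + 1) approaches the same limits smoothly.
  print(tape.gradient(y3, x).numpy())  # -> approx. [-0.95, -0.45, 0.45, 0.89]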

macleginn:
If we don't subtract 0.5 from the second branch, there will be a discontinuity at x = 1, so the derivative will not be well-defined there. The value of the loss will also jump at that point, which, for one thing, makes the errors harder to inspect.
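
For concreteness (a small illustration, not from the original comment), plug in the boundary point x = 1:

  # At x = 1 the quadratic branch gives 0.5, but the unshifted linear
  # branch gives 1.0, so the plotted loss jumps by 0.5.
  quadratic        = 0.5 * 1.0**2      # 0.5
  linear_unshifted = abs(1.0)          # 1.0 -> jump of 0.5
  linear_shifted   = abs(1.0) - 0.5    # 0.5 -> the curve stays continuous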

WithinReason:
No, that's not how backprop works. There will be no discontinuity in a backpropagated gradient.
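
The gradient that backprop actually computes for the unshifted version does stay continuous across the boundary; a minimal check (a sketch with TF 2's GradientTape, evaluation points chosen for illustration):

  import tensorflow as tf

  x = tf.constant([0.999, 1.001])  # just inside and just outside the cutoff
  with tf.GradientTape() as tape:
      tape.watch(x)
      y = tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x))  # no "- 0.5"

  # The per-element gradient is x below the cutoff and sign(x) above it,
  # so it passes smoothly through 1 even though the loss value itself jumps.
  print(tape.gradient(y, x).numpy())  # -> approx. [0.999, 1.0]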

macleginn:
I did not say there would be a discontinuity in the gradient; I said that the modified loss function will not have a mathematically well-defined derivative at x = 1, because of the jump discontinuity in the function itself.
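
To make the distinction concrete (a small numerical sketch, not from the thread): the left-hand difference quotient at x = 1 diverges for the unshifted loss, while for the original Huber form it converges to 1, matching the slope on the right:

  def loss_unshifted(x):
      return 0.5 * x**2 if abs(x) < 1.0 else abs(x)

  def loss_huber(x):
      return 0.5 * x**2 if abs(x) < 1.0 else abs(x) - 0.5

  for h in (1e-1, 1e-3, 1e-5):
      left_unshifted = (loss_unshifted(1.0) - loss_unshifted(1.0 - h)) / h
      left_huber     = (loss_huber(1.0) - loss_huber(1.0 - h)) / h
      print(h, left_unshifted, left_huber)
  # The first quotient grows like 0.5 / h (no derivative exists at x = 1);
  # the second tends to 1.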