
346 points | swatson741 | 1 comment
WithinReason No.45789232
Karpathy suggests the following error function:

  def clipped_error(x):
    return tf.select(tf.abs(x) < 1.0,
                     0.5 * tf.square(x),
                     tf.abs(x) - 0.5)  # condition, true, false
Following the same principles that he outlines in the post, the "- 0.5" part is unnecessary for the gradient: the derivative of a constant is 0, so subtracting 0.5 doesn't change the backpropagated gradient. In addition, a nicer formula that achieves the same goal is √(x² + 1).
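For reference, a minimal sketch of both versions in current TensorFlow (assuming TF 2.x, where the old tf.select has been renamed tf.where; the function names are illustrative, not from Karpathy's post):

  import tensorflow as tf

  def clipped_error(x):
      # Huber-style loss: quadratic for |x| < 1, linear outside.
      # tf.where(condition, true_branch, false_branch) replaces the old tf.select.
      return tf.where(tf.abs(x) < 1.0,
                      0.5 * tf.square(x),
                      tf.abs(x) - 0.5)

  def smooth_error(x):
      # The smooth alternative mentioned above: sqrt(x^2 + 1).
      # Its gradient, x / sqrt(x^2 + 1), also saturates at +/-1 for large |x|,
      # and the function is differentiable everywhere.
      return tf.sqrt(tf.square(x) + 1.0)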
replies(3): >>45789324 #>>45790005 #>>45791588 #
1. kingstnap No.45790005
You do that to keep the loss continuous, so it looks smooth when plotted. You could in theory add some crazy stair-step that adds a hundred to the middle piece; it would make your loss curves spike and even increase toward convergence, but those spikes would just be visual artifacts of doing weird discontinuous nonsense with your loss.
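A quick numeric check of that continuity point (plain Python, illustrative only): with the offset, the two pieces agree at |x| = 1; without it, the loss jumps by 0.5 there.

  def piecewise_loss(x, keep_offset=True):
      # 0.5 * x**2 for |x| < 1, |x| - 0.5 otherwise.
      # At |x| = 1 both pieces equal 0.5, so the offset keeps the loss continuous.
      if abs(x) < 1.0:
          return 0.5 * x * x
      return abs(x) - 0.5 if keep_offset else abs(x)

  print(piecewise_loss(0.999), piecewise_loss(1.001))         # ~0.499, ~0.501  -> continuous
  print(piecewise_loss(0.999), piecewise_loss(1.001, False))  # ~0.499 vs ~1.001 -> jump of 0.5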