
251 points by slyall | 3 comments
2sk21 ◴[] No.42058282[source]
I'm surprised that the article doesn't mention that one of the key factors that enabled deep learning was the use of ReLU as the activation function in the early 2010s. ReLU behaves a lot better than the logistic sigmoid that we used until then.
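A minimal numpy sketch of the difference as I understand it (my own illustration, not from the article): a deep stack of sigmoids multiplies together many gradients that are at most 0.25, so the signal vanishes with depth, while ReLU passes a gradient of exactly 1 through every active unit.

  import numpy as np

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def sigmoid_grad(x):
      s = sigmoid(x)
      return s * (1.0 - s)          # peaks at 0.25, vanishes for large |x|

  def relu_grad(x):
      return (x > 0).astype(float)  # exactly 1 wherever the unit is active

  x = np.linspace(-6, 6, 5)         # [-6, -3, 0, 3, 6]
  print(sigmoid_grad(x))            # roughly [0.0025 0.045 0.25 0.045 0.0025]
  print(relu_grad(x))               # [0. 0. 0. 1. 1.]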
replies(2): >>42059243 #>>42061534 #
1. sanxiyn ◴[] No.42059243[source]
Geoffrey Hinton (now a Nobel Prize winner!) himself gave a summary. I think it is the single best summary on this topic.

  Our labeled datasets were thousands of times too small.
  Our computers were millions of times too slow.
  We initialized the weights in a stupid way.
  We used the wrong type of non-linearity.
replies(1): >>42059572 #
2. imjonse ◴[] No.42059572[source]
That is a pithier formulation of the widely accepted summary of "more data + more compute + algo improvements".
replies(1): >>42059591 #
3. sanxiyn ◴[] No.42059591[source]
No, it isn't. It emphasizes the importance of Glorot initialization and ReLU, which the generic "more data + more compute" framing glosses over.
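For anyone unfamiliar: Glorot (Xavier) initialization scales the random weights by the layer's fan-in and fan-out so that activation and gradient variance stays roughly constant across layers. A minimal sketch of the uniform variant (my own illustration, not Hinton's code):

  import numpy as np

  def glorot_uniform(fan_in, fan_out, seed=0):
      # Glorot & Bengio (2010): target weight variance ~ 2 / (fan_in + fan_out)
      rng = np.random.default_rng(seed)
      limit = np.sqrt(6.0 / (fan_in + fan_out))
      return rng.uniform(-limit, limit, size=(fan_in, fan_out))

  W = glorot_uniform(784, 256)
  print(W.var(), 2.0 / (784 + 256))  # both come out around 0.0019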