
251 points slyall | 2 comments
2sk21 ◴[] No.42058282[source]
I'm surprised that the article doesn't mention that one of the key factors that enabled deep learning was the adoption of ReLU as the activation function in the early 2010s. ReLU behaves a lot better than the logistic sigmoid we had used until then.
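
For anyone who hasn't compared them side by side, here is a minimal NumPy sketch (the array values and names are mine, purely for illustration) of the property that matters for training deep nets: the sigmoid's gradient is at most 0.25 and vanishes for large inputs, while ReLU's gradient is exactly 1 for any positive input.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        return np.maximum(0.0, x)

    # Arbitrary pre-activation values a layer might see.
    x = np.array([-6.0, -2.0, 0.5, 2.0, 6.0])

    # Sigmoid's derivative peaks at 0.25 and shrinks toward 0 as |x| grows,
    # so stacking many sigmoid layers starves the early layers of gradient.
    sigmoid_grad = sigmoid(x) * (1.0 - sigmoid(x))

    # ReLU's derivative is exactly 1 for any positive input (0 otherwise),
    # so the gradient passes through unattenuated wherever the unit is active.
    relu_grad = (x > 0.0).astype(float)

    print(relu(x))        # [0.   0.   0.5  2.   6. ]
    print(sigmoid_grad)   # [~0.002  ~0.105  ~0.235  ~0.105  ~0.002]
    print(relu_grad)      # [0. 0. 1. 1. 1.]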
replies(2): >>42059243 #>>42061534 #
1. cma ◴[] No.42061534[source]
As compute has outpaced memory bandwidth, most recent models have moved away from plain ReLU. I think Llama 3.x uses SwiGLU. That's still probably closer to ReLU than to the logistic sigmoid, but it's back to being something smoother than ReLU.
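
For reference, a SwiGLU feed-forward block is roughly the following (a NumPy sketch using my own names and tiny illustrative sizes, not Llama's actual implementation):

    import numpy as np

    def silu(x):
        # Swish / SiLU: x * sigmoid(x) -- smooth everywhere, roughly ReLU-shaped.
        return x / (1.0 + np.exp(-x))

    def swiglu_ffn(x, w_gate, w_up, w_down):
        # SwiGLU feed-forward: down( silu(x @ W_gate) * (x @ W_up) )
        return (silu(x @ w_gate) * (x @ w_up)) @ w_down

    # Illustrative sizes only; real models use thousands of dimensions.
    d_model, d_hidden = 8, 32
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, d_model))                   # a batch of 4 token vectors
    w_gate = rng.normal(size=(d_model, d_hidden)) * 0.1
    w_up   = rng.normal(size=(d_model, d_hidden)) * 0.1
    w_down = rng.normal(size=(d_hidden, d_model)) * 0.1

    print(swiglu_ffn(x, w_gate, w_up, w_down).shape)    # (4, 8)

The silu(x @ w_gate) path is the smooth, roughly ReLU-shaped part; multiplying it elementwise by the x @ w_up path is the gating that gives the "GLU" half of the name.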
replies(1): >>42064106 #
2. 2sk21 ◴[] No.42064106[source]
Indeed, there have been so many new activation functions that I stopped following the literature after I retired. I am glad to see that people are trying out new things.