
251 points | slyall | 1 comment
2sk21 No.42058282
I'm surprised that the article doesn't mention that one of the key factors that enabled deep learning was the use of ReLU as the activation function in the early 2010s. ReLU behaves a lot better than the logistic sigmoid we used until then.
replies(2): >>42059243 #>>42061534 #
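For concreteness, a minimal NumPy sketch of the two activations being compared above (the function names are illustrative, not from the article). The practical difference is that the sigmoid saturates for large |x|, so its gradient vanishes there, while ReLU keeps a gradient of exactly 1 on its positive side:

    import numpy as np

    def logistic_sigmoid(x):
        # Saturates toward 0 or 1 for large |x|, so its gradient vanishes
        # there -- one reason deep sigmoid networks were hard to train.
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        # Identity for x > 0, zero otherwise; the gradient is 1 on the
        # positive side, which keeps gradients from shrinking layer to layer.
        return np.maximum(0.0, x)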
cma No.42061534
As compute has outpaced memory bandwidth, most recent stuff has moved away from ReLU. I think Llama 3.x uses SwiGLU. Still probably closer to ReLU than to the logistic sigmoid, but it's back to being something smoother than ReLU.
replies(1): >>42064106 #
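For reference, a rough sketch of a SwiGLU-style feed-forward block of the kind used in Llama-family models. The weight names (w_gate, w_up, w_down) are illustrative assumptions; the structure is the point: a SiLU-gated path multiplied elementwise with a linear path, which is where the "smoother than ReLU" behaviour comes from:

    import numpy as np

    def silu(x):
        # Swish/SiLU: x * sigmoid(x). Smooth everywhere, unlike ReLU's kink
        # at zero, and close to linear for large positive x.
        return x / (1.0 + np.exp(-x))

    def swiglu_ffn(x, w_gate, w_up, w_down):
        # SwiGLU-flavoured gated feed-forward block: the SiLU-activated
        # "gate" path is multiplied elementwise with an unactivated "up"
        # path, then projected back down. Weight names are hypothetical.
        return (silu(x @ w_gate) * (x @ w_up)) @ w_down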
1. 2sk21 No.42064106
Indeed, there have been so many new activation functions that I stopped following the literature after I retired. I am glad to see that people are trying out new things.