
584 points Alifatisk | 4 comments
riku_iki ◴[] No.46182853[source]
The post starts with a wrong statement right away:

"The Transformer architecture revolutionized sequence modeling with its introduction of attention"

Attention was developed before transformers.

replies(1): >>46186002 #
1. Alifatisk ◴[] No.46186002[source]
> Attention was developed before transformers.

I just looked this up and it’s true; this completely changes the timeline I had in my mind! I thought the Transformer paper was what introduced the attention mechanism, but it existed before and was applied to RNN encoder-decoders (Bahdanau et al., 2014). Wow
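
For reference, here is a minimal sketch of that pre-Transformer attention idea (Bahdanau-style additive attention over RNN encoder hidden states). The names, shapes, and random weights are purely illustrative, not taken from any particular implementation:

```python
# Additive attention over RNN encoder states (Bahdanau-style), toy sketch.
import numpy as np

def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Score each encoder state against the current decoder state,
    then return the softmax-weighted context vector."""
    # score[t] = v . tanh(W_dec @ s + W_enc @ h_t)
    scores = np.tanh(decoder_state @ W_dec + encoder_states @ W_enc) @ v
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over source positions
    context = weights @ encoder_states       # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 8, 16         # toy dimensions
encoder_states = rng.normal(size=(T, d_enc)) # one hidden state per source token
decoder_state = rng.normal(size=d_dec)       # current decoder hidden state
W_dec = rng.normal(size=(d_dec, d_att))
W_enc = rng.normal(size=(d_enc, d_att))
v = rng.normal(size=d_att)

context, weights = additive_attention(decoder_state, encoder_states, W_dec, W_enc, v)
print(weights.round(3), context.shape)       # attention weights over 5 source tokens
```

The Transformer later dropped the recurrent encoder, used self-attention throughout, and replaced the additive score with a scaled dot product.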

replies(1): >>46186614 #
2. logicchains ◴[] No.46186614[source]
Knowing how such things go, it was probably invented by Schmidhuber in the 90s.
replies(1): >>46187385 #
3. esafak ◴[] No.46187385[source]
https://people.idsia.ch/~juergen/1991-unnormalized-linear-tr...
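
The linked 1991 work describes fast weight programmers, which later work (Schlag, Irie & Schmidhuber, 2021) showed are equivalent to unnormalized linear attention. A rough sketch of the idea, with toy shapes and illustrative names:

```python
# Fast-weight / unnormalized linear attention, toy sketch:
# "writes" add outer products value x key into a fast weight matrix,
# and a "read" retrieves with a query.
import numpy as np

d_key, d_val = 4, 4
rng = np.random.default_rng(1)

W_fast = np.zeros((d_val, d_key))            # fast weights, initially empty
keys = rng.normal(size=(3, d_key))           # a few stored associations
values = rng.normal(size=(3, d_val))

for k, v in zip(keys, values):
    W_fast += np.outer(v, k)                 # write: W += value x key

query = keys[1]                              # query with one of the stored keys
retrieved = W_fast @ query                   # read: sum_i (k_i . q) * v_i
print(np.allclose(retrieved, (keys @ query) @ values))  # same thing, spelled out
```

A write adds value-key outer products to the fast weight matrix; a read multiplies by a query, which is attention without the softmax normalization.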
replies(1): >>46190589 #
4. cubefox ◴[] No.46190589{3}[source]
Of course.