"Attention Is All You Need" - I've always wondered if the authors of that paper used such a casual and catchy title because they knew it would be groundbreaking and massively cited in the future....
replies(9):
The preceding seq2seq architectures had been RNN (LSTM) based, then RNN + attention (Bahdanau et al., "Jointly Learning to Align & Translate"). The Transformer paper's "attention is all you need" then meant you could drop RNNs altogether and just use attention.
Of course, NOT using RNNs was the key motivation behind the new Transformer architecture: not only did you not NEED an RNN, the authors explicitly wanted to avoid one, since the goal was parallel rather than sequential processing for better performance on the highly parallel hardware available.
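To make the parallelism point concrete, here's a rough sketch (shapes and names are mine, not from the paper): self-attention over a whole sequence is just a couple of matrix multiplies, while a recurrent layer has to loop over time steps because each hidden state depends on the previous one.

```python
# Minimal sketch (assumed toy shapes, not the paper's code): scaled dot-product
# attention covers every position with matrix multiplies, whereas an RNN-style
# pass must walk the sequence one step at a time.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q, K, V: (seq_len, d) -- one matmul handles all positions in parallel
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ V       # (seq_len, d)

def rnn_like(X, W_h, W_x):
    # Toy recurrence: each step depends on the previous hidden state,
    # so the time dimension cannot be parallelized.
    h = np.zeros(W_h.shape[0])
    outs = []
    for x_t in X:                             # sequential loop over time
        h = np.tanh(W_h @ h + W_x @ x_t)
        outs.append(h)
    return np.stack(outs)

seq_len, d = 5, 8
X = np.random.randn(seq_len, d)
print(attention(X, X, X).shape)                 # (5, 8), computed in parallel
print(rnn_like(X, np.eye(d), np.eye(d)).shape)  # (5, 8), computed step by step
```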