Here is another take on visualizing transformers, from researchers at Georgia Tech: https://poloclub.github.io/transformer-explainer/
The Illustrated Transformer: https://jalammar.github.io/illustrated-transformer/
Sebastian Raschka, PhD, has a post on the architectures: https://magazine.sebastianraschka.com/p/from-gpt-2-to-gpt-os...
This HN comment has numerous resources: https://news.ycombinator.com/item?id=35712334