Transformers have been all the rage in the NLP community since GPT-3 was released, and they became far better known to the general public after the launch of ChatGPT. I'm going to keep track of my favorite ways to learn about the Transformer architecture here.
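Before diving into the resources, it can help to see the core operation they all explain. Below is a minimal sketch of scaled dot-product attention, the building block of the Transformer; it follows the formula from "Attention is All You Need" but is a toy illustration in NumPy, not an optimized or batched implementation.

```python
import numpy as np

def softmax(x):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq_q, seq_k) similarity scores
    weights = softmax(scores)         # each query's weights sum to 1
    return weights @ V                # weighted average of the value vectors

# Toy example: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

The √d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients; the papers and posts below cover why, and how multiple such attention heads are combined.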
## Papers
- Attention is All You Need - Google Brain
- Language Models are Few-Shot Learners - OpenAI
- Training Language Models to Follow Instructions with Human Feedback - OpenAI
- Learning to summarize from human feedback - OpenAI
## Blog Posts
- The Illustrated Transformer - Jay Alammar
- The Attention Mechanism - Jay Alammar
- The Annotated Transformer - Sasha Rush
- Transformer Inference Arithmetic - Kipply
- Transformer Math 101 - EleutherAI
- Transformers from Scratch - Brandon Rohrer
- GPT in 60 Lines of NumPy - Jay Mody
## YouTube Videos
- Attention is All You Need - Yannic Kilcher
- Building GPT from scratch - Andrej Karpathy
- Illustrated Guide to Transformers - Michael Phi
## Courses
- Transformers United - Stanford CS25
- NLP with Deep Learning - Stanford CS224N