Here we learn how large language models such as ChatGPT, Bard, Claude, and GPT-4 work. Vision Transformers work on the same principles, but we have a dedicated ViT video here: https://youtu.be/DVoHvmww2lQ
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
🔗 Table of contents with links:
* 00:00 The Transformer
* 00:14 Check out the implementations of various Transformer-based architectures from Hugging Face! https://github.com/huggingface/transformers
* 00:38 RNNs recap
* 01:14 Transformers high-level
* 01:56 Tenney, Ian, Dipanjan Das, and Ellie Pavlick. “BERT rediscovers the classical NLP pipeline.” https://arxiv.org/pdf/1905.05950.pdf
* 02:27 The Transformer encoder
* 03:39 Self-attention compared to attention (see the code sketch after this list)
* 04:51 Parallelisation
* 05:37 Encoding word order
* 06:13 Residual connections
* 06:35 Generating the output sequence
* 07:59 Masked word prediction
* 08:40 Self-supervised learning FTW!
* 09:08 Pre-training, fine-tuning, and probing
* 09:44 End dance 😉
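🧑‍💻 Bonus: a minimal NumPy sketch of two ideas from the video, scaled dot-product self-attention (03:39) and the sinusoidal positional encodings that inject word order (05:37). This is illustrative code written for this description, not code from the video or from the Hugging Face library; the function names, random weights, and toy sizes are all my own assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # "Self"-attention: queries, keys, and values are all projections
    # of the same input sequence X (unlike encoder-decoder attention,
    # where queries come from a different sequence).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product attention from Vaswani et al. (2017):
    # softmax(Q K^T / sqrt(d_k)) V. All positions are processed at
    # once, which is what makes Transformers parallelisable.
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

def positional_encoding(seq_len, d_model):
    # Sinusoidal position signal added to the token embeddings, so the
    # otherwise order-agnostic attention layers can see word order.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16  # toy sizes, not from the video
X = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 16)
```

The output has one vector per input position, where each vector is a weighted mix of information from every other position: the parallel mixing the video contrasts with sequential RNNs.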
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Hungry for more?
📄 Paper: Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems. 2017. https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
📚 Check out the blog of @arp_ai: http://jalammar.github.io/illustrated-transformer/! It helped me a lot in understanding Transformers and served as an inspiration for this video!
📺 @YannicKilcher paper explanation: https://youtu.be/iDulhoQ2pro
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
#AICoffeeBreak #MsCoffeeBean #TransformerinML #MachineLearning #AI #research