Here we learn how large language models such as ChatGPT, Bard, Claude, and GPT-4 work. Vision Transformers work on the same principles, but we have a dedicated ViT video here: https://youtu.be/DVoHvmww2lQ
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
🔗 Table of contents with links:
* 00:00 The Transformer
* 00:14 Check out the implementations of various Transformer-based architectures from Hugging Face! https://github.com/huggingface/transformers
* 00:38 RNNs recap
* 01:14 Transformers high-level
* 01:56 Tenney, Ian, Dipanjan Das, and Ellie Pavlick. “BERT rediscovers the classical NLP pipeline.” https://arxiv.org/pdf/1905.05950.pdf
* 02:27 The Transformer encoder
* 03:39 Self-attention compared to attention (see the code sketch after this list)
* 04:51 Parallelisation
* 05:37 Encoding word order
* 06:13 Residual connections
* 06:35 Generating the output sequence
* 07:59 Masked word prediction
* 08:40 Self-supervised learning FTW!
* 09:08 Pre-training, fine-tuning, and probing
* 09:44 End dance 😉
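🧑‍💻 Bonus: a minimal NumPy sketch of two ideas from the video, scaled dot-product self-attention (03:39) and the sinusoidal positional encodings that inject word order (05:37). This is illustrative code written for this description, not code from the video or from the Hugging Face library; the function names, random weights, and toy sizes are all my own assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # "Self"-attention: queries, keys, and values are all projections
    # of the same input sequence X (unlike encoder-decoder attention,
    # where queries come from a different sequence).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product attention from Vaswani et al. (2017):
    # softmax(Q K^T / sqrt(d_k)) V. All positions are processed at
    # once, which is what makes Transformers parallelisable.
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

def positional_encoding(seq_len, d_model):
    # Sinusoidal position signal added to the token embeddings, so the
    # otherwise order-agnostic attention layers can see word order.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16  # toy sizes, not from the video
X = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 16)
```

The output has one vector per input position, where each vector is a weighted mix of information from every other position: the parallel mixing the video contrasts with sequential RNNs.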
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Hungry for more?
📄 Paper: Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems. 2017. https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
📚 Check out the blog of @arp_ai: http://jalammar.github.io/illustrated-transformer/! It helped me a lot in understanding Transformers and served as an inspiration for this video!
📺 @YannicKilcher paper explanation: https://youtu.be/iDulhoQ2pro
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
#AICoffeeBreak #MsCoffeeBean #TransformerinML #MachineLearning #AI #research