How do Vision Transformers work? – Paper explained | multi-head self-attention & convolutions
SPONSOR: Weights & Biases 👉 https://wandb.me/ai-coffee-break
⏩ Vision Transformers explained playlist: https://youtube.com/playlist?list=PLpZBeKTZRGPMddKHcsJAOIghV8MwzwQV6
📺 ViT: An Image is Worth 16x16 Words: https://youtu.be/DVoHvmww2lQ
📺 Swin Transformer: https://youtu.be/SndHALawoag
📺 ConvNext: https://youtu.be/QqejV0LNDHA
📺 DeiT: https://youtu.be/-FbV2KgRM8A
📺 Adversarial attacks: https://youtu.be/YyTyWGUUhmo
❓Check out our daily #MachineLearning Quiz Questions: ►
https://www.youtube.com/c/AICoffeeBreak/community
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Don Rosenthal, Dres. Trost GbR, banana.dev — Kyle Morris, Joel Ang
Paper 📜:
Park, Namuk, and Songkuk Kim. “How Do Vision Transformers Work?” In International Conference on Learning Representations (ICLR), 2022. https://openreview.net/forum?id=D78Go4hVcxO
🔗 Official implementation: https://github.com/xxxnell/how-do-vits-work
Outline:
00:00 Transformers vs ConvNets
01:04 Sponsor: Weights & Biases
02:21 Convolutions explained in a nutshell
03:35 Multi-Head Self-Attention explained
06:46 Why we thought MSA was cool
09:56 Paper insights
15:26 MSA vs. Convs (more insight)
16:07 Low-pass filters (MSA) and high-pass filters (Convs)
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Music 🎵 : Bella Bella Beat by Nana Kwabena