Because Ms. Coffee Bean also wonders what it takes for a transformer(-like) architecture to be called a transformer, and when it becomes something else, e.g. a CNN. Join the comment section to discuss!
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Referenced videos:
📺 Self-attention replaced with the Fourier Transform: https://youtu.be/j7pWPdGEfMA
📺 Ms. Coffee Bean explains the Transformer: https://youtu.be/FWFA4DGuzSc
Discussed paper:
📄 Tay, Y., Dehghani, M., Gupta, J., Bahri, D., Aribandi, V., Qin, Z., & Metzler, D. (2021). Are Pre-trained Convolutions Better than Pre-trained Transformers? https://arxiv.org/abs/2105.03322
Outline:
* 00:00 Are you tired of transformers?
* 01:12 What makes transformers so good?
* 05:13 CNN vs. Transformers
* 09:53 What makes a transformer a transformer? — Discussion
Music 🎵 : Savior Search – DJ Freedem
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research