The Efficiency Misnomer | Size does not matter | What does the number of parameters in a model mean?
► Check out our sponsor Aleph Alpha 👉 https://www.aleph-alpha.de/ !
Follow them on Twitter: @Aleph__Alpha
Paper 📜:
Dehghani, Mostafa, Anurag Arnab, Lucas Beyer, Ashish Vaswani, and Yi Tay. “The Efficiency Misnomer.” arXiv preprint arXiv:2110.12894 (2021). https://arxiv.org/abs/2110.12894
🔗 Megatron-Turing NLG 530B: https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
donor, Dres. Trost GbR, Yannik Schneider
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
Outline:
00:00 Model efficiency comparison
02:51 FLOPs
03:55 Number of parameters: what does it mean?
06:31 Speed / throughput
09:39 Aleph Alpha (Sponsor)
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research