What is masked language modelling? Or next sentence prediction? And why do they work so well? If you have ever wondered which tasks Transformer architectures are trained on and how the Multimodal Transformer learns the connection between images and text, then this is the right video for you!
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🎬 Ms. Coffee Bean explained the Multimodal Transformer: https://youtu.be/dd7nE4nbxN0
🎬 She also explained the Language-based Transformer: https://youtu.be/FWFA4DGuzSc

Content:
* 00:00 Pre-training strategies
* 00:48 Masked language modelling
* 03:37 Next sentence prediction
* 04:31 Sentence image alignment
* 05:07 Image region classification
* 06:14 Image region regression
* 06:53 Pre-training and fine-tuning on the downstream task
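To make the masked language modelling objective from the chapter list concrete, here is a toy Python sketch (not from the video) of BERT-style input masking: a fraction of tokens is hidden, and the model's pre-training task is to predict the originals. The function name, the 15% masking rate, and the `[MASK]` token follow the BERT convention; everything else is an illustrative assumption.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """BERT-style masking sketch: hide roughly `mask_prob` of the
    tokens and record the originals as prediction targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok          # the model must recover this token
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

sentence = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(sentence)
# e.g. masked = ["the", "[MASK]", "brown", ...], targets = {1: "quick", ...}
```

During pre-training, the Transformer sees `masked` as input and is trained to predict the tokens stored in `targets` at the masked positions.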

📄 This video has been enabled by the beautiful overview table in the Appendix of this paper:
VL-BERT: Su, Weijie, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. "VL-BERT: Pre-training of Generic Visual-Linguistic Representations." arXiv preprint arXiv:1908.08530 (2019). https://arxiv.org/pdf/1908.08530.pdf


🔗 Links:
YouTube: https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research #BERT

Video and thumbnail contain emojis designed by OpenMoji – the open-source emoji and icon project. License: CC BY-SA 4.0
