1st Multilingual Model Workshop – Pretraining the Jais Bilingual Arabic-English Language Models
Joel discusses the pretraining techniques behind Jais's state-of-the-art Arabic capabilities. The vocabulary selection process gives the model balanced coverage of both Arabic and English. Joel describes the use of Maximal Update Parametrization (μP), which simplifies hyperparameter selection and makes model scaling predictable. Scaling-law tests show that Arabic and English data can be mixed in a 1:2 ratio while achieving near-perfect scaling in both languages.
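To make the μP idea concrete, below is a minimal PyTorch sketch of the commonly cited μP rules for Adam: hidden-layer initialization shrinks with width, and matrix-like parameters get width-scaled learning rates, so hyperparameters tuned on a small proxy model transfer to a larger one. The widths, learning rate, and layer structure here are illustrative assumptions, not details from the talk; a full μP implementation follows the complete rule table of Yang and Hu (Tensor Programs V).

```python
import torch
import torch.nn as nn

# Hypothetical widths and hyperparameters, chosen only for illustration.
BASE_WIDTH = 256           # proxy width where hyperparameters are tuned
WIDTH = 1024               # target width we want to scale up to
mult = WIDTH / BASE_WIDTH  # width multiplier
base_lr, base_std = 1e-3, 0.02

model = nn.Sequential(
    nn.Linear(512, WIDTH),    # input layer
    nn.ReLU(),
    nn.Linear(WIDTH, WIDTH),  # hidden layer
    nn.ReLU(),
    nn.Linear(WIDTH, 10),     # output (readout) layer
)

# muP-style initialization: hidden weights shrink with width so that
# activations stay O(1) as the model grows.
nn.init.normal_(model[0].weight, std=base_std)
nn.init.normal_(model[2].weight, std=base_std / mult ** 0.5)
nn.init.zeros_(model[4].weight)  # a common muP choice for the readout

# muP-style Adam: matrix-like (hidden/readout) parameters get their
# learning rate divided by the width multiplier, so base_lr tuned at
# BASE_WIDTH transfers to the larger width without re-sweeping.
optimizer = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": base_lr},
    {"params": model[2].parameters(), "lr": base_lr / mult},
    {"params": model[4].parameters(), "lr": base_lr / mult},
])
```

The 1:2 Arabic-to-English mix can be read as a sampling ratio over the training stream. A toy sketch of such a sampler follows; the function and stream names are placeholders, not the Jais data pipeline:

```python
import random

def mixed_stream(arabic_docs, english_docs, ratio=(1, 2), seed=0):
    """Yield documents so that, in expectation, one Arabic document
    is drawn for every two English documents."""
    rng = random.Random(seed)
    p_arabic = ratio[0] / sum(ratio)  # 1/3 of draws are Arabic
    ar, en = iter(arabic_docs), iter(english_docs)
    while True:
        source = ar if rng.random() < p_arabic else en
        try:
            yield next(source)
        except StopIteration:
            return  # stop when either corpus is exhausted
```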
Jais models are developed through a collaboration between Core42's Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and Cerebras Systems.