The proliferation of Transformer-based large models has outpaced advances in hardware, creating an urgent need to distribute enormous models across multiple GPUs. Despite this growing demand, best practices for choosing an optimal parallelization strategy are still lacking, because doing so requires expertise spanning HPC, deep learning, and distributed systems. These difficulties have prompted both AI and HPC developers to explore the key questions: How can the training and inference efficiency of large models be improved to reduce costs? How can larger AI models be accommodated even with limited resources? And what can be done to enable more community members to easily access large models and large-scale applications?

In this session, we investigate efforts to address these questions. First, diverse parallelization strategies are an important tool for improving the efficiency of large-model training and inference; a minimal sketch of one such strategy follows below. Second, heterogeneous memory management can enhance the model capacity that processors such as GPUs can accommodate.
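To make the parallelization point concrete, here is a minimal sketch of one common strategy, tensor (column) parallelism, written in plain PyTorch rather than against any particular framework. Each rank owns a vertical slice of a linear layer's weight, so no single device ever materializes the full layer, and the full activation is reconstructed with an all-gather. The file name, dimensions, and launch command are illustrative assumptions, not material from the talk.

# tp_linear.py -- illustrative tensor (column) parallelism sketch.
# Launch with, for example: torchrun --nproc_per_node=2 tp_linear.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group("gloo")            # use "nccl" on GPUs
    rank, world = dist.get_rank(), dist.get_world_size()

    in_dim, out_dim = 8, 4 * world             # out_dim is split across ranks
    torch.manual_seed(0)                       # identical input on every rank
    x = torch.randn(2, in_dim)
    torch.manual_seed(rank + 1)                # a distinct weight shard per rank
    shard = torch.nn.Linear(in_dim, out_dim // world, bias=False)

    local_out = shard(x)                       # shape (2, out_dim // world)

    # All-gather the partial activations to reconstruct the full output.
    parts = [torch.empty_like(local_out) for _ in range(world)]
    dist.all_gather(parts, local_out)
    full_out = torch.cat(parts, dim=-1)        # shape (2, out_dim)
    print(f"rank {rank}: full output shape {tuple(full_out.shape)}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Data and pipeline parallelism follow the same basic pattern of trading communication for per-device memory, which is why production systems combine several such strategies.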

Furthermore, user-friendly deep learning systems for large models significantly reduce the specialized background knowledge users need, allowing more community members to get started with large models more easily. We will provide participants with a system-level open-source solution, Colossal-AI. More information can be found at https://github.com/hpcaitech/ColossalAI.
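As a hedged illustration of how such a system lowers the barrier to entry, the sketch below uses Colossal-AI's Booster API with its Gemini plugin, which performs heterogeneous memory management by letting parameter, gradient, and optimizer state spill from GPU to CPU memory. Exact module paths and signatures vary across Colossal-AI releases (older ones, for instance, require a config argument to launch_from_torch), so treat this as illustrative rather than canonical; the toy model and tensor shapes are assumptions.

import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

# Read rank and world size from the torchrun environment.
colossalai.launch_from_torch()

model = torch.nn.Sequential(                   # stand-in for a large Transformer
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)
optimizer = HybridAdam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

# GeminiPlugin enables heterogeneous memory management: states that do
# not fit on the GPU can be kept in CPU memory and moved on demand.
booster = Booster(plugin=GeminiPlugin())
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

x = torch.randn(8, 1024).cuda()
loss = criterion(model(x), torch.zeros(8, 1024).cuda())
booster.backward(loss, optimizer)              # backward goes through the booster
optimizer.step()

The point of the example is its small surface area: a handful of lines wrap an ordinary PyTorch model, which is what "user-friendly" means in practice here.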

Talk by: James Demmel and Yang You

Here’s more to explore:
LLM Compact Guide: https://dbricks.co/43WuQyb
Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc
