In recent years, the number of parameters in a single deep learning (DL) model has been growing much faster than GPU memory capacity. Practitioners who lack access to a large number of GPUs resort to heterogeneous training systems that store model parameters in CPU memory. Existing heterogeneous systems build their parallelization plans at the scope of the whole model: they apply one consistent parallel training method to all operators in the computation. As a result, engineers must expend significant effort to incorporate a new type of model parallelism and to patch its compatibility with the other parallelisms. For example, Mixture-of-Experts (MoE) is still incompatible with ZeRO-3 in DeepSpeed. Current systems also face efficiency problems o...
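To make the heterogeneous-training idea above concrete, the following is a minimal PyTorch sketch of parameter offloading: weights are kept in CPU memory and copied to the GPU only for the operator that needs them. The OffloadedLinear class is a hypothetical illustration, not the design of any of the systems cited here (DeepSpeed ZeRO-3, etc.), which layer chunked memory management, prefetching, and optimizer-state offload on top of this basic pattern.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OffloadedLinear(nn.Module):
    """Toy linear layer whose weights live in CPU memory between uses."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # The parameters are created on (and stay on) the CPU.
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        # Stream the weights to the input's device just for this operator;
        # the temporary GPU copies are freed once the op finishes.
        w = self.linear.weight.to(x.device)
        b = self.linear.bias.to(x.device)
        return F.linear(x, w, b)

if torch.cuda.is_available():
    layer = OffloadedLinear(1024, 1024)      # weights occupy CPU memory only
    x = torch.randn(8, 1024, device="cuda")
    out = layer(x)                           # weights copied to GPU on demand
    out.sum().backward()                     # autograd routes grads back through .to()
    print(layer.linear.weight.grad.device)   # prints: cpu

Because the .to() copy is differentiable, gradients flow back to the CPU-resident parameters automatically; real systems replace this per-call copy with asynchronous, chunk-granular transfers.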
As the training of giant dense models hits the boundary on the availability and capability of the ha...
Deep neural networks (DNNs) have grown exponentially in size over the past decade, leaving only thos...
Modern deep learning systems like PyTorch and TensorFlow are able to train enormous models with bill...
The scaling up of deep neural networks has been demonstrated to be effective in improving model qual...
The Transformer architecture has improved the performance of deep learning models in domains such as...
Scaling up model depth and size is now a common approach to raise accuracy in many deep learning (DL...
As giant dense models advance quality but require large amounts of GPU budgets for training, the spa...
The pre-trained model (PTM) is revolutionizing Artificial Intelligence (AI) technology. However, the...
Deep learning models are trained on servers with many GPUs, and training must scale with the number o...
One of the major current research trends is the evolution of heterogeneous parallel comp...
The crystallization of modeling methods around the Transformer architecture has been a boon for prac...
Transformer models have achieved state-of-the-art performance on various domains of applications and...
Pipeline parallelism enables efficient training of Large Language Models (LLMs) on large-scale distr...
Large language models (LLMs) based on transformers have made significant strides in recent years, th...