Training large deep learning models at scale is highly challenging. This paper proposes Chimera, a novel pipeline parallelism scheme that combines bidirectional pipelines for efficiently training large-scale models. Chimera is a synchronous approach and therefore incurs no loss of accuracy, which makes it more convergence-friendly than asynchronous approaches. Compared with the latest synchronous pipeline approach, Chimera reduces the number of bubbles by up to 50%; benefiting from the sophisticated scheduling of bidirectional pipelines, Chimera also has a more balanced activation memory consumption. Evaluations are conducted on Transformer-based language models. For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer...
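To make the bidirectional idea concrete, below is a minimal sketch (in Python, with illustrative names that are not taken from the paper or its code) of the stage-to-worker mapping such a scheme implies: the model is split into D stages, and each worker hosts one stage of the "down" pipeline together with the mirrored stage of the "up" pipeline, so the two pipelines traverse the workers in opposite directions and the micro-batches of one direction can fill bubbles left by the other.

def bidirectional_stage_mapping(num_stages):
    # Hypothetical helper, not the authors' implementation: returns
    # {worker: (down_stage, up_stage)} for a model split into num_stages
    # pipeline stages, one worker per stage. Worker w hosts stage w of the
    # down pipeline and the mirrored stage num_stages - 1 - w of the up
    # pipeline.
    return {w: (w, num_stages - 1 - w) for w in range(num_stages)}

if __name__ == "__main__":
    D = 4  # pipeline depth, e.g. 4 stages on 4 workers
    for worker, (down, up) in bidirectional_stage_mapping(D).items():
        print(f"worker {worker}: down-pipeline stage {down}, "
              f"up-pipeline stage {up}")

Under this mapping each worker keeps the weights of two stages, and an early stage of one pipeline (which holds many in-flight activations) is co-located with a late stage of the other (which holds few), which is the intuition behind the more balanced activation memory consumption mentioned in the abstract.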
In recent years, machine learning (ML) and, more noticeably, deep learning (DL), have become incre...
Long training times and non-ideal performance have been a major impediment to further continuing the u...
Neural Networks (NNs) are getting deeper and more complicated to the point where single accelerator ...
The scaling up of deep neural networks has been demonstrated to be effective in improving model qual...
Accelerating and scaling the training of deep neural networks (DNNs) is critical to keep up with gro...
Deep Neural Network (DNN) frameworks use distributed training to enable faster time to convergence a...
Alpa automates model-parallel training of large deep learning (DL) models by generating execution pl...
The Transformer architecture has improved the performance of deep learning models in domains such as...
Pipeline parallelism enables efficient training of Large Language Models (LLMs) on large-scale distr...
Transformer models have achieved state-of-the-art performance on various domains of applications and...
Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide...
Deep learning models are trained on servers with many GPUs, and training must scale with the number o...
I present a new way to parallelize the training of convolutional neural networks across multiple GPU...
Thesis (Master's), University of Washington, 2018. The recent success of Deep Neural Networks (DNNs) [...
With renewed global interest in Artificial Intelligence (AI) methods, the past decade ...