Communication overhead is one of the major obstacles to training large deep learning models at scale. Gradient sparsification is a promising technique to reduce the communication volume. However, it is very challenging to obtain real performance improvements because of (1) the difficulty of designing a scalable and efficient sparse allreduce algorithm and (2) the sparsification overhead. This paper proposes Ok-Topk, a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (with less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proved. To reduce the sparsification overhead, Ok-Topk...
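As background for the gradient-sparsification idea the abstract refers to, the following is a minimal sketch of generic Top-k sparsification with local error feedback, not the Ok-Topk algorithm itself; the function name topk_sparsify, the residual argument, and the choice of k are illustrative assumptions.

```python
import numpy as np

def topk_sparsify(grad, k, residual):
    """Keep the k largest-magnitude entries of (grad + residual);
    everything else is accumulated locally as the new residual.
    Generic Top-k sparsification with error feedback (illustrative sketch)."""
    acc = grad + residual                         # error feedback: re-add previously dropped mass
    idx = np.argpartition(np.abs(acc), -k)[-k:]   # indices of the k largest magnitudes
    values = acc[idx]                             # only these values are communicated
    new_residual = acc.copy()
    new_residual[idx] = 0.0                       # dropped entries stay on the worker
    return idx, values, new_residual

# Usage: each worker transmits only (idx, values), i.e. O(k) data instead of the
# full O(n) gradient, and a sparse allreduce combines the workers' contributions.
rng = np.random.default_rng(0)
g = rng.standard_normal(1000).astype(np.float32)
res = np.zeros_like(g)
idx, vals, res = topk_sparsify(g, k=10, residual=res)
```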
Compressed communication, in the form of sparsification or quantization of stochastic gradients, is ...
Applying machine learning techniques to the quickly growing data in science and industry requires hi...
We study lossy gradient compression methods to alleviate the communication bottleneck in data-parall...
To train deep learning models faster, distributed training on multiple GPUs is a very popular sche...
Distributed training of massive machine learning models, in particular deep neural networks, via Sto...
In recent years, the rapid development of new generation information technology has resulted in an u...
Stochastic optimization algorithms implemented on distributed computing architectures are increasing...
The distributed training of deep learning models faces two issues: efficiency and privacy. First of ...
Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i....
Distributed deep learning has become very common as a way to reduce the overall training time by exploiting mult...
We consider distributed optimization under communication constraints for training deep learning mode...
Load imbalance pervasively exists in distributed deep learning training systems, either caused by th...
Training large neural networks is time-consuming. To speed up the process, distributed training is o...