As emerging deep neural network (DNN) models continue to grow in size, using large GPU clusters to train DNNs is becoming essential to achieving acceptable training times. In this paper, we consider the case where future increases in cluster size will cause the global batch size that can be used to train models to reach a fundamental limit: beyond a certain point, larger global batch sizes cause sample efficiency to degrade, increasing overall time to accuracy. As a result, to achieve further improvements in training performance, we must instead consider "strong scaling" strategies that hold the global batch size constant and allocate smaller batches to each GPU. Unfortunately, this makes it significantly more difficult to us...
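To make the weak- vs. strong-scaling distinction in the abstract concrete, here is a minimal sketch; the GPU counts, per-GPU batch size, and global batch limit are illustrative assumptions, not figures from the paper.

```python
# Illustrative comparison of weak vs. strong scaling of the batch size.
# All numbers below are hypothetical assumptions for the example.

GLOBAL_BATCH_LIMIT = 8192   # assumed point beyond which sample efficiency degrades
PER_GPU_BATCH = 32          # assumed per-GPU batch under weak scaling

for num_gpus in (64, 256, 1024, 4096):
    # Weak scaling: per-GPU batch fixed, so the global batch grows with the
    # cluster and eventually exceeds the sample-efficiency limit.
    weak_global = PER_GPU_BATCH * num_gpus

    # Strong scaling: global batch held constant at the limit, so each GPU
    # receives a smaller and smaller local batch as the cluster grows.
    strong_per_gpu = GLOBAL_BATCH_LIMIT // num_gpus

    status = "over limit" if weak_global > GLOBAL_BATCH_LIMIT else "ok"
    print(f"{num_gpus:5d} GPUs | weak: global batch {weak_global:7d} ({status}) | "
          f"strong: per-GPU batch {strong_per_gpu}")
```

The shrinking per-GPU batch under strong scaling is what makes it hard to keep each GPU well utilized, which is the difficulty the abstract goes on to describe.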
Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide...
Neural networks become more difficult and take longer to train as their depth increases. As deep neur...
Synchronous strategies with data parallelism, such as the Synchronous Stochastic Gradient Descent (S-...
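As a rough illustration of the synchronous data-parallel pattern this snippet refers to, the following toy simulation averages per-worker gradients before every global update; the linear model, random data, and worker count are made-up assumptions, and the averaging step stands in for the all-reduce used on real clusters.

```python
import numpy as np

# Toy simulation of Synchronous SGD (S-SGD) with data parallelism.
# Model, data, and the 4 "workers" are illustrative assumptions only.

rng = np.random.default_rng(0)
X, y = rng.normal(size=(512, 16)), rng.normal(size=512)
w = np.zeros(16)
num_workers, lr = 4, 0.1

# Each worker owns a shard of the training data.
shards = np.array_split(np.arange(len(X)), num_workers)

for step in range(100):
    # Each worker computes a gradient on its own local batch (here: its shard).
    grads = []
    for shard in shards:
        pred = X[shard] @ w
        grads.append(2 * X[shard].T @ (pred - y[shard]) / len(shard))

    # Synchronization point: gradients are averaged across workers (an
    # all-reduce in practice) and every replica applies the same update,
    # keeping all model copies identical.
    w -= lr * np.mean(grads, axis=0)

print("final training MSE:", np.mean((X @ w - y) ** 2))
```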
Deep learning models are trained on servers with many GPUs, and training must scale with the number o...
Accelerating and scaling the training of deep neural networks (DNNs) is critical to keep up with gro...
Deep neural networks (DNNs) have grown exponentially in size over the past decade, leaving only thos...
The scaling up of deep neural networks has been demonstrated to be effective in improving model qual...
Deep Learning, specifically Deep Neural Networks (DNNs), is stressing storage systems in new...
Memory usage is becoming an increasingly pressing bottleneck in the training process of Deep Neural ...
Recent deep learning models are difficult to train using a large batch size, because commodity machi...
Deploying deep learning (DL) models across multiple compute devices to train large and complex model...