Stochastic gradient descent (SGD) is the most prevalent algorithm for training deep neural networks (DNNs). In each training epoch, SGD iterates over the input data set, processing data samples in a random-access fashion. Because this puts enormous pressure on the I/O subsystem, the most common approach to distributed SGD in HPC environments is to replicate the entire dataset to node-local SSDs. However, due to rapidly growing data set sizes, this approach has become increasingly infeasible. Surprisingly, the questions of why and to what extent random access is required have received little empirical attention in the literature. In this paper, we revisit data shuffling in DL workloads to investigate the...
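As a point of reference, the per-epoch random-access pattern the abstract refers to can be sketched as follows. This is an illustrative sketch only, not code from the paper; the function name sgd_epochs and its parameters are hypothetical, and the sketch assumes a full reshuffle of sample indices at every epoch.

    # Illustrative sketch (not from the paper): each epoch visits every sample
    # exactly once, but in a freshly shuffled order, which is what turns the
    # epoch-wise scan into random-access I/O on the underlying storage.
    import random

    def sgd_epochs(num_samples, num_epochs, batch_size):
        """Yield (epoch, batch_of_indices) in a per-epoch shuffled order."""
        indices = list(range(num_samples))
        for epoch in range(num_epochs):
            random.shuffle(indices)          # full reshuffle at every epoch
            for start in range(0, num_samples, batch_size):
                yield epoch, indices[start:start + batch_size]

    # Example: 10 samples, 2 epochs, batches of 4 -> non-sequential access order.
    for epoch, batch in sgd_epochs(num_samples=10, num_epochs=2, batch_size=4):
        print(epoch, batch)

Under this pattern each sample is read once per epoch from an unpredictable position, which is why naive distributed training either pays the cost of random reads over shared storage or replicates the data set to node-local SSDs.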