TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool for training deep neural networks on clusters of computers. HVD, in turn, relies on a blocking Allreduce primitive to share information among processes, combined with a communication thread to overlap communication with computation. In this work, we perform a thorough experimental analysis to expose (1) the importance of selecting the best algorithm in MPI libraries to realize the Allreduce operation; and (2) the performance acceleration that can be attained when replacing a blocking Allreduce with its non-blocking counterpart (while maintaining the blocking behaviour via the appropriate synchronization mechanism). Furthermore, (3...
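To make points (1) and (2) concrete, below is a minimal sketch (not the paper's code) of the two Allreduce variants in C with MPI: the blocking MPI_Allreduce, and the non-blocking MPI_Iallreduce whose blocking behaviour is restored with an explicit MPI_Wait. The gradient buffer and its size GRAD_LEN are hypothetical placeholders, not taken from the paper.

```c
/* Sketch: blocking vs. non-blocking Allreduce over a gradient buffer.
 * The MPI_Wait after MPI_Iallreduce preserves blocking semantics, while
 * leaving room between the two calls for independent computation that
 * could overlap with the communication. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define GRAD_LEN (1 << 20)  /* hypothetical gradient size: 1M floats */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *grad = malloc(GRAD_LEN * sizeof(float));
    float *sum  = malloc(GRAD_LEN * sizeof(float));
    for (int i = 0; i < GRAD_LEN; i++) grad[i] = (float)rank;

    /* (a) Blocking variant: returns only once the reduction is complete. */
    MPI_Allreduce(grad, sum, GRAD_LEN, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    /* (b) Non-blocking variant plus explicit synchronization. */
    MPI_Request req;
    MPI_Iallreduce(grad, sum, GRAD_LEN, MPI_FLOAT, MPI_SUM,
                   MPI_COMM_WORLD, &req);
    /* ... independent computation could be scheduled here ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("sum[0] = %f\n", sum[0]);

    free(grad);
    free(sum);
    MPI_Finalize();
    return 0;
}
```

As a usage note on point (1): with Open MPI, for instance, the Allreduce algorithm can be forced through the tuned collective component's MCA parameters, e.g. mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_allreduce_algorithm 4 to request the ring algorithm, which is one way to compare the algorithms an MPI library offers for this operation.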