Modern multi-core clusters are increasingly using GPUs to achieve higher performance and power efficiency. In such clusters, efficient communication among processes with data residing in GPU memory is of paramount importance to the performance of MPI applications. This paper investigates the efficient design of the intranode MPI Allreduce operation in GPU clusters. We propose two design alternatives that exploit in-GPU reduction and the fast intranode communication capabilities of modern GPUs. Our GPU shared-buffer aware design and GPU-aware Binomial reduce-broadcast algorithmic approach provide significant speedups over MVAPICH2 of up to 22 and 16 times, respectively.
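For context, the sketch below shows the communication pattern behind a Binomial reduce-broadcast Allreduce: a binomial-tree reduce toward rank 0 followed by a binomial-tree broadcast of the result. It is a minimal illustration only, using plain MPI point-to-point calls on host buffers with a float/sum operation and a power-of-two process count assumed; it is not the paper's implementation, which additionally performs the combine step on the GPU and moves data through shared GPU buffers. The helper name binomial_allreduce_sum is ours, not from the paper.

/*
 * Minimal sketch: Allreduce as binomial reduce-to-rank-0 + binomial broadcast.
 * Assumptions: power-of-two process count, float data, sum operation,
 * host buffers; the paper's designs instead reduce in GPU memory and
 * exchange data via shared GPU buffers.
 */
#include <mpi.h>
#include <stdlib.h>

static void binomial_allreduce_sum(float *buf, float *tmp, int count,
                                   int rank, int size, MPI_Comm comm)
{
    int mask;

    /* Phase 1: binomial-tree reduce toward rank 0. */
    for (mask = 1; mask < size; mask <<= 1) {
        if (rank & mask) {
            /* Send the partial result to the parent and stop combining. */
            MPI_Send(buf, count, MPI_FLOAT, rank - mask, 0, comm);
            break;
        } else if (rank + mask < size) {
            /* Receive a child's partial result and combine it locally
               (the paper offloads this combine step to the GPU). */
            MPI_Recv(tmp, count, MPI_FLOAT, rank + mask, 0, comm,
                     MPI_STATUS_IGNORE);
            for (int i = 0; i < count; i++)
                buf[i] += tmp[i];
        }
    }

    /* Phase 2: binomial-tree broadcast of the full result from rank 0. */
    for (mask = 1; mask < size; mask <<= 1) {
        if (rank & mask) {
            MPI_Recv(buf, count, MPI_FLOAT, rank - mask, 1, comm,
                     MPI_STATUS_IGNORE);
            break;
        }
    }
    /* Forward the result to the subtree below this rank. */
    for (mask >>= 1; mask > 0; mask >>= 1) {
        if (rank + mask < size)
            MPI_Send(buf, count, MPI_FLOAT, rank + mask, 1, comm);
    }
}

int main(int argc, char **argv)
{
    int rank, size, count = 1 << 20;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    float *buf = malloc(count * sizeof *buf);
    float *tmp = malloc(count * sizeof *tmp);
    for (int i = 0; i < count; i++)
        buf[i] = (float)rank;

    binomial_allreduce_sum(buf, tmp, count, rank, size, MPI_COMM_WORLD);

    free(buf);
    free(tmp);
    MPI_Finalize();
    return 0;
}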