In this paper, we develop portable and scalable algorithms for performing irregular all-to-all communication in High Performance Computing (HPC) systems. To minimize communication latency, the algorithms reduce the total number of messages transmitted, reduce the variance of the message lengths, and overlap communication with computation. The performance of the algorithms is characterized using a simple model of HPC systems. Our implementations use the Message Passing Interface (MPI) standard and can be ported to various HPC platforms. The performance of our algorithms is evaluated on the CM-5, T3D, and SP2. The results show the effectiveness of the techniques as well as the interplay between the architectu...
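To make the overlap idea above concrete, here is a minimal MPI sketch, not the paper's exact algorithm: the irregular exchange is posted with nonblocking calls so that local computation can proceed before the final wait. The function name, buffer layout, and use of MPI_DOUBLE are assumptions made for illustration.

```c
/* Minimal sketch (not the cited algorithm): irregular all-to-all with
 * nonblocking MPI calls so that local computation can overlap the
 * message exchange. Buffer names and datatypes are illustrative. */
#include <mpi.h>
#include <stdlib.h>

void irregular_exchange(int *sendcounts, int *recvcounts,
                        double **sendbufs, double **recvbufs,
                        MPI_Comm comm)
{
    int nprocs, rank, nreq = 0;
    MPI_Comm_size(comm, &nprocs);
    MPI_Comm_rank(comm, &rank);

    MPI_Request *reqs = malloc(2 * nprocs * sizeof(MPI_Request));

    /* Post all receives first, then the sends; per-pair lengths may differ. */
    for (int p = 0; p < nprocs; ++p)
        if (p != rank && recvcounts[p] > 0)
            MPI_Irecv(recvbufs[p], recvcounts[p], MPI_DOUBLE, p, 0, comm, &reqs[nreq++]);
    for (int p = 0; p < nprocs; ++p)
        if (p != rank && sendcounts[p] > 0)
            MPI_Isend(sendbufs[p], sendcounts[p], MPI_DOUBLE, p, 0, comm, &reqs[nreq++]);

    /* ... local computation can run here, overlapping the transfers ... */

    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}
```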
With the current continuation of Moore’s law and the presumed end of improved single core performanc...
In this paper, we study the communication characteristics of the CM-5 and the performance effects of...
We have implemented eight of the MPI collective routines using MPI point-to-point communication rou...
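As a hedged illustration of how a collective routine can be layered over point-to-point calls (the eight routines of the work above are not reproduced here), the following sketch is a binomial-tree broadcast rooted at rank 0, built only from MPI_Send and MPI_Recv; the function name and datatype are assumptions.

```c
/* Hedged sketch: a binomial-tree broadcast expressed with point-to-point
 * calls only. Assumes the data originates at rank 0. */
#include <mpi.h>

void tree_bcast(double *buf, int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* In round k (mask = 2^k), ranks below mask forward the data to rank + mask. */
    for (int mask = 1; mask < size; mask <<= 1) {
        if (rank < mask) {
            int dest = rank + mask;
            if (dest < size)
                MPI_Send(buf, count, MPI_DOUBLE, dest, 0, comm);
        } else if (rank < 2 * mask) {
            MPI_Recv(buf, count, MPI_DOUBLE, rank - mask, 0, comm, MPI_STATUS_IGNORE);
        }
    }
}
```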
In irregular all-to-all communication, messages are exchanged between every pair of processors. The ...
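For concreteness, a minimal sketch of how such an exchange with per-pair varying message lengths is commonly expressed through MPI_Alltoallv is given below; the displacements are prefix sums of the counts, and all names are illustrative rather than taken from the cited work.

```c
/* Hedged sketch: irregular (per-pair varying) message sizes expressed
 * with MPI_Alltoallv; displacements are prefix sums of the counts. */
#include <mpi.h>

void exchange_irregular(double *sendbuf, int *sendcounts,
                        double *recvbuf, int *recvcounts,
                        int nprocs, MPI_Comm comm)
{
    int sdispls[nprocs], rdispls[nprocs];
    sdispls[0] = rdispls[0] = 0;
    for (int p = 1; p < nprocs; ++p) {
        sdispls[p] = sdispls[p - 1] + sendcounts[p - 1];
        rdispls[p] = rdispls[p - 1] + recvcounts[p - 1];
    }
    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_DOUBLE,
                  recvbuf, recvcounts, rdispls, MPI_DOUBLE, comm);
}
```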
In this paper we present several algorithms for performing all-to-many personalized communication on...
In this paper we present several algorithms for all-to-many personalized communication which avoid...
In High Performance Computing (HPC), minimizing communication overhead is one of the most important ...
This work presents and evaluates algorithms for MPI collective communication operations on high perf...
Parallelizing sparse irregular applications on distributed memory systems poses serious scalability c...
We present an algorithm for all-to-all personalized communication, in which every processor has an i...
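As a hedged sketch of one common schedule for all-to-all personalized communication, in which each rank holds a distinct block destined for every other rank, the code below pairs ranks by XOR in each step; it assumes a power-of-two number of processes and equal block sizes, and is not necessarily the algorithm of the cited paper.

```c
/* Hedged sketch: pairwise-exchange schedule for all-to-all personalized
 * communication. Assumes a power-of-two communicator size and equal
 * block lengths; partner = rank XOR step gives a contention-free pairing. */
#include <mpi.h>

void pairwise_alltoall(double *sendbuf, double *recvbuf,
                       int blocklen, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (int step = 0; step < size; ++step) {
        int partner = rank ^ step;  /* step 0 copies the local block via self send/recv */
        MPI_Sendrecv(sendbuf + partner * blocklen, blocklen, MPI_DOUBLE, partner, 0,
                     recvbuf + partner * blocklen, blocklen, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}
```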
In this paper, we consider the communications involved in the execution of a complex application, de...
This paper describes a number of optimizations that can be used to support the efficient execution o...
Many parallel applications from scientific computing use collective MPI communication operations...
HPC systems have experienced significant growth over the past years, with mode...
Clusters are high performance computation systems built out of standard off-the-shelf ...