Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication operations increase with the number of participating processors, scalability problems might occur. In this article, we show for different MPI implementations how the execution time of collective communication operations can be significantly improved by a restructuring based on orthogonal processor structures with two or more levels. As platforms, we consider a dual Xeon cluster, a Beowulf cluster and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI_Bcast or MPI_Allgather can be reduced by 40% and 70% on the dual Xeon cluster ...
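The restructuring mentioned above arranges the participating processors in a virtual two-dimensional grid and performs a collective operation in two stages along orthogonal groups (first within one row, then concurrently within the columns). The following is a minimal sketch of that general idea for a broadcast, assuming a rows x cols grid built with MPI_Comm_split; the grid width, the helper name orthogonal_bcast, and the example payload are illustrative assumptions, not code from the article.

/* Sketch of a two-level ("orthogonal") broadcast built from MPI group
 * communicators. Illustrates the general technique only; grid width and
 * helper name are assumptions for this example, not the authors' code. */
#include <mpi.h>
#include <stdio.h>

/* Broadcast from global rank 0 over a (size/cols) x cols processor grid:
 * stage 1 along the root's row, stage 2 concurrently along each column. */
static void orthogonal_bcast(void *buf, int count, MPI_Datatype type,
                             MPI_Comm comm, int cols)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    int row = rank / cols;   /* index of the leading ("row") group    */
    int col = rank % cols;   /* index within the orthogonal direction */

    MPI_Comm row_comm, col_comm;
    MPI_Comm_split(comm, row, col, &row_comm);  /* processors of one row    */
    MPI_Comm_split(comm, col, row, &col_comm);  /* processors of one column */

    /* Stage 1: the root (row 0, col 0) broadcasts within its row,
     * reaching one representative process per column.             */
    if (row == 0)
        MPI_Bcast(buf, count, type, 0, row_comm);

    /* Stage 2: each column representative (its row-0 member) broadcasts
     * within its column; the column broadcasts proceed concurrently.   */
    MPI_Bcast(buf, count, type, 0, col_comm);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int cols = 4;                        /* assumed grid width             */
    int data = (rank == 0) ? 42 : -1;    /* illustrative payload           */

    if (size % cols == 0)                /* keep the example grid regular  */
        orthogonal_bcast(&data, 1, MPI_INT, MPI_COMM_WORLD, cols);

    printf("rank %d received %d\n", rank, data);
    MPI_Finalize();
    return 0;
}

The same two-stage pattern carries over to other collectives such as an allgather (gather within columns, then exchange the partial results within rows); the potential benefit is that each stage involves far fewer processes than a single flat collective over all p processors.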