This work presents and evaluates algorithms for MPI collective communication operations on high performance systems. Collective communication algorithms are extensively investigated, and a universal algorithm to improve the performance of MPI collective operations on hierarchical clusters is introduced. This algorithm exploits shared-memory buffers for efficient intra-node communication while still allowing the use of unmodified, hierarchy-unaware traditional collectives for inter-node communication. The universal algorithm delivers strong performance across a variety of collectives, improving upon the MPICH algorithms as well as the Cray MPT algorithms. Speedups average 15x to 30x for most collectives, with improved scalability up to ...
This paper focuses on the performance of basic communication primitives, namely the overlap of messa...
Many parallel applications from scientific computing use MPI collective communication operations to ...
Collective communication allows efficient communication and synchronization among a collection of pr...
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014...
In the exascale computing era, applications are executed at larger scale than ever before, which results ...
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardw...
Multicore or many-core clusters have become the most prominent form of High Performance Computing (H...
Collective communications occupy 20-90% of total execution times in many MPI applications. In this p...
Message passing is one of the most commonly used paradigms of parallel programming. Message Passing ...
Abstract. Most parallel systems on which MPI is used are now hierarchical: some processors are much...
Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistribute...
To amortize the cost of MPI collective operations, non-blocking collectives ha...
Collective communication is an important subset of Message Passing Interface. Improving the perform...
High Performance Computing (HPC) systems interconnect a large number of Processing Elements (PEs) in...