MPI libraries are widely used in high performance computing applications. Yet, effective tuning of MPI collectives on large parallel systems remains an outstanding challenge. This process often follows a trial-and-error approach and requires expert insight into the subtle interactions between the software and the underlying hardware. This paper presents an empirical approach for choosing and switching MPI communication algorithms at runtime to optimize application performance. We achieve this by first modeling offline, through microbenchmarks, how runtime parameters and message sizes affect the choice of MPI communication algorithm. We then apply this knowledge to automatically optimize new, unseen MPI programs. We evaluate ou...
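The abstract above does not spell out the measurement procedure; as a rough illustration of the kind of offline microbenchmark such modeling relies on, the C/MPI sketch below times one collective across a sweep of message sizes and reports the slowest-rank latency. The choice of MPI_Allreduce, the message sizes, and the iteration count are illustrative assumptions, not details from the paper.

/* Sketch of an offline microbenchmark for a single MPI collective.
 * The collective (MPI_Allreduce), message sizes, and iteration count
 * are illustrative assumptions, not details taken from the abstract. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 100;
    for (int count = 1; count <= (1 << 20); count *= 2) {
        double *sendbuf = malloc((size_t)count * sizeof(double));
        double *recvbuf = malloc((size_t)count * sizeof(double));
        for (int i = 0; i < count; i++) sendbuf[i] = 1.0;

        MPI_Barrier(MPI_COMM_WORLD);          /* align ranks before timing */
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++)
            MPI_Allreduce(sendbuf, recvbuf, count, MPI_DOUBLE,
                          MPI_SUM, MPI_COMM_WORLD);
        double per_call = (MPI_Wtime() - t0) / iters;

        double slowest;                       /* slowest rank defines the cost */
        MPI_Reduce(&per_call, &slowest, 1, MPI_DOUBLE,
                   MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("%d doubles: %.3e s per call\n", count, slowest);

        free(sendbuf);
        free(recvbuf);
    }

    MPI_Finalize();
    return 0;
}

A table of such per-size timings, gathered for each candidate algorithm, is the kind of data an offline model can use to pick an algorithm at runtime for a given message size.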
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardw...
HPC systems have experienced significant growth over the past years, with mode...
MPI is widely used for programming large HPC clusters. MPI also includes persistent operations, whic...
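The snippet above is cut off before it describes the persistent operations; for context, the minimal C sketch below shows the long-standing persistent point-to-point pattern (MPI_Send_init / MPI_Recv_init plus MPI_Start), in which the request is set up once and re-started each iteration. Buffer size, tag, and iteration count are arbitrary choices for illustration.

/* Minimal sketch of MPI persistent point-to-point communication:
 * the request is created once, then re-started each iteration.
 * Buffer size, tag, and iteration count are illustrative choices. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double buf[1024] = {0};
    MPI_Request req = MPI_REQUEST_NULL;

    /* Set up the persistent request once (only ranks 0 and 1 take part). */
    if (rank == 0 && size > 1)
        MPI_Send_init(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    else if (rank == 1)
        MPI_Recv_init(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

    /* Re-use the same request instead of re-creating it every iteration. */
    for (int it = 0; it < 10; it++) {
        if (req != MPI_REQUEST_NULL) {
            MPI_Start(&req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
    }

    if (req != MPI_REQUEST_NULL)
        MPI_Request_free(&req);
    MPI_Finalize();
    return 0;
}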
In the exascale computing era, applications are executed at a larger scale than ever before, which results ...
Many parallel applications from scientific computing use collective MPI communication operations...
Message passing is one of the most commonly used paradigms of parallel programming. Message Passing ...
The availability of cheap computers with outstanding single-processor performance coupled with Ether...
In order for collective communication routines to achieve high performance on different platforms, t...
MPI is the de facto standard for portable parallel programming on high-end sy...
Many parallel applications from scientific computing use MPI collective communication operations to ...
Collective communications occupy 20-90% of total execution times in many MPI applications. In this p...
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014...
Previous studies of application usage show that the performance of collective communications is cr...
The large variety of production implementations of the message passing interface (MPI) each provide ...