Message matching within MPI is an important performance consideration for applications that use two-sided semantics. In this work, we present an instrumentation of the Cray MPI library that enables the collection of detailed message-matching statistics, as well as a software implementation of hashed matching. We use this functionality to profile key DOE applications with complex communication patterns to determine under what circumstances an application might benefit from hardware offload capabilities within the NIC to accelerate message matching. We find that several applications and libraries exhibit sufficiently long match-list lengths to motivate a Binned Message Matching approach.
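The binned matching the abstract motivates can be sketched as follows. This is an illustrative model only (names and structure are hypothetical, not the instrumented Cray MPI implementation): an incoming message must match the earliest-posted eligible receive, and hashing posted receives into bins keyed by (source, tag) shortens the list that must be scanned, while wildcard receives (MPI_ANY_SOURCE / MPI_ANY_TAG) still need a separate check to preserve MPI's ordering semantics.

```python
ANY = -1  # stands in for MPI_ANY_SOURCE / MPI_ANY_TAG in this sketch


def linear_match(posted, source, tag):
    """Baseline: scan the posted-receive list in posting order.
    Cost grows with the match-list length the paper measures."""
    for i, (s, t) in enumerate(posted):
        if s in (source, ANY) and t in (tag, ANY):
            return posted.pop(i)
    return None


class BinnedQueue:
    """Hash posted receives into bins by (source, tag) so an incoming
    message scans one short bin instead of the whole list.  Wildcard
    receives live in a separate list that is always consulted, and the
    earliest-posted eligible entry wins (simplified MPI ordering)."""

    def __init__(self, nbins=16):
        self.bins = [[] for _ in range(nbins)]
        self.wildcards = []
        self.nbins = nbins
        self.seq = 0  # posting order, needed to break ties correctly

    def post(self, source, tag):
        entry = (self.seq, source, tag)
        self.seq += 1
        if source == ANY or tag == ANY:
            self.wildcards.append(entry)
        else:
            self.bins[hash((source, tag)) % self.nbins].append(entry)

    def match(self, source, tag):
        b = self.bins[hash((source, tag)) % self.nbins]
        cand = next((e for e in b if e[1] == source and e[2] == tag), None)
        wc = next((e for e in self.wildcards
                   if e[1] in (source, ANY) and e[2] in (tag, ANY)), None)
        if cand is not None and (wc is None or cand[0] < wc[0]):
            b.remove(cand)
            return cand
        if wc is not None:
            self.wildcards.remove(wc)
            return wc
        return None
```

With no wildcards posted, each match touches only one bin, which is the case where a hashed or hardware-binned design pays off for the long match lists the paper reports.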
In this paper, we analyze existing MPI benchmarking suites, focusing on two restrictions t...
In this paper we describe the difficulties inherent in making accurate, reproducible measurements of...
The Message Passing Interface (MPI) is the de facto standard for distributed memory computing in hig...
HPC systems have experienced significant growth over the past years, with mode...
New kinds of applications with lots of threads or irregular communication patt...
MPI is widely used for programming large HPC clusters. MPI also includes persistent operations, whic...
The performance of massively parallel programs is often impacted by the cost of communication across ...
As the complexity and diversity of computer hardware and the elaborateness of ...
We have implemented eight of the MPI collective routines using MPI point-to-point communication rou...
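Building collectives from point-to-point operations, as that abstract describes, can be illustrated with a binomial-tree broadcast. This is a generic sketch with simulated ranks and the root fixed at rank 0; it is not one of the eight routines the authors implemented:

```python
def binomial_bcast(nprocs, value):
    """Simulate a binomial-tree broadcast from rank 0 built purely from
    point-to-point sends.  In each round, every rank that already holds
    the data forwards it 'stride' ranks ahead, so all P ranks receive
    the value in ceil(log2(P)) rounds."""
    data = {0: value}              # rank 0 is the root
    stride = 1
    while stride < nprocs:
        for src in list(data):     # each holder does one send per round
            dst = src + stride     # the point-to-point send src -> dst
            if dst < nprocs and dst not in data:
                data[dst] = data[src]
        stride *= 2
    return [data[r] for r in range(nprocs)]
```

A tree-shaped schedule like this is why collectives layered on point-to-point messaging can complete in a logarithmic, rather than linear, number of communication rounds.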
Large-scale parallel data analysis, where global information from a variety of problem dom...
MPI is the de facto standard for portable parallel programming on high-end sy...
Understanding the behavior of parallel applications that use the Message Passing Interface (MPI) is ...
Communication hardware and software have a significant impact on the performance of clusters and sup...
The Message Passing Interface (MPI) is the standard API for parallelization in high-performance and ...