Hybrid MPI+threads programming is gaining prominence as an alternative to the traditional "MPI everywhere" model to better handle the disproportionate increase in the number of cores compared with other on-node resources. Current implementations of these two models represent the two extreme cases of communication resource sharing in modern MPI implementations. In the MPI-everywhere model, each MPI process has a dedicated set of communication resources (also known as endpoints), which is ideal for performance but is resource wasteful. With MPI+threads, current MPI implementations share a single communication endpoint for all threads, which is ideal for resource usage but hurts performance. In this paper, we explore the tradeoffs ...
In the exascale computing era, applications are executed at larger scale than ever before, which results ...
As high-end computing systems continue to grow in scale, recent advances in multi- and many-core arc...
To amortize the cost of MPI collective operations, non-blocking collectives ha...
Abstract—With the increasing prominence of many-core architectures and decreasing per-core resource...
The current MPI model defines a one-to-one relationship between MPI processes and MPI ranks. This mo...
Abstract—Modern high-speed interconnection networks are designed with capabilities to support commun...
Supercomputing applications rely on strong scaling to achieve faster results on a larger number of p...
Hybrid MPI+Threads programming has emerged as an alternative model to the “MPI everywhere” model to...
Threading support for Message Passing Interface (MPI) has been defined in the MPI standard for more ...
A recent trend in high performance computing shows a rising number of cores per compute node, while ...
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014...
Many-core architectures, such as the Intel Xeon Phi, provide dozens of cores and hundreds of hardwar...
Over the last decade, most supercomputer architectures have been based on cl...
Communication hardware and software have a significant impact on the performance of clusters and sup...
Abstract. To make the most effective use of parallel machines that are being built out of increasing...