MPI applications may waste thousands of CPU cycles if they do not efficiently overlap communication and computation. In this paper, we present a generic and portable I/O manager that makes communication progress asynchronously using tasklets. It automatically chooses the most appropriate communication method depending on the context: multi-threaded application or not, SMP machine or not. We have implemented and evaluated our I/O manager with Mad-MPI, our own MPI implementation, and compared it with other existing MPI implementations with regard to their ability to efficiently overlap communication and computation.
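To make the overlap problem concrete, here is a minimal sketch (generic MPI, not Mad-MPI's internals; message size and ranks are placeholders) in which a process interleaves computation with explicit MPI_Test polls so the library can make progress on a non-blocking receive:

```c
/* Minimal overlap sketch: run with mpirun -np 2. Without an asynchronous
 * progress engine, many MPI implementations only advance communication
 * inside MPI calls, hence the MPI_Test polling loop. */
#include <mpi.h>
#include <stddef.h>

#define N 1048576            /* placeholder message size (doubles) */

static double buf[N];

/* Placeholder for the application's computation between progress polls. */
static void compute_chunk(size_t chunk) { (void)chunk; }

int main(int argc, char **argv)
{
    int rank, flag = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Irecv(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        size_t chunk = 0;
        /* Interleave computation with explicit progress polls. */
        while (!flag) {
            compute_chunk(chunk++);
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        }
    } else if (rank == 1) {
        MPI_Send(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD); /* dummy payload */
    }

    MPI_Finalize();
    return 0;
}
```

A tasklet-based progress engine such as the I/O manager described above removes the need for this manual polling: communication advances in the background while the application computes.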
Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistribute...
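As background, two-phase I/O lives behind the collective MPI-IO interface; in the hedged sketch below (file name and sizes are illustrative), the library is free to redistribute the small, interleaved per-rank requests into large contiguous accesses inside MPI_File_write_at_all:

```c
/* Collective MPI-IO write: each rank contributes one block, and the
 * implementation may apply two-phase I/O (redistribute to aggregators,
 * then write large contiguous chunks). Run with any number of ranks. */
#include <mpi.h>

#define LOCAL_COUNT 1024     /* placeholder block size per rank */

int main(int argc, char **argv)
{
    int rank;
    int buf[LOCAL_COUNT];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < LOCAL_COUNT; i++)
        buf[i] = rank;       /* recognizable per-rank payload */

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Rank-dependent offset; the collective call lets the library
     * aggregate the requests instead of issuing many small writes. */
    MPI_Offset offset = (MPI_Offset)rank * LOCAL_COUNT * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, LOCAL_COUNT, MPI_INT,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```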
In this paper we present the Task-Aware MPI library (TAMPI) that integrates both blocking and non-bl...
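To illustrate the problem TAMPI targets, the sketch below uses plain MPI plus OpenMP tasks (generic code, not TAMPI's own API): the blocking MPI_Recv stalls a whole worker thread, which is exactly what a task-aware library avoids by pausing only the task and resuming it when the message arrives:

```c
/* Generic MPI + OpenMP tasks sketch (not TAMPI's API): run with
 * mpirun -np 2 and OMP_NUM_THREADS >= 2. */
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int provided, rank, value = 0;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    #pragma omp single
    {
        if (rank == 0) {
            /* This task blocks its worker thread inside MPI_Recv; a
             * task-aware layer would suspend the task instead. */
            #pragma omp task depend(out: value)
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

            /* Runs once the receive task completes. */
            #pragma omp task depend(in: value)
            value += 1;
        } else if (rank == 1) {
            #pragma omp task
            {
                int v = 42;
                MPI_Send(&v, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            }
        }
        #pragma omp taskwait
    }

    MPI_Finalize();
    return 0;
}
```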
In High Performance Computing (HPC), minimizing communication overhead is one of the most important ...
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardw...
Since the advent of multi-core processors, the physiognomy of typical clusters ...
Recent cluster architectures include dozens of cores per node, with all cores ...
In the exascale computing era, applications are executed at larger scale than ever before, which results ...
Reactivity to I/O events is a crucial factor for the performance of modern multithreaded distributed...
Asynchronous task-based programming models are gaining popularity to address the programmability and...
The current trend in clusters leads towards an increase in the number of cores...
Recent cluster architectures include dozens of cores per node, with all cores ...
In this paper we propose an API to pause and resume task execution depending on external events. We ...
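Since the abstract is cut off before the API details, here is only a hypothetical sketch of pause/resume semantics using POSIX threads (task_event, task_pause, and task_resume are invented names, not the paper's interface); a real task runtime would switch to another ready task rather than block the thread:

```c
/* Hypothetical pause/resume sketch: a "task" parks itself until an
 * external event (e.g. completion of a communication) signals it. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
} task_event;

/* Called from the task: blocks until the event fires. */
static void task_pause(task_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->ready)
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

/* Called from the event source (e.g. a communication progress thread). */
static void task_resume(task_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->ready = true;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

static task_event ev = { PTHREAD_MUTEX_INITIALIZER,
                         PTHREAD_COND_INITIALIZER, false };

static void *waiter(void *arg)
{
    (void)arg;
    task_pause(&ev);              /* task parks until the event fires */
    printf("task resumed\n");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    task_resume(&ev);             /* external event wakes the task */
    pthread_join(t, NULL);
    return 0;
}
```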
Supercomputing applications rely on strong scaling to achieve faster results on a larger number of p...
Multicore processors have not only reintroduced Non-Uniform Memory Access (NUM...
The current trend among vendors for scientific computing is the use of clusters of...