Parallel runtime systems such as MPI or task-based libraries provide models to manage both computation and communication by allocating cores, scheduling threads, and executing communication algorithms. Efficiently implementing such models is challenging due to their interplay within the runtime system. In this paper, we assess the interference between communications and computations when they run side by side. We study the impact of communications on computations, and conversely the impact of computations on communication performance. We consider two aspects: CPU frequency and memory contention. We have designed benchmarks to measure these phenomena. We show that CPU frequency variations caused by computation have a small impact on communication performance ...
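As an illustration of the kind of micro-benchmark the abstract describes, here is a minimal, hypothetical sketch (our own, not the authors' code): it measures point-to-point bandwidth between two MPI ranks while an optional pthread runs a STREAM-like copy loop to create memory contention. The message size, array length, and iteration count are assumptions chosen for illustration.

/* Hypothetical interference micro-benchmark (illustrative, not from the paper):
 * ping-pong bandwidth between ranks 0 and 1, with optional memory contention
 * generated by a STREAM-like copy loop on a second thread. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_SIZE   (8 * 1024 * 1024)    /* 8 MiB message (assumed size) */
#define STREAM_LEN (16 * 1024 * 1024)   /* 128 MiB per array (assumed size) */
#define ITERS      100

static volatile int running = 1;

/* Saturates memory bandwidth on one core until the benchmark ends. */
static void *memory_stream(void *arg) {
    (void)arg;
    double *a = calloc(STREAM_LEN, sizeof *a);
    double *b = calloc(STREAM_LEN, sizeof *b);
    while (running)
        for (size_t i = 0; i < STREAM_LEN; i++)
            a[i] = 2.0 * b[i];
    free(a); free(b);
    return NULL;
}

int main(int argc, char **argv) {
    int rank, provided;
    int contend = (argc > 1);           /* any argument enables contention */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    pthread_t t;
    if (contend) pthread_create(&t, NULL, memory_stream, NULL);

    char *buf = malloc(MSG_SIZE);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITERS; i++) {   /* classic ping-pong */
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double dt = MPI_Wtime() - t0;
    if (rank == 0)
        printf("bandwidth: %.1f MB/s%s\n", 2.0 * ITERS * MSG_SIZE / dt / 1e6,
               contend ? " (with memory contention)" : "");

    running = 0;
    if (contend) pthread_join(t, NULL);
    free(buf);
    MPI_Finalize();
    return 0;
}

Built with mpicc -pthread and run on two ranks, comparing the reported bandwidth with and without the extra argument exposes the memory-contention effect the paper quantifies; measuring the compute loop's throughput instead would show the converse direction.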
As high-performance computing (HPC) systems advance towards exascale (10^18 operations per second), ...
Asynchronous task-based programming models are gaining popularity to address the programmability and...
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardw...
To amortize the cost of MPI communications, distributed parallel HPC applicati...
In High Performance Computing (HPC), minimizing communication overhead is one of the most important ...
Effective overlap of computation and communication is a well understood technique for latency hiding... (a sketch of this pattern follows these summaries)
In HPC applications, one of the major overheads compared to sequential code is...
Many contemporary HPC systems expose their jobs to substantial amounts of interference, leading to s...
Overlapping communications with computation is an efficient way to amortize th...
HPC applications are large software packages with high computation and storage requirements. To meet...
Modern MPI simulator frameworks assume the existence of a Computation-Communication Divide: thus, th...
By allowing computation/communication overlap, MPI nonblocking collectives (NBC)... (see the sketch after these summaries)
In modern MPI applications, communication between separate computational nodes quickly adds up to a s...
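Several of the summaries above rely on the same overlap pattern: start a nonblocking operation, compute independently while it (ideally) progresses, then wait only when the result is needed. A minimal hedged sketch, assuming a nonblocking collective (MPI_Iallreduce) and a dummy compute kernel of our own invention:

/* Hedged sketch of computation/communication overlap with an MPI
 * nonblocking collective; sizes and the kernel are illustrative. */
#include <mpi.h>
#include <stdio.h>

#define N (1 << 20)

static double in[N], out[N], local[N];

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    for (int i = 0; i < N; i++) { in[i] = 1.0; local[i] = 2.0; }

    MPI_Request req;
    /* Start the collective; it may progress while we compute. */
    MPI_Iallreduce(in, out, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);

    /* Independent work that touches neither in nor out. */
    double acc = 0.0;
    for (int i = 0; i < N; i++)
        acc += local[i] * local[i];

    /* Block only when the all-reduce result is actually needed. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("acc=%.1f out[0]=%.1f\n", acc, out[0]);
    MPI_Finalize();
    return 0;
}

Whether any overlap is actually achieved depends on the MPI library's progression mechanism (progress threads, hardware offload, or progress made only inside MPI calls), which is precisely what several of the works listed above investigate.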