Abstract — In this paper we revisit the supernode-shape selection problem, which has been widely discussed in the literature. In general, the choice of supernode transformation greatly affects the parallel execution time of the transformed algorithm. Since minimizing the overall parallel execution time through an appropriate supernode transformation is very difficult, researchers have focused on scheduling-aware supernode transformations that maximize parallelism during execution. In this paper we argue that the communication volume of the transformed algorithm is an important criterion and that its minimization should be given high priority. For this reason we define the metric of the per-process communication volum...
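The abstract's definition of the per-process metric is cut off, so the sketch below uses a simpler, assumed model to show why supernode (tile) shape matters for communication. It assumes a rectangular tile with side lengths `sides` and uniform dependence vectors `deps`, and it counts, once per dependence, every iteration point whose dependent point `x + d` falls outside the tile. The function name `tile_comm_volume` is hypothetical and this is not the paper's exact metric.

```python
from math import prod


def tile_comm_volume(sides, deps):
    """Values a rectangular tile sends across its boundary (simplified model).

    Assumption: uniform dependences; a point is counted once for each
    dependence d whose target x + d lies outside the tile.
    """
    total = 0
    volume = prod(sides)
    for d in deps:
        if any(abs(di) >= si for di, si in zip(d, sides)):
            # The dependence spans the whole tile, so every point sends outward.
            total += volume
            continue
        # Points whose target stays inside form a smaller box.
        inside = prod(s - abs(di) for s, di in zip(sides, d))
        total += volume - inside
    return total


deps = [(1, 0), (0, 1), (1, 1)]
for sides in [(4, 4), (2, 8), (1, 16)]:
    # All three tiles contain 16 iteration points.
    print(sides, tile_comm_volume(sides, deps))
# (4, 4) 15
# (2, 8) 19
# (1, 16) 33
```

Under this model and these dependences, the square tile sends the fewest values for a fixed tile size, which illustrates how shape, not only size, drives communication volume.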
Many parallel applications from scientific computing use MPI collective communication operations to ...
A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) for mapping a type of algori...
Although parallel hardware has become ubiquitous, many designers still use sequential programming la...
With the objective of minimizing the total execution time of a parallel program on a distributed mem...
Many parallel algorithms exhibit a hypercube communication topology. Such algorithms can easily be e...
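In a hypercube communication topology with 2^d processes, rank i talks to the ranks that differ from it in exactly one bit, i.e. `i ^ (1 << k)`. The sketch below simulates one such algorithm, a sum allreduce by dimension exchange, sequentially in a single process. The function names are hypothetical and no real message passing is involved.

```python
def hypercube_partners(rank, dim):
    """Neighbours of `rank` in a `dim`-dimensional hypercube: flip one bit each."""
    return [rank ^ (1 << k) for k in range(dim)]


def hypercube_allreduce(values):
    """Simulate a sum allreduce by dimension exchange on 2**d processes.

    values[i] is the local value held by rank i; the returned list holds the
    value every rank ends up with after d exchange steps.
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("number of processes must be a power of two")
    vals = list(values)
    k = 1
    while k < p:
        # One step per dimension: rank i exchanges with partner i ^ k and adds.
        vals = [vals[i] + vals[i ^ k] for i in range(p)]
        k <<= 1
    return vals


print(hypercube_partners(5, 3))                       # [4, 7, 1]
print(hypercube_allreduce([1, 2, 3, 4, 5, 6, 7, 8]))  # every entry is 36
```

Each step only uses hypercube edges, which is why mapping such algorithms onto meshes or tori, where these edges are not all direct links, is nontrivial.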
The orchestration of communication of distributed memory parallel applications on a parallel compute...
The Message Passing Interface (MPI) standard defines virtual topologies that can be applied to syste...
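For MPI Cartesian virtual topologies, ranks are numbered in row-major order over the process grid. The pure-Python sketch below mirrors that numbering and the (source, destination) pair that `MPI_Cart_shift` returns. It is an illustration only, not mpi4py or any MPI binding. The helper names are hypothetical, and `None` stands in for `MPI_PROC_NULL`.

```python
def cart_coords(rank, dims):
    """Row-major rank -> grid coordinates (same numbering as MPI Cartesian topologies)."""
    coords = []
    for d in reversed(dims):
        coords.append(rank % d)
        rank //= d
    return tuple(reversed(coords))


def cart_rank(coords, dims):
    """Grid coordinates -> row-major rank."""
    rank = 0
    for c, d in zip(coords, dims):
        rank = rank * d + c
    return rank


def cart_shift(rank, dims, periods, direction, disp):
    """Return (source, dest) for a shift of `disp` along `direction`.

    Periodic dimensions wrap around; off-grid neighbours of non-periodic
    dimensions are reported as None (standing in for MPI_PROC_NULL).
    """
    coords = cart_coords(rank, dims)

    def moved(offset):
        c = list(coords)
        c[direction] += offset
        if periods[direction]:
            c[direction] %= dims[direction]
        elif not 0 <= c[direction] < dims[direction]:
            return None
        return cart_rank(c, dims)

    return moved(-disp), moved(disp)


# 2 x 3 grid, periodic only along dimension 1.
print(cart_coords(4, (2, 3)))                           # (1, 1)
print(cart_shift(4, (2, 3), (False, True), 1, 1))       # (3, 5)
print(cart_shift(4, (2, 3), (False, True), 0, 1))       # (1, None)
```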
Petascale machines with hundreds of thousands of cores are being built. These machines have varying ...
This work presents and evaluates algorithms for MPI collective communication operations on high perf...
In this thesis we study the behavior of parallel applications represented by a precedence graph. The...
Network contention has an increasingly adverse effect on the performance of parallel applications w...
Abstract—We present a new method for mapping applications’ MPI tasks to cores of a parallel comput...
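The mapping method itself is truncated in this abstract, so the sketch below only shows how candidate task-to-core mappings are commonly scored. It uses hop-bytes, the message size times the number of network hops, summed over all communicating pairs. It assumes a 2D mesh with dimension-order routing, so the hop count is the Manhattan distance. The traffic and placements are made-up illustrative inputs, and `hop_bytes` is a hypothetical helper name.

```python
def hop_bytes(traffic, placement):
    """Sum of bytes * mesh hops over all (src, dst) task pairs.

    traffic:   {(src_task, dst_task): bytes}
    placement: {task: (x, y)} position of the core running that task
    Assumes dimension-order routing on a 2D mesh (Manhattan hop count).
    """
    total = 0
    for (src, dst), nbytes in traffic.items():
        (x1, y1), (x2, y2) = placement[src], placement[dst]
        total += nbytes * (abs(x1 - x2) + abs(y1 - y2))
    return total


# Four tasks communicating in a ring, 100 bytes per edge, on a 2 x 2 mesh.
ring = {(0, 1): 100, (1, 2): 100, (2, 3): 100, (3, 0): 100}
good = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
bad = {0: (0, 0), 1: (1, 1), 2: (0, 1), 3: (1, 0)}
print(hop_bytes(ring, good))  # 400
print(hop_bytes(ring, bad))   # 600
```

A lower hop-bytes value means less data crosses network links, which is one proxy for reduced contention.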
In this book chapter, the authors discuss some important communication issues to obtain a highly sca...
Many parallel applications from scientific computing use MPI collective communication operations to ...
With the current continuation of Moore’s law and the presumed end of improved single core performanc...