Reduction is a core operation in parallel computing. Optimizing its cost has a high potential impact on application execution time, particularly in MPI and MapReduce computations. In this paper, we propose an optimal algorithm for scheduling associative reductions. We focus on the case where communications and computations can be overlapped to fully exploit resources. Our algorithm greedily builds a spanning tree, starting from the sink and adding a parent at each iteration. Bounds on the completion time of optimal schedules are then characterized. To show the algorithm's extensibility, we adapt it to model variations in which either communication or computation resources are limited. Moreover, we study two specific spanning trees: w...
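The greedy construction described above (growing the reduction tree outward from the sink, one node per iteration) can be sketched as follows. The cost model here is an assumption, since the abstract is truncated: a uniform transfer time `c`, a combine time `w`, and a node that may open a new child transfer every `c` units while computation overlaps. The function `greedy_reduction_tree` and its earliest-slot selection rule are illustrative, not the paper's exact algorithm.

```python
import heapq

def greedy_reduction_tree(n, c=1.0, w=1.0):
    """Greedily build a reduction tree for n nodes (node 0 is the sink).

    Assumed model (not taken from the paper): the schedule is built in
    reverse time, i.e. as the broadcast dual of the reduction. Once a
    node is active it can accept one more child every c units, and a
    child becomes active c + w after its transfer slot opens (transfer
    plus one combine). Returns the parent map and the makespan.
    """
    parent = {0: None}
    activation = {0: 0.0}
    # Heap of (time at which this node's next child slot opens, node id).
    slots = [(0.0, 0)]
    for v in range(1, n):
        # Greedy step: attach the next node to the earliest open slot.
        t, u = heapq.heappop(slots)
        parent[v] = u
        activation[v] = t + c + w          # transfer, then one combine
        heapq.heappush(slots, (t + c, u))  # u's link is free again at t + c
        heapq.heappush(slots, (activation[v], v))  # v can now take children
    makespan = max(activation.values())
    return parent, makespan
```

With `n = 3` and unit costs, both non-sink nodes attach directly under the sink, whose second transfer slot opens only after the first transfer completes, so the schedule pipelines the two incoming messages on the sink's link.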
We deal with the problem of partitioning and mapping uniform loop nests onto physical processor arra...
A contract algorithm is an algorithm which is given, as part of the input, a specified amount of all...
Consider a network of processor elements arranged in a d-dimensional grid, where each processor can ...
Collective communications are ubiquitous in parallel applications. We present two new algor...
In this thesis we study the behavior of parallel applications represented by a precedence graph. The...
This paper addresses the problem of designing a parallel reduction architecture for applicative lang...
This thesis focuses on the problem of scheduling the tasks of a parallel application taking into accou...
We study the problem of scheduling a parallel computation so as to minimize the maximum numb...
We present here an n^(τ+1) algorithm for optimally scheduling a dag of n nodes on a multiproces...
We consider a family of jobs that are organized as a task-tree which, in particular, capture...
This thesis explores a fundamental issue in large-scale parallel computing: how to schedule tasks on...
In this paper, we present an algorithm that builds optimal schedules for compl...
We consider the problem of scheduling trees on two identical processors in order to minimize the mak...