Minimizing communication overhead when mapping affine loop nests onto distributed memory parallel computers (DMPCs) is a key performance problem that many authors have addressed. Not all communications are equivalent: local communications (translations), simple communications (horizontal or vertical ones), and structured communications (broadcasts, gathers, scatters, or reductions) are performed much faster on DMPCs than general affine communications. In this paper, we recall the mapping heuristic of Dion and Robert, which minimizes the number of nonlocal communications, and we focus on the next step: since it is generally impossible to obtain a communication-local mapping, we show how to optimize residual ...
When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregat...
We present an intermediate representation of a program called the Alignment-Distribution Graph that ...
Reducing communication overhead is extremely important in distributed-memory message-passing architec...
Minimizing communications when mapping affine loop nests onto distributed memory parallel computers ...
We present new techniques for compilation of arbitrarily nested loops with affine dependences for di...
Intensive scientific algorithms can usually be formulated as nested loops which are the ...
Nested loops are normally the most time intensive tasks in computer algorithms. These loops often in...
This paper presents modulo unrolling without unrolling (modulo unrolling WU), a method for message ...
Programming for parallel architectures that do not have a shared address space is extremely difficul...
This paper presents a technique for finding good distributions of arrays and suitable loop restructu...
This paper presents an algorithm to find the optimal affine partitions that maximize the degree of p...
In this paper, we propose a communication cost reduction computes rule for irregular loop partitioni...
We deal with compiler support for parallelizing perfectly nested loops for coarse-grain distributed ...