In this paper, we propose a computes rule for irregular loop partitioning that reduces communication cost, called the least communication computes rule. For an irregular loop with nonlinear array subscripts, the loop is first transformed into a normalized single loop; the loop iterations are then partitioned onto the processors that guarantee minimal communication cost when executing those iterations. We also present interprocedural optimization techniques for communication preprocessing when the irregular code contains procedure calls. The experimental results show that, in most cases, our approach achieves better performance than other loop partitioning rules.
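The assignment step of such a computes rule can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a block data distribution, a unit cost per remote element, and an irregular loop of the form `y[idx1[i]] += f(x[idx2[i]])`; the function and parameter names are hypothetical.

```python
def owner(elem, n_elems, n_procs):
    """Block distribution (assumed): processor that owns array element `elem`."""
    block = (n_elems + n_procs - 1) // n_procs
    return elem // block

def partition_iterations(idx1, idx2, n_elems, n_procs):
    """Least-communication assignment sketch: give each iteration i to the
    processor that incurs the fewest remote accesses among the array
    elements that iteration touches (ties broken by lowest processor id)."""
    assignment = []
    for i in range(len(idx1)):
        touched = [owner(idx1[i], n_elems, n_procs),
                   owner(idx2[i], n_elems, n_procs)]
        # cost on processor p = number of touched elements p does not own
        costs = [sum(1 for t in touched if t != p) for p in range(n_procs)]
        assignment.append(min(range(n_procs), key=costs.__getitem__))
    return assignment
```

For example, with 8 array elements on 2 processors, an iteration touching elements 0 and 1 is assigned to processor 0 (both local there), while one touching elements 5 and 4 goes to processor 1.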
Nested loops are normally the most time intensive tasks in computer algorithms. These loops often in...
Different parallelization methods for irregular reductions on shared memory multiprocessors have bee...
In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming ...
In most cases of distributed memory computations, node programs are executed on processors...
In this paper, some automatic parallelization and optimization techniques for irregular scientific ...
This paper describes a number of optimizations that can be used to support the efficient execution o...
Communication (data movement) often dominates a computation's runtime and energy costs, motivating o...
In irregular all-to-all communication, messages are exchanged between every pair of processors. The ...
Communication set generation significantly influences the performance of parallel programs. However...
Parallelizing sparse irregular applications on distributed memory systems poses serious scalability c...
There are many important applications in computational fluid dynamics, circuit simulation and struct...
Intensive scientific algorithms can usually be formulated as nested loops which are the ...
153 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2000. We have studied five differen...