In this paper, we propose a communication-cost-reducing computes rule for irregular loop partitioning, called the least communication computes rule. For an irregular loop with nonlinear array subscripts, the loop is first transformed into a normalized single loop; we then assign the loop iterations to the processors on which executing those iterations incurs the minimal communication cost. We also give some interprocedural optimization techniques for communication preprocessing when the irregular code contains procedure calls. The experimental results show that, in most cases, our approach achieves better performance than other loop partitioning rules.
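As an illustration of the idea behind such a computes rule, the sketch below assigns each iteration of an irregular loop of the form `x[ia[i]] += y[ib[i]]` to the processor that owns the majority of the data elements it references, so that off-processor accesses (and hence communication) are minimized. All names (`block_owner`, `assign_iterations`, the block distribution) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a "least communication" computes rule for an
# irregular loop  for i: x[ia[i]] += y[ib[i]],  assuming the arrays are
# block-distributed over `nprocs` processors.
from collections import Counter

def block_owner(index, n, nprocs):
    """Processor owning element `index` of an n-element block-distributed array."""
    block = (n + nprocs - 1) // nprocs  # ceiling division: block size
    return index // block

def assign_iterations(ia, ib, n, nprocs):
    """Assign each iteration to the processor owning most of the data it
    touches, so the number of off-processor references is minimal."""
    placement = []
    for i in range(len(ia)):
        # processors owning the elements referenced by iteration i
        refs = [block_owner(ia[i], n, nprocs), block_owner(ib[i], n, nprocs)]
        # pick the processor that owns the most referenced elements
        placement.append(Counter(refs).most_common(1)[0][0])
    return placement
```

In contrast, the conventional owner-computes rule would always place iteration `i` on the owner of the written element `x[ia[i]]`, even when the data read by that iteration lives elsewhere.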
Nested loops are normally the most time intensive tasks in computer algorithms. These loops often in...
Three related problems, among others, are faced when trying to execute an algorithm on a parallel ma...
Different parallelization methods for irregular reductions on shared memory multiprocessors have bee...
Abstract. In most cases of distributed memory computations, node programs are executed on processors...
In this paper, some automatic parallelization and opti-mization techniques for irregular scientific ...
This paper describes a number of optimizations that can be used to support the efficient execution o...
Communication (data movement) often dominates a computation's runtime and energy costs, motivating o...
In irregular all-to-all communication, messages are exchanged between every pair of processors. The ...
Communication set generation significantly influences the performance of parallel programs. However...
There are many important applications in computational fluid dynamics, circuit simulation and struct...
Parallelizing sparse irregular application on distributed memory systems poses serious scalability c...
Intensive scientific algorithms can usually be formulated as nested loops which are the ...
153 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2000. We have studied five differen...