Several parallelization methods for irregular reductions on shared memory multiprocessors have been proposed in the literature in recent years. We have classified these methods and analyzed them in terms of four properties: data locality, memory overhead, exploited parallelism, and workload balancing. In this paper we propose several techniques to increase the amount of exploited parallelism and to introduce load balancing into an important class of these methods. To increase parallelism, the proposed solution relies on partial expansion of the reduction array. Load balancing is addressed with two techniques. The first is generic, as it deals with any kind of load imbalance present in the problem domain....
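For readers unfamiliar with the term, an irregular reduction is a loop of the form `a[idx[i]] += values[i]`, where the indirection array `idx` is only known at run time, so different iterations may update the same element. The sketch below is not the method proposed in the paper. It illustrates the well-known baseline of full array expansion (one private copy of the reduction array per thread), which is the usual starting point for partial expansion. The function names, the block distribution of iterations, and the sequential simulation of threads are all assumptions made for illustration.

```python
def sequential_reduction(n_elems, idx, values):
    """Reference irregular reduction: a[idx[i]] += values[i]."""
    a = [0.0] * n_elems
    for target, v in zip(idx, values):
        a[target] += v
    return a


def expanded_reduction(n_elems, idx, values, n_threads):
    """Full array expansion, with threads simulated one after another.

    Each hypothetical thread owns a private copy of the reduction array,
    so its block of iterations never conflicts with another thread's.
    The private copies are summed element by element at the end.
    Memory overhead grows as n_threads * n_elems.
    """
    n_iter = len(idx)
    chunk = (n_iter + n_threads - 1) // n_threads  # block size, rounded up
    private = [[0.0] * n_elems for _ in range(n_threads)]

    for t in range(n_threads):
        lo = t * chunk
        hi = min(lo + chunk, n_iter)
        for i in range(lo, hi):  # empty range if this thread has no work
            private[t][idx[i]] += values[i]

    # Combine step: reduce the private copies into the final array.
    return [sum(column) for column in zip(*private)]


if __name__ == "__main__":
    idx = [0, 3, 3, 1, 0, 2, 3, 1]
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(sequential_reduction(4, idx, values))
    print(expanded_reduction(4, idx, values, n_threads=3))
```

The private copies make the memory cost visible: it scales with the number of threads, which is the memory overhead property listed in the abstract. Expanding only part of the array would presumably lower that cost, but the abstract does not describe how the paper does this, so it is not sketched here.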
this paper, we propose a communication cost reduction computes rule for irregular loop partitioning...
Multicomputer systems based on message passing draw attractions in the field of high performance co...
In this paper, we propose a communication cost reduction computes rule for irregular loop partitioni...
This paper presents a new parallelization method for reductions of arrays with subscripted subscript...
Abstract: Irregular reduction operations are the core of many large scientific and engineering appli...
Reduction recognition and optimization are crucial techniques in parallelizing compilers. They are u...
The Flagship Parallel Reduction Machine is designed to execute declarative language programs based o...
Parallel computing promises several orders of magnitude increase in our ability to solve realistic c...
A method is outlined for optimising graph partitions which arise in mapping unstructured mesh calcul...
Proper distribution of operations among parallel processors in a large scientific computation execut...
Algorithms for mitigating imbalance of the MapReduce computations are considered in this paper. Map...
This paper presents a new parallelization method for an efficient implementation of unstructured arr...
This paper describes a number of optimizations that can be used to support the efficient execution o...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/16...
On shared memory parallel computers (SMPCs) it is natural to focus on decomposing the computation (...