Reductions are important and time-consuming operations in many scientific codes. Effective parallelization of reductions is a critical transformation for loop parallelization, especially for sparse, dynamic applications. Unfortunately, conventional reduction parallelization algorithms are not scalable. In this paper, we present new architectural support that significantly speeds up parallel reduction and makes it scalable in shared-memory multiprocessors. The required architectural changes are mostly confined to the directory controllers. Experimental results based on simulations show that the proposed support is very effective. While conventional software-only reduction parallelization delivers average speedups of only 2.7 for 16 pro...
With ubiquitous multi-core architectures, a major challenge is how to effectively use these machines...
Run-time parallelization is often the only way to execute the code in parallel when data dependence ...
Parallel graph reduction is a conceptually simple model for the concurrent evaluation of lazy functi...
Reductions are important and time-consuming operations in many scientific codes. Effective paralleli...
Reduction recognition and optimization are crucial techniques in parallelizing compilers. They are u...
A wide variety of computer architectures have been proposed to exploit parallelism at different gran...
Shared-memory multiprocessor systems can achieve high performance levels when appropriate work paral...
Proper distribution of operations among parallel processors in a large scientific computation execut...
This paper presents a new parallelization method for reductions of arrays with subscripted subscript...
Speculative parallel execution of statically non-analyzable codes on Distributed Shared-Memory (DSM)...
The shift towards multicore processing has led to a much wider population of developer...
Different parallelization methods for irregular reductions on shared memory multiprocessors have bee...
The last decade has produced enormous improvements in processor speeds without a corresponding impro...
A coarse-grain parallel program typically has one thread (task) per processor, whereas a fine-grain ...
To achieve high performance, contemporary computer systems rely on two forms of parallelism: instruc...