SIMD computers have proved to be a useful and cost-effective approach to massively parallel computation. However, some algorithms are very inefficient when directly translated into a data-parallel program. This paper presents a number of simple transformations which are able to reduce this SIMD overhead to a moderate constant factor. It also introduces techniques for reducing the remaining overhead using Markov chain models of control flow. The optimization problems involved are NP-hard in general, but there are many useful heuristics, as well as closed-form optimizations for a probabilistic variant.
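To make the notion of SIMD overhead concrete, the following is a minimal sketch under a simplified cost model that is an assumption here, not taken from the paper: each processing element (PE) runs a loop with its own trip count, and under lockstep execution every PE is occupied until the slowest one finishes. The function names `simd_steps` and `simd_overhead` are hypothetical and chosen only for illustration.

```python
def simd_steps(trip_counts):
    """Lockstep cost (assumed model): all PEs wait for the longest loop."""
    return max(trip_counts) if trip_counts else 0


def simd_overhead(trip_counts):
    """Ratio of PE-steps occupied (busy or idle) to PE-steps of useful work."""
    occupied = simd_steps(trip_counts) * len(trip_counts)
    useful = sum(trip_counts)
    return occupied / useful


# Four PEs; one PE needs many more iterations than the others.
skewed = [1, 1, 1, 13]
balanced = [4, 4, 4, 4]

print(simd_overhead(skewed))    # 3.25
print(simd_overhead(balanced))  # 1.0
```

In this toy model the skewed workload occupies the machine 3.25 times longer than the useful work requires, while the balanced workload incurs no overhead. It only illustrates why a direct data-parallel translation can be inefficient; it does not implement the transformations or the Markov chain models that the paper describes.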
This thesis talks about techniques which can be used to optimize run time of algorithms. For a demon...
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched net...
We describe our implementation of several efficient parallel algorithms on the massively par...
This paper presents a technique that may be used to transform SIMD shared memory parallel s algorithm...
A massively parallel MIMD machine is costly and difficult to build. Implementations of MIMD machines...
This paper considers the expression and derivation of efficient data parallel programs for SIMD and ...
Two new parallel optimization algorithms based on the simplex method are described. They may be exec...
This paper presents a straightforward approach to determining how best to utilize an MIMD multiproce...
This dissertation proposes a new technique for efficient parallel solution of very large linear syst...
The increasing availability of multi-core and multiprocessor architectures provides new opportunitie...
Many important multimedia applications contain a significant fraction of reduction operations. Altho...
Highly parallel computing architectures are the only means to achieve the computation rates demanded...
We describe our implementation of several efficient parallel algorithms on the massively parallel SI...
Many loop nests in scientific codes contain a parallelizable outer loop but have an inner loop for w...
The P-RAM model of computation has proved to be a very useful theoretical model for exploiting and e...