Loops in scientific and engineering applications provide a rich source of parallelism. To obtain a higher level of parallelism, loops with loop-carried dependences, which are largely serialized by traditional techniques, need to be parallelized with fine-grained synchronization. This approach, called DOACROSS parallelization, requires new optimization strategies to preserve parallelism while minimizing inter-processor communication. In this thesis, I closely examine the issues involved in DOACROSS parallelization. This work has two focuses: (1) increasing parallelism, and (2) reducing communication overhead. Strategies for four major optimization problems are proposed and described in detail...
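To make the idea concrete, here is a minimal sketch (not taken from the thesis) of DOACROSS-style execution with fine-grained post/wait synchronization. It assumes a simple recurrence `a[i] = a[i-1] + b[i-1]`, distributes iterations cyclically over threads, and uses one event per iteration so each iteration waits only on the value it actually depends on; the names `worker`, `done`, and the cyclic schedule are illustrative choices, not anything prescribed by the papers listed here.

```python
import threading

N, P = 16, 4                   # iterations and threads (arbitrary example sizes)
a = [0] * (N + 1)              # a[0] holds the initial value of the recurrence
b = list(range(1, N + 1))
done = [threading.Event() for _ in range(N + 1)]
done[0].set()                  # the initial value is available immediately

def worker(tid):
    # Cyclic distribution: thread tid executes iterations tid+1, tid+1+P, ...
    for i in range(tid + 1, N + 1, P):
        done[i - 1].wait()          # "wait": block until the carried value is produced
        a[i] = a[i - 1] + b[i - 1]  # loop-carried dependence on a[i-1]
        done[i].set()               # "post": signal consumers of iteration i

threads = [threading.Thread(target=worker, args=(t,)) for t in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# a[N] == 1 + 2 + ... + N == 136 for N == 16
```

A DOALL-style compiler would serialize this loop entirely because of the dependence on `a[i-1]`; the point of DOACROSS is that the statements of different iterations can still overlap, at the cost of the per-iteration synchronization shown above, which is exactly the communication overhead the optimization strategies in these abstracts aim to reduce.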
Modern computers will increasingly rely on parallelism to achieve high computation rates. Techniques...
Power consumption and fabrication limitations are increasingly playing significant roles in the desi...
The optimization of programs with explicit--i.e. user specified--parallelism requires the computatio...
We present two algorithms to minimize the amount of synchronization added when parallelizing a loop ...
Parallelizing compilers promise to exploit the parallelism available in a given program, particularl...
It is extremely difficult to parallelize DOACROSS loops with non-uniform loop-carried dependences. I...
While automatic parallelization of loops usually relies on compile-time analysis of data dependences...
In this paper, we focus on the need for two approaches to optimize producer and consumer synchroniza...
Loops are the main source of parallelism in scientific programs. Hence, several techniques were dev...
In this paper, an approach to the problem of exploiting parallelism within nested loops is ...
In general, any nested loop can be parallelized as long as all dependence constraints among iteratio...
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops becau...
If the iterations of a loop nest cannot be partitioned into independent tasks, data communication ...