Vector multiprocessors rely on both spatial and temporal parallelism to achieve significant speedup. For singly nested loops, we study the effect on speedup of 1) loop fusion and 2) increasing the granule size of parallel-vector loops using statements extracted from scalar loops. The proposed optimizations migrate vector statements from one loop to another, create new loops, and reduce others. Loops and statements that belong to strongly connected data paths are vertically fused, whenever possible, in order to promote chaining and cache/register reuse. To reduce loop synchronization, horizontal fusion is also used for independent loops having compatible dependence types. Finally, vector operations are scheduled based on knowledge o...
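As a rough illustration of the vertical fusion described above, the following sketch fuses a producer loop with the loop that consumes its result. It is written in plain Python rather than vector code, the function names `unfused` and `fused` are hypothetical, and it is not the algorithm from the paper; it only shows why fusing two loops on the same data path is legal when the dependence between them has distance zero.

```python
def unfused(a):
    """Two separate loops: the second consumes what the first produces."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    for i in range(n):      # loop 1: produces b
        b[i] = 2.0 * a[i]
    for i in range(n):      # loop 2: reads b[i] written by loop 1
        c[i] = b[i] + 1.0
    return b, c


def fused(a):
    """Same computation after fusion: one loop, one pass over the data."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    for i in range(n):
        b[i] = 2.0 * a[i]   # producer statement
        c[i] = b[i] + 1.0   # consumer reuses b[i] while it is still at hand
    return b, c
```

Because iteration `i` of the second loop depends only on iteration `i` of the first, the fused loop computes the same values while touching `b[i]` once, which is the kind of reuse that fusion is meant to expose.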
An emerging trend in processor design is the incorporation of short vector instructions into the ISA...
grantor: University of Toronto. This dissertation proposes and evaluates compiler techniques...
Data locality and synchronization overhead are two important factors that affect the performance of ...
Parallelizing compilers promise to exploit the parallelism available in a given program, particularl...
Modern computers will increasingly rely on parallelism to achieve high computation rates. Techniques...
An emerging trend in processor design is the addition of short vector instructions to general-purpos...
Abstract. Loop fusion is a program transformation that merges multiple loops into one. It is effectiv...
Loop fusion improves data locality and reduces synchronization in data-parallel applications. Howeve...
Modern compilers offer more and more capabilities to automatically parallelize code-regions if these...
Loops in scientific and engineering applications provide a rich source of parallelism. In order to o...
Power consumption and fabrication limitations are increasingly playing significant roles in the desi...
Developing efficient programs for many of the current parallel computers is not easy due to the arch...
Newer architectures continue to expand vector sizes and increase the different number of vector ins...
Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multip...
In this paper we analyze the effect of compiler optimizations on fine grain parallelism in scalar pr...