To reduce remote memory accesses on CC-NUMA multiprocessors, we present an interprocedural analysis that supports static loop scheduling and data allocation. Given a parallelized program, the compiler constructs graphs that represent, globally and interprocedurally, the remote-reference penalties associated with different choices of loop scheduling and data allocation. After deriving an optimal solution from these graphs, the compiler generates data allocation directives and schedules DOALL loops. Experiments indicate that the proposed compiler scheme is efficient, and simulation results show good performance of the parallel code.

1 Introduction

Executing independent loop iterations on multiple processors is an important appro...
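To make the interplay between loop scheduling and data allocation concrete, here is a minimal sketch in C with OpenMP. It assumes a CC-NUMA system with a first-touch page-placement policy; the pragmas and the first-touch initialization are illustrative assumptions about the target system, not the directives the paper's compiler would generate.

/* Sketch: static DOALL scheduling aligned with first-touch data placement.
 * Assumes a CC-NUMA system where pages are allocated on the node of the
 * thread that first touches them; the identical static schedule in both
 * loops keeps each thread's accesses local. */
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);

    /* First touch: same static schedule as the compute loop below,
       so each thread's block of pages lands on its local node. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) {
        a[i] = 0.0;
        b[i] = (double)i;
    }

    /* DOALL loop: iterations are independent; the static schedule
       reuses the iteration-to-thread mapping fixed at first touch,
       minimizing remote memory references. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 2.0 * b[i];

    printf("a[N-1] = %f\n", a[N - 1]);
    free(a);
    free(b);
    return 0;
}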
This paper proposes an efficient run-time system to schedule general nested loops on multiprocessors...
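Truncated as it is, the entry above points at run-time loop scheduling. A common way such run-time systems distribute iterations is self-scheduling: workers repeatedly claim the next chunk from a shared counter. The C sketch below (pthreads plus C11 atomics) is a generic illustration of that idea; the chunk size, worker count, and loop body are invented for the example, not taken from the paper.

/* Sketch of dynamic loop self-scheduling: each worker repeatedly grabs
 * the next chunk of iterations from a shared counter. Generic run-time
 * scheduling technique, shown only as an illustration. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N 1000
#define CHUNK 16
#define WORKERS 4

static atomic_int next_iter = 0;
static atomic_long sum = 0;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int lo = atomic_fetch_add(&next_iter, CHUNK);
        if (lo >= N) break;
        int hi = lo + CHUNK < N ? lo + CHUNK : N;
        long local = 0;
        for (int i = lo; i < hi; i++)
            local += i;                 /* stand-in for the loop body */
        atomic_fetch_add(&sum, local);
    }
    return NULL;
}

int main(void) {
    pthread_t t[WORKERS];
    for (int k = 0; k < WORKERS; k++)
        pthread_create(&t[k], NULL, worker, NULL);
    for (int k = 0; k < WORKERS; k++)
        pthread_join(t[k], NULL);
    printf("sum = %ld (expected %ld)\n", (long)sum, (long)N * (N - 1) / 2);
    return 0;
}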
Shared-memory multiprocessor systems can achieve high performance levels when appropriate work paral...
This work leverages an original dependency analysis to parallelize loops regardless of their form i...
In this paper we present a unified approach for compiling programs for Distributed-Memory Multiproce...
Communication overhead in multiprocessor systems, as exemplified by cache coherency traffic and glob...
We present several new compiler techniques employed by our interprocedural parallelizing research ...
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops becau...
In distributed memory multicomputers, local memory accesses are much faster than those i...
In this paper we will present a solution to the problem of determining loop and data partitions automat...
The parallelization of complex, irregular scientific applications with various computational require...
While automatic parallelization of loops usually relies on compile-time analysis of data dependences...
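The entry above contrasts compile-time dependence analysis with what can be done when subscripts are unknown until run time (e.g., accesses through an index array). A minimal run-time alternative is an inspector that checks, before executing the loop in parallel, that no two iterations touch the same element. The C sketch below illustrates that generic idea; the function name and shadow-array scheme are hypothetical, not any specific paper's test.

/* Minimal run-time (inspector-style) dependence test: the loop
 * "for i: a[idx[i]] = ..." is a DOALL only if idx has no duplicates,
 * i.e. no two iterations write the same element. Hypothetical example. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool is_doall(const int *idx, int n, int a_len) {
    bool touched[a_len];               /* shadow array, one flag per element */
    memset(touched, 0, sizeof touched);
    for (int i = 0; i < n; i++) {
        if (touched[idx[i]])
            return false;              /* cross-iteration conflict found */
        touched[idx[i]] = true;
    }
    return true;                       /* all writes disjoint: safe DOALL */
}

int main(void) {
    int ok[]  = {3, 1, 4, 0, 2};
    int bad[] = {3, 1, 3, 0, 2};       /* element 3 written twice */
    printf("ok:  %s\n", is_doall(ok, 5, 5)  ? "parallel" : "serial");
    printf("bad: %s\n", is_doall(bad, 5, 5) ? "parallel" : "serial");
    return 0;
}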
In this paper, we study the problem of scheduling parallel loops at compile-time for a heterogeneous...
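A simple compile-time policy for heterogeneous processors, shown here purely as an illustration since the abstract is cut off, is weighted static partitioning: split the iteration space into contiguous chunks proportional to each processor's relative speed. The speeds and chunking rule in the C sketch below are assumed values for the example.

/* Illustration: weighted static partitioning of an iteration space
 * across processors of different speeds. Chunk sizes are proportional
 * to each processor's relative speed; a generic technique, not
 * necessarily the scheme the truncated abstract proposes. */
#include <stdio.h>

int main(void) {
    const int n = 100;                      /* loop iterations */
    const double speed[] = {1.0, 2.0, 4.0}; /* assumed relative speeds */
    const int p = 3;

    double total = 0.0;
    for (int k = 0; k < p; k++) total += speed[k];

    int start = 0;
    for (int k = 0; k < p; k++) {
        /* last processor absorbs rounding leftovers */
        int chunk = (k == p - 1) ? n - start
                                 : (int)(n * speed[k] / total);
        printf("proc %d: iterations [%d, %d)\n", k, start, start + chunk);
        start += chunk;
    }
    return 0;
}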
One of the major challenges in designing optimizing compilers, especially for scientific computation...