Minimizing interprocessor communication is key to executing a parallelized program efficiently on multicomputers. This paper presents a compilation technique for generating efficient parallel code that both reduces the incurred communication cost and preserves parallelism. First, we transform an n-nested loop into k-projected structures and use an evaluation function to compare them, selecting a k-projected structure with a shorter parallel execution time. Next, we propose a mapping strategy that maps the selected k-projected structure onto a hypercube so that it executes in parallel with a balanced workload and low communication cost among all processors.
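The abstract does not spell out the mapping strategy. One standard way to obtain low communication cost when placing a grid-shaped task structure on a hypercube is to Gray-code each coordinate and concatenate the bit fields, so that neighbouring tasks land on adjacent processors (one hop apart). The sketch below illustrates that technique under stated assumptions: the projected structure is a k-dimensional mesh whose side lengths are powers of two, and each task gets its own processor (which balances the workload trivially). The function names `gray`, `mesh_to_hypercube`, and `hamming`, and the 4 x 8 example shape, are hypothetical and are not taken from the paper.

```python
from itertools import product


def gray(i: int) -> int:
    """Reflected binary Gray code: gray(i) and gray(i + 1) differ in exactly one bit."""
    return i ^ (i >> 1)


def mesh_to_hypercube(shape):
    """Map every point of a k-dimensional mesh to a hypercube node address.

    Assumption (hypothetical setup): every side length in `shape` is a power
    of two, so side s needs log2(s) address bits. Each coordinate is
    Gray-coded and the bit fields are concatenated into one node address.
    Returns (mapping, hypercube_dimension).
    """
    bits = []
    for s in shape:
        if s < 1 or s & (s - 1):
            raise ValueError(f"side {s} is not a power of two")
        bits.append(s.bit_length() - 1)

    mapping = {}
    for point in product(*(range(s) for s in shape)):
        addr = 0
        for coord, b in zip(point, bits):
            addr = (addr << b) | gray(coord)  # append this dimension's Gray-coded field
        mapping[point] = addr
    return mapping, sum(bits)


def hamming(a: int, b: int) -> int:
    """Number of hypercube hops between nodes a and b (differing address bits)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    shape = (4, 8)  # hypothetical 2-D projected structure of 4 x 8 tasks
    mapping, dim = mesh_to_hypercube(shape)
    print(f"{len(mapping)} tasks on a {dim}-cube")

    # Check that every pair of mesh neighbours lands on adjacent hypercube nodes.
    worst = 0
    for (i, j), node in mapping.items():
        for di, dj in ((1, 0), (0, 1)):
            nb = (i + di, j + dj)
            if nb in mapping:
                worst = max(worst, hamming(node, mapping[nb]))
    print("max hops between mesh neighbours:", worst)
```

Running it reports that the 32 tasks fit on a 5-cube and that mesh neighbours are always one hop apart. The paper's own strategy may differ, for example when there are more tasks than processors and several tasks must share a node.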
Over the past two decades tremendous progress has been made in both the design of parallel architect...
This paper presents compiler algorithms to optimize out-of-core programs. These algorithms consider ...
In this paper, we present an efficient framework for intraprocedural performance based program parti...
Intensive scientific algorithms can usually be formulated as nested loops which are the ...
Efficient methods of partitioning nested for-loops for parallel execution on multicomput...
The task-to-processor mapping problem is addressed in the context of a local-memory multiprocessor w...
This paper addresses the problems of communication-free partitions of statement-iterations of neste...
In distributed memory multicomputers, local memory accesses are much faster than those i...
Communication overhead in multiprocessor systems, as exemplified by cache coherency traffic and glob...
In this paper, an approach to the problem of exploiting parallelism within nested loops is ...
Many parallel algorithms exhibit a hypercube communication topology. Such algorithms can easily be e...
Application specific MPSoCs are often used to implement high-performance data-intensive applications...
We deal with compiler support for parallelizing perfectly nested loops for coarse-grain distributed ...
Mapping of parallel programs onto parallel computers for efficient execution is a fundamental proble...
Chain-based scheduling [1] is an efficient partitioning and scheduling scheme for nested loops on di...