The map operation, in which a function is applied independently to each element in a collection to produce a new collection, appears in many settings and is easy to parallelize. While a straightforward implementation in hardware will consist of multiple functional units with buffers to balance variable execution times, the best trade-off between these two components is not obvious. Too many buffers waste resources that could otherwise perform computation; too few buffers cause functional units to lie idle waiting for empty buffers. Our work considers this abstract problem, derives worst-case workload distributions, then shows how to trade functional units for buffers to maximize throughput. Our results can be used by designers and com...
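As a software analogy only, the independence of the per-element applications can be sketched with a thread pool. This is a minimal sketch, not the hardware design the abstract studies: `parallel_map`, `square`, and the `workers` parameter are hypothetical names chosen for illustration, and the pool's internal work queue merely stands in for the buffers mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(fn, items, workers=4):
    """Apply fn independently to each item using a pool of workers.

    Hypothetical helper: the workers play the role of functional units,
    and the executor's internal queue plays the role of the buffers.
    Results are returned in input order, as a map requires.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order even if tasks finish out of order.
        return list(pool.map(fn, items))


def square(x):
    """Example per-element function (an assumption for illustration)."""
    return x * x


squares = parallel_map(square, range(8))
```

Because no application of `fn` depends on another, the number of workers can be changed without affecting the result, only the elapsed time.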
In this paper we study the performance of four mapping algorithms. The four algorithms include two n...
A fundamental issue affecting the performance of a parallel application running on message-passing p...
This paper describes the performance of locality-based mapping and remapping partitioners for unstruc...
The map is a higher-order function that applies a given function to the list or lists of el...
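The "list or lists" wording can be illustrated with Python's built-in `map`, which accepts one or several iterables. This is only a sketch of the general idea; the function `add` and the sample lists are assumptions for illustration and do not come from the paper.

```python
def add(x, y):
    """Hypothetical two-argument function used for illustration."""
    return x + y


# One list: the function is applied to each element in turn.
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))

# Several lists: the function receives one element from each list per call.
sums = list(map(add, [1, 2, 3], [10, 20, 30]))
```

Here `doubled` is `[2, 4, 6]` and `sums` is `[11, 22, 33]`.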
To be done later. Keywords: Parallel environment, Distributed-memory machines, Load-balancing, Mapping...
Parallelism, Optimal Data Distribution/Collection, P3L This document describes the MAP paradigm of ...
The need for high-performance computing together with the increasing trend from single processor to ...
Much compiler-orientated work in the area of mapping parallel programs to parallel architectures has...
A key problem for parallelizing compilers is to find a good tradeoff betwee...
The mapping problem has been studied extensively. However, algorithms which were designed to map a p...
Data dependences are known to hamper efficient parallelization of programs. M...
The optimal mapping of tasks of a parallel program onto nodes of a parallel computing system has a r...
Finite functions (also called maps) are used to describe a number of key computations and storage me...
Algorithms for mitigating imbalance of the MapReduce computations are considered in this paper. Map...
For a wide variety of applications, both task and data parallelism must be exploited to achieve the ...