This paper discusses how to find an optimal supernode shape for a supernode transformation (also known as tiling), with the objective of minimizing the total execution time of a parallel program on a distributed-memory parallel computer. We assume that the communication cost is dominated by the startup penalty and can therefore be approximated by a constant. We identify three parameters of supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies, we give a closed form for an optimal linear schedule vector, a necessary and sufficient condition for optimal relative side lengths, and for dependence cones with n extreme directions, we...
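The three parameters named in the abstract can be made concrete with a small sketch. The Python below is a toy cost model, not the paper's derivation. It assumes a 2-D rectangular iteration space with uniform dependencies (1, 0) and (0, 1), axis-aligned cutting hyperplanes, enough processors to run every supernode on a wavefront at once, and the linear schedule vector (1, 1) on the tiled space. The names `tile_of`, `total_time`, `best_shape_2d`, `t_comp`, and `t_comm` are hypothetical and chosen only for illustration.

```python
from math import ceil, prod


def tile_of(point, sides):
    """Index of the rectangular supernode that contains an iteration point."""
    return tuple(p // s for p, s in zip(point, sides))


def total_time(extents, sides, t_comp=1.0, t_comm=50.0):
    """Toy estimate: wavefront steps times the cost of one step.

    Each step computes one supernode (prod(sides) iterations) and pays a
    constant communication startup cost t_comm (assumed values).
    """
    tiles = [ceil(n / s) for n, s in zip(extents, sides)]
    # Under schedule vector (1, ..., 1), tile (i, j) runs at step i + j.
    steps = sum(t - 1 for t in tiles) + 1
    return steps * (prod(sides) * t_comp + t_comm)


def best_shape_2d(extents, size, **costs):
    """Try every integer side pair with the given supernode size (area)."""
    shapes = [(a, size // a) for a in range(1, size + 1) if size % a == 0]
    return min(shapes, key=lambda s: total_time(extents, s, **costs))


if __name__ == "__main__":
    print(tile_of((7, 3), (4, 2)))
    print(best_shape_2d((400, 100), 100))
```

In this toy model the supernode size fixes the cost of one step, so varying the relative side lengths only changes the number of wavefront steps. For a 400 by 100 space and size 100, the search picks sides (20, 5), which has the same 4:1 ratio as the iteration space.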
AMS subject classification: 68Q22, 90C90. We discuss in this paper the problem of generating highly ef...
Abstract: It is shown that the problem of finding a maximal set of paths in a given (undirected or dir...
Abstract: There exist several scheduling schemes for parallelizing loops without dependences for sh...
Abstract: In this paper we revisit the supernode-shape selection problem, which has been widely dis...
Iteration space tiling is a common strategy used by parallelizing compilers and in performance tunin...
this paper how to execute a class of (n + 1)-dimensional uniform recurrences in SPMD (Single Program M...
Iteration space tiling is a common strategy used by parallelizing compilers to reduce communication ...
Many computationally-intensive programs, such as those for differential equations, spatial interpola...
Dehne presented an optimal algorithm to compute the contour of the maximal elements of n planar po...
Three related problems, among others, are faced when trying to execute an algorithm on a parallel ma...
We present parallel computational geometry algorithms that are scalable, architecture independent, e...
The problem of finding the minimum topology of multiprocessing substrates supporting parallel exec...
Abstract: We present two parallel algorithms for finding a maximal set of paths in a given undirected ...