In this report we address the problem of tiling a loop to minimize its completion time when executed on multicomputers. We remove the restriction that tiles be atomic and exploit internal parallelism within tiles by overlapping computation with communication. The effectiveness of tiling then depends critically on the execution order of tasks within a tile. We present a theoretical framework based on equivalence classes that provides an optimal task ordering under the assumptions of constant and different permutations of tasks in individual tiles. Our framework handles constant but compile-time-unknown dependences by generating optimal task permutations at run time and results in significantly lower loo...
In this paper, we consider the problem of scheduling an application on a parallel computational plat...
Despite decades of research on high-level loop optimizations and their successful integration in prod...
We study the computational power of rational Piecewise Constant Derivative (PCD) systems. PCD system...
In the framework of fully permutable loops, tiling has been extensively studied as a source-to-sourc...
In this paper, an efficient algorithm to implement loop partitioning is introduced and evaluated. We...
This paper investigates the execution of tree-shaped task graphs using multiple processors. Each edg...
In this paper, we survey loop parallelization algorithms, analyzing the dependence representations t...
We present a paradigm and implementation of a parallel control flow model for algorithmic patterns o...
In the data parallel programming style the user usually specifies the data parallelism explicitly so...
In this paper, an efficient algorithm to simultaneously implement array alignment and data/computati...
It is easy to find errors and inefficient parts of a sequential program by using a standard debugge...
Scientific applications are usually described as directed acyclic graphs, where nodes represent tas...
Loop tiling is a loop transformation widely used to improve spatial and temporal data locality, to i...
Given a set $L$ of $n$ points in the $d$-dimensional Cartesian space $E^d$, and a query specifying a...