The arrival of multicore architectures has generated interest in reformulating dense matrix computations as algorithms-by-blocks, in which submatrices (blocks) are the units of data and operations on those blocks are the units of computation. Instead of being executed directly, such an algorithm is used to generate a directed acyclic graph (DAG) of block operations at runtime, which is then scheduled by a runtime system such as SuperMatrix. The benefit is a clear separation of concerns between the library and the scheduling heuristics. In this paper, we show that this approach can be taken one step further, using the same methodology and an ad hoc runtime to map algorithms-by-blocks to small clusters. With no change to the library code or to the application that uses it, the computational ...
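To make the algorithm-by-blocks idea in the abstract above concrete, here is a minimal sketch in Python with NumPy. It assumes a tiled Cholesky factorization as the example operation. Each block task records which blocks it reads and writes, and a DAG is inferred from those sets (read-after-write, write-after-write, write-after-read). A toy ready-queue loop then executes the tasks. The helpers `cholesky_tasks`, `build_dag` and `run_by_blocks` are hypothetical illustrations, not SuperMatrix's API. The task names only borrow the LAPACK/BLAS kernel names, and the "runtime" here runs tasks sequentially in one process.

```python
from collections import deque
import numpy as np

def cholesky_tasks(nb):
    """List the block tasks of a right-looking tiled Cholesky in program order.

    Each task is (op, indices, reads, writes); reads/writes are block coordinates.
    """
    tasks = []
    for k in range(nb):
        tasks.append(("POTRF", (k,), [], [(k, k)]))
        for i in range(k + 1, nb):
            tasks.append(("TRSM", (i, k), [(k, k)], [(i, k)]))
        for i in range(k + 1, nb):
            tasks.append(("SYRK", (i, k), [(i, k)], [(i, i)]))
            for j in range(k + 1, i):
                tasks.append(("GEMM", (i, j, k), [(i, k), (j, k)], [(i, j)]))
    return tasks

def build_dag(tasks):
    """Infer RAW, WAW and WAR dependencies from the blocks each task touches."""
    succ = [set() for _ in tasks]
    indeg = [0] * len(tasks)
    last_writer, readers = {}, {}

    def edge(src, dst):
        if src != dst and dst not in succ[src]:
            succ[src].add(dst)
            indeg[dst] += 1

    for t, (_, _, reads, writes) in enumerate(tasks):
        for b in reads:
            if b in last_writer:
                edge(last_writer[b], t)      # read after write
            readers.setdefault(b, []).append(t)
        for b in writes:                     # writes are in-place updates
            if b in last_writer:
                edge(last_writer[b], t)      # write after write
            for r in readers.get(b, []):
                edge(r, t)                   # write after read
            last_writer[b], readers[b] = t, []
    return succ, indeg

def run_by_blocks(A, bs):
    """Factor SPD A in place by dispatching ready DAG tasks (sequentially)."""
    nb = A.shape[0] // bs
    tasks = cholesky_tasks(nb)
    succ, indeg = build_dag(tasks)

    def blk(i, j):
        return A[i * bs:(i + 1) * bs, j * bs:(j + 1) * bs]  # a view into A

    def execute(op, idx):
        if op == "POTRF":
            (k,) = idx
            blk(k, k)[:] = np.linalg.cholesky(blk(k, k))
        elif op == "TRSM":                   # L_ik = A_ik * L_kk^{-T}
            i, k = idx
            blk(i, k)[:] = np.linalg.solve(blk(k, k), blk(i, k).T).T
        elif op == "SYRK":
            i, k = idx
            blk(i, i)[:] -= blk(i, k) @ blk(i, k).T
        else:                                # GEMM
            i, j, k = idx
            blk(i, j)[:] -= blk(i, k) @ blk(j, k).T

    ready = deque(t for t in range(len(tasks)) if indeg[t] == 0)
    while ready:  # a real runtime would hand ready tasks to idle cores or nodes
        t = ready.popleft()
        execute(*tasks[t][:2])
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return np.tril(A), len(tasks)
```

For an 8×8 symmetric positive definite matrix with 2×2 blocks, `run_by_blocks(A.copy(), 2)` generates 20 tasks, and its lower triangle should match `np.linalg.cholesky(A)`. The factorization code never names an execution order. Only the ready-queue loop decides it, which is the separation of concerns the abstract describes.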
In this paper, we survey algorithms that allocate a parallel program represented by an edge-weighted...
When scheduling a directed acyclic graph (DAG) of tasks with communication cos...
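The two entries above concern allocating an edge-weighted task DAG to processors when edges carry communication costs. The following is a minimal sketch of one simple approach, a greedy list scheduler, assuming identical processors. It visits tasks in a topological order and places each one on the processor where it can start earliest. An edge's communication delay is paid only if the two tasks run on different processors. The function `list_schedule` and its parameters are hypothetical, and this greedy heuristic is not the specific algorithm of either cited paper.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def list_schedule(cost, comm, preds, nprocs):
    """Greedy earliest-start list scheduling of a task DAG.

    cost[t]      execution time of task t
    comm[(u, t)] delay on edge u -> t, paid only across processors
    preds[t]     set of predecessors of t (every task must be a key)
    Returns (makespan, {task: (proc, start, finish)}).
    """
    order = TopologicalSorter(preds).static_order()
    proc_free = [0.0] * nprocs          # time each processor becomes idle
    sched = {}
    for t in order:
        best = None
        for p in range(nprocs):
            # data from u arrives at u's finish time, plus the edge's
            # communication cost if u ran on a different processor
            ready = max((sched[u][2] + (0.0 if sched[u][0] == p else comm[(u, t)])
                         for u in preds[t]), default=0.0)
            start = max(ready, proc_free[p])
            if best is None or start < best[1]:
                best = (p, start)
        p, start = best
        sched[t] = (p, start, start + cost[t])
        proc_free[p] = start + cost[t]
    return max(f for _, _, f in sched.values()), sched
```

On a fork-join DAG `a → {b, c} → d` with unit task costs and two processors, an edge cost of 5 makes the scheduler keep everything on one processor (makespan 4). With zero edge costs it runs `b` and `c` in parallel (makespan 3). That trade-off between parallelism and communication is what makes scheduling with communication costs hard.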
The multiplication of large sparse matrices is a basic operation for many scientific and engineering ...
We discuss the high-performance parallel implementation and execution of dense linear algebra matrix...
Scheduling a large number of applications in a cluster computing environment is a serious obstacle t...
Scheduling problems are essential for decision making in many academic disciplines, including operat...
Matrix multiplication is an important operation in scientific and engineering applications. ...
The era of manycore computing will bring new fundamental challenges that the techniques designed for...
This paper presents a novel scheme to schedule loops for clustered microarchitectures. The scheme is...
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused o...
MapReduce is a framework proposed by Google for processing huge amounts of data in a distributed env...
Cluster-based data-parallel frameworks such as MapReduce, Hadoop, and Dryad are increasingly popular...
Scheduling of sporadic task systems on multiprocessor platforms is an area which has received much a...
In this article, we revisit the problem of scheduling dynamically generated directed acyclic graphs...