We consider the problem of data allocation when performing matrix multiplication on a heterogeneous node with multicore CPUs and GPUs. Classical (cyclic) allocations designed for homogeneous settings are not appropriate, but the advent of task-based runtime systems makes it possible to use more general allocations. Previous theoretical work has proposed square and cube partitioning algorithms aimed at minimizing data movement for matrix multiplication. We propose techniques to adapt these continuous square partitionings to the allocation of discrete matrix tiles, and strategies to adjust the static allocation at run time. We use these techniques in an implementation of matrix multiplication based on the StarPU runtime system, and we show through ...
We consider the problem of allocating and scheduling dense linear application on fully heterogeneous...
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused o...
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016...
We consider the problem of data allocation when performing matrix multiplicati...
The tremendous increase in the size and heterogeneity of supercomputers makes it very difficult to p...
We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and mul...
The tremendous increase in the size and heterogeneity of supercomputers makes it very difficult to p...
In this paper, we address the issue of implementing matrix-matrix multiplication on heterogene...
In this paper, we address the issue of implementing matrix multiplication on heterogeneous ...
In this paper, an adaptive matrix multiplication algorithm for dynamic heterogeneous environments is...
Proceedings of the 8th IEEE International Conference on Cluster Computing (Cluster 2006), October, 2...
In this paper, we consider the problem of partitioning a square into a set of zones of prescribed ar...
We present a novel approach of distributing matrix multiplications among GPU-equipped nodes in a clu...
We study the implementation of dense linear algebra computations, such as matrix multiplicatio...
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplic...