In this paper, an adaptive matrix multiplication algorithm for dynamic heterogeneous environments is developed and evaluated. Unlike state-of-the-art approaches, which achieve load balancing by distributing the matrix data unequally among the heterogeneous nodes, our approach partitions the matrices into blocks of equal size. Task allocation and the block size are adapted at run time. Data prefetching is used to make communication efficient. Our approach enables the use of various task scheduling heuristics. Further, we show that the control and coordination overheads of this approach are negligible compared with the overall execution time. The effectiveness of the approach is verified through a configurable ...
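The equal-size block idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes square matrices stored as Python lists, uses threads in place of heterogeneous nodes, and lets each worker pull the next block task from a shared queue (simple self-scheduling). Run-time block-size adaptation and data prefetching are not modelled. All names (`block_tasks`, `multiply_block`, `adaptive_matmul`) are hypothetical.

```python
import queue
import threading


def block_tasks(n, block):
    """Yield the top-left corner (row, col) of every block of the n x n result."""
    for i in range(0, n, block):
        for j in range(0, n, block):
            yield i, j


def multiply_block(A, B, C, i0, j0, block):
    """Compute one result block of C = A * B (edge blocks may be smaller)."""
    n = len(A)
    for i in range(i0, min(i0 + block, n)):
        for j in range(j0, min(j0 + block, n)):
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))


def adaptive_matmul(A, B, block=2, workers=3):
    """Multiply square matrices; workers take block tasks on demand.

    Returns the product and the number of blocks each worker processed.
    Assumption: a worker that finishes sooner simply fetches the next task,
    so the load balances itself without unequal data partitioning.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    tasks = queue.Queue()
    for task in block_tasks(n, block):
        tasks.put(task)
    counts = [0] * workers

    def worker(wid):
        while True:
            try:
                i0, j0 = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            multiply_block(A, B, C, i0, j0, block)
            counts[wid] += 1  # each worker writes only its own slot

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return C, counts
```

Each result block is written by exactly one worker, so no locking is needed on `C`. How the blocks end up split among the workers depends on thread timing, so `counts` varies between runs, while its sum always equals the number of blocks.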
This paper is focused on designing efficient parallel matrix-product algorithm...
We study the implementation of dense linear algebra computations, such as matrix multiplicatio...
We address synchronization issues of some block matrix multiplication algorithms in a distributed co...
In this paper, we address the issue of implementing matrix-matrix multiplicati...
In this paper, we address the issue of implementing matrix-matrix multiplication on heterogene...
The tremendous increase in the size and heterogeneity of supercomputers makes it very difficult to p...
We consider the problem of data allocation when performing matrix multiplication on a heterogeneous ...
We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and mul...
We present a novel approach of distributing matrix multiplications among GPU-equipped nodes in a clu...
Proceedings of the 8th IEEE International Conference on Cluster Computing (Cluster 2006), October, 2...
We study the implementation of dense linear algebra computations, such as matr...
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016...
In this paper, we address the issue of implementing matrix-matrix multiplication on heterogeneous ...
We propose an adaptive load balancing algorithm for heterogeneous distributed systems. The algorithm...