We present a novel approach to distributing matrix multiplications among GPU-equipped nodes in a cluster system. In this context, we discuss the resulting challenges and possible solutions. Additionally, we present an algorithm that outperforms optimized GPU BLAS libraries for small matrices. Furthermore, we provide a novel theoretical model for distributing algorithms within homogeneous computation systems with multiple hierarchies. In the context of this model, we develop an algorithm that finds the optimal distribution parameters for each involved subalgorithm. We provide a detailed analysis of the algorithm's space and time complexities and justify its use with a structured evaluation on a small GPU-equipped Beowulf cluster.
In this document, we describe two strategies of distribution of computations that can be used to imp...
Large clusters and supercomputers are rapidly evolving and may be subject to r...
This paper presents and analyzes two different strategies of heterogeneous distribution of computati...
The problem of executing large BLAS (basic linear algebra subprograms) Level-2 operations, such as m...
In this paper, an adaptive matrix multiplication algorithm for dynamic heterogeneous environments is...
We consider the problem of data allocation when performing matrix multiplication on a heterogeneous ...
In this paper, we address the issue of implementing matrix multiplication on heterogeneous ...
Recent advances in GPUs (graphics processing units) lead to massively parallel hardware that is eas...
The multiplication of a vector by a matrix is the kernel operation in many algorithms used in scient...
In this paper, we address the issue of implementing matrix-matrix multiplication on heterogene...
We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and mul...
The multiplication of large sparse matrices is a basic operation for many scientific and engineering ...
One of the main problems of substructure-based parallel solution methods is the imbalances in the co...
Matrix multiplication is one of the important operations in scientific and engineering applications. ...
Proceedings of the 8th IEEE International Conference on Cluster Computing (Cluster 2006), October, 2...