For matrix multiplication on hypercube multiprocessors with the product matrix accumulated in place, a processor must receive about P^2/√N elements of each input operand, with operands of size P×P distributed evenly over N processors. With concurrent communication on all ports, the number of element transfers in sequence can be reduced to P^2/(√N log N) for each input operand. We present a two-level partitioning of the matrices and an algorithm for the matrix multiplication with optimal data motion and constant storage. The algorithm has sequential arithmetic complexity 2P^3 and parallel arithmetic complexity 2P^3/N. The algorithm has been implemented on the Connection Machine model CM-2. For the performance on the 8K CM-2, we measured about ...
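The counts quoted above can be evaluated for concrete sizes with a minimal sketch. It assumes only what the abstract states: a P×P product on an N-processor hypercube, with N a power of two and log N ports per node. The function name `hypercube_matmul_costs` is hypothetical. The sketch reproduces the formulas and does not model the two-level partitioning itself.

```python
import math


def hypercube_matmul_costs(P: int, N: int) -> dict:
    """Per-processor counts from the abstract for a P x P matrix product on an
    N-processor hypercube (illustrative sketch; N must be a power of two >= 2)."""
    if N < 2 or N & (N - 1):
        raise ValueError("N must be a power of two >= 2 (hypercube size)")
    log_n = N.bit_length() - 1  # hypercube dimension = ports per node
    root_n = math.sqrt(N)
    return {
        # elements of each input operand a processor must receive
        "received_per_operand": P * P / root_n,
        # element transfers in sequence per operand when all log N ports are used
        "transfers_all_ports": P * P / (root_n * log_n),
        # arithmetic: total work, and work per processor under perfect balance
        "flops_sequential": 2 * P ** 3,
        "flops_per_processor": 2 * P ** 3 / N,
    }


costs = hypercube_matmul_costs(P=1024, N=64)
for name, value in costs.items():
    print(f"{name}: {value:,.1f}")
```

For P = 1024 and N = 64, each processor receives 131,072 elements per operand. With all six ports active, this drops to roughly 21,845 transfers in sequence, a factor of log N = 6.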
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and...
Matrix multiplication is a core building block for numerous scientific computing and, more recently,...
Abstract: In this paper, we address the issue of implementing matrix multiplication on heterogeneous ...
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been impleme...
In this paper we present an efficient dense matrix multiplication algorithm for distributed memory ...
A number of parallel formulations of the dense matrix multiplication algorithm have been developed. For ...
The matrix multiplication algorithm of Dekel, Nassimi and Sahni for the hypercube is analysed, m...
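The Dekel, Nassimi and Sahni (DNS) scheme is usually described as follows:

1. n^3 processors, indexed (i, j, k), each multiply one pair A[i][k]·B[k][j].
2. A sum along the k dimension produces C[i][j].

The following is a minimal serial simulation of that textbook description. It is an assumption-level sketch, not the analysed or modified variant from the cited work. The function name `dns_matmul` is hypothetical, and the data movement of the broadcast step is not modeled.

```python
from typing import List

Matrix = List[List[float]]


def dns_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Serial simulation of the DNS scheme on n^3 virtual processors indexed
    (i, j, k); returns C = A * B for square n x n inputs."""
    n = len(A)
    # Distribution + compute: virtual processor (i, j, k) holds A[i][k] and
    # B[k][j] and forms their single scalar product.
    partial = [[[A[i][k] * B[k][j] for k in range(n)]
                for j in range(n)]
               for i in range(n)]
    # Reduction along k as a pairwise tree: ceil(log2 n) rounds, mirroring the
    # exchange steps along the k dimensions of a hypercube.
    step = 1
    while step < n:
        for i in range(n):
            for j in range(n):
                for k in range(0, n - step, 2 * step):
                    partial[i][j][k] += partial[i][j][k + step]
        step *= 2
    # After the reduction, virtual processor (i, j, 0) holds C[i][j].
    return [[partial[i][j][0] for j in range(n)] for i in range(n)]


print(dns_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The tree reduction is what gives DNS its logarithmic number of parallel steps. The price is n^3 processors for n^2 output elements.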
Proceedings of the 8th IEEE International Conference on Cluster Computing (Cluster 2006), October, 2...
Matrix multiplication is one of the important operations in scientific and engineering applications. ...
We consider the problem of data allocation when performing matrix multiplicati...
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P...
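The strong-scaling criterion above can be checked against measured runtimes. In that snippet, P is the processor count. Running time linear in 1/P means T(P) = c/P, so the processor-time product P·T(P) stays constant. This minimal sketch computes efficiency relative to the smallest P measured. The function names, the 5% tolerance, and the timings in the example are all hypothetical and chosen for illustration.

```python
from typing import Dict


def strong_scaling_efficiency(times: Dict[int, float]) -> Dict[int, float]:
    """Efficiency relative to the smallest processor count in `times`.

    Perfect strong scaling means T(P) = c / P, i.e. the processor-time
    product P * T(P) stays constant, so every efficiency equals 1.
    """
    p0 = min(times)
    base = p0 * times[p0]  # processor-time product at the baseline count
    return {p: base / (p * t) for p, t in sorted(times.items())}


def has_perfect_strong_scaling(times: Dict[int, float], tol: float = 0.05) -> bool:
    """True if every efficiency is within `tol` of 1 (the tolerance is an assumption)."""
    return all(abs(e - 1.0) <= tol for e in strong_scaling_efficiency(times).values())


# Hypothetical timings in seconds, for illustration only.
ideal = {1: 64.0, 2: 32.0, 4: 16.0, 8: 8.0}
degraded = {1: 64.0, 2: 34.0, 4: 20.0, 8: 13.0}
print(has_perfect_strong_scaling(ideal), has_perfect_strong_scaling(degraded))
```

Using the smallest measured P as the baseline matters in practice: for large problems, T(1) is often too expensive or too memory-hungry to measure.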
We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Indep...
Computer. The Harvard community has made this article openly available....
We present lower bounds on the amount of communication that matrix multiplication algorithms must pe...