In this paper we present an efficient dense matrix multiplication algorithm for distributed memory computers with a hypercube topology. The proposed algorithm performs better than all previously proposed algorithms for a wide range of matrix sizes and numbers of processors, especially for large matrices. We analyze the performance of the algorithms for two types of hypercube architectures: one in which each node can use (to send and receive) at most one communication link at a time, and the other in which each node can use all communication links simultaneously.
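As a rough illustration of the two architecture types mentioned in this abstract, the sketch below lists the neighbors of a node in a hypercube and the number of links a node could drive in one communication step. It is not taken from the paper: the function names `hypercube_neighbors` and `links_usable_per_step` are hypothetical, and the only facts assumed are the standard ones that nodes of a d-dimensional hypercube carry d-bit labels and that neighbors differ in exactly one bit.

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Return the labels of the neighbors of `node` in a dim-dimensional hypercube."""
    if not 0 <= node < (1 << dim):
        raise ValueError("node label out of range for this dimension")
    # Flipping bit k of the label gives the neighbor across dimension k.
    return [node ^ (1 << k) for k in range(dim)]


def links_usable_per_step(dim: int, all_port: bool) -> int:
    """Links one node may use at once (hypothetical helper for illustration).

    all_port=False models the architecture where a node uses at most one
    link at a time; all_port=True models simultaneous use of all dim links.
    """
    return dim if all_port else 1


if __name__ == "__main__":
    print(hypercube_neighbors(5, 3))
    print(links_usable_per_step(3, all_port=False))
    print(links_usable_per_step(3, all_port=True))
```

The gap between the two models grows with the dimension: in a hypercube with 2^d nodes, the all-link model could move up to d messages per node in one step, while the one-link model moves one.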
Hypercube algorithms are developed for a variety of communication-intensive tasks such as transposi...
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P...
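To make the strong-scaling definition in this abstract concrete, the following sketch computes speedup and parallel efficiency from measured running times. The helper names and the timing values are hypothetical and chosen only for illustration; under perfect strong scaling the running time is proportional to 1/P, so the efficiency stays at 1.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup of a run taking tp seconds relative to a one-processor run of t1 seconds."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Parallel efficiency: speedup divided by the number of processors."""
    return t1 / (p * tp)


if __name__ == "__main__":
    # Hypothetical timings: 120 s on 1 processor, 15 s on 8 processors.
    t1, tp, p = 120.0, 15.0, 8
    print(speedup(t1, tp))        # 8.0, i.e. running time scaled as 1/P
    print(efficiency(t1, tp, p))  # 1.0, i.e. perfect strong scaling
```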
In this paper, we address the issue of implementing matrix-matrix multiplication on heterogene...
A number of parallel formulations of the dense matrix multiplication algorithm have been developed. For ...
The matrix multiplication algorithm of Dekel, Nassimi and Sahni, or Hypercube algorithm, is analysed, m...
This paper describes a set of concurrent algorithms for matrix algebra, based on a library of collec...
For matrix multiplication on hypercube multiprocessors with the product matrix accumulated in place ...
The use of an appropriate methodology for calculating the communication cost, time complexity and pe...
In this paper, we address the issue of implementing matrix multiplication on heterogeneous ...
Processor allocation and the task scheduling technique in parallel processing systems play a signifi...
We present lower bounds on the amount of communication that matrix multiplication algorithms must pe...
Proceedings of the 8th IEEE International Conference on Cluster Computing (Cluster 2006), October, 2...
Hypercube algorithms are developed for a variety of communication-intensive tasks such as transposin...
We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Indep...
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processor...