This paper describes a novel parallel algorithm that implements dense matrix multiplication with algorithmic efficiency equivalent to that of Cannon’s algorithm. It is suitable for clusters and scalable shared memory systems. The approach differs from other parallel matrix multiplication algorithms in its explicit use of shared memory and remote memory access (RMA) communication rather than message passing. Experimental results on clusters (IBM SP, Linux-Myrinet) and shared memory systems (SGI Altix, Cray X1) demonstrate consistent performance advantages over pdgemm from the ScaLAPACK/PBLAS suite, the leading implementation of parallel matrix multiplication in use today. In the best case on the SGI ...
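For orientation, the data movement of Cannon’s algorithm, the efficiency reference point named in the abstract above, can be sketched as a serial simulation. This is not the paper’s RMA-based algorithm. It assumes a q × q process grid in which each process holds a single matrix element (a 1 × 1 block), and the function name `cannon_multiply` is hypothetical.

```python
def cannon_multiply(A, B):
    """Serially simulate Cannon's algorithm on a q x q grid of 1 x 1 blocks."""
    q = len(A)
    # Initial alignment: shift row i of A left by i, column j of B up by j.
    a = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c = [[0] * q for _ in range(q)]
    for _ in range(q):
        # Each simulated process multiplies the pair of blocks it currently holds.
        for i in range(q):
            for j in range(q):
                c[i][j] += a[i][j] * b[i][j]
        # Circular shift: A moves one step left, B moves one step up.
        a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b = [[b[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return c


print(cannon_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In a real distributed run the two shifts would be neighbor-to-neighbor transfers of whole blocks; here they are plain list rebuilds, which is enough to show why q multiply-and-shift rounds cover every term of the product.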
A number of parallel formulations of the dense matrix multiplication algorithm have been developed. For ...
Sparse matrix-vector multiplication is an important kernel, but is hard to efficiently execute ...
We present a parallel method for matrix multiplication on distributed-memory MIMD architectu...
We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Indep...
The current era of scientific computing and computational theory involves highly exhaustive data com...
Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time o...
Matrix multiplication is one of the important operations in scientific and engineering applications. ...
Parallel matrix multiplication is one of the most studied fundamental problems in distributed and h...
The paper presents an analysis of matrix multiplication algorithms from the point of view of their effi...
The multiplication of a vector by a matrix is the kernel operation in many algorithms used in scient...
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P...
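The strong-scaling definition above can be made concrete with a short sketch that computes speedup and parallel efficiency from running times. The timing values are invented placeholders, chosen so that the running time is exactly proportional to 1/P, and the function name `strong_scaling_report` is hypothetical.

```python
def strong_scaling_report(t1, timings):
    """Map each processor count P to (speedup, efficiency) given T_1 and {P: T_P}."""
    report = {}
    for p, tp in sorted(timings.items()):
        speedup = t1 / tp            # how much faster than one processor
        report[p] = (speedup, speedup / p)  # efficiency 1.0 means perfect scaling
    return report


# Hypothetical timings in seconds with T_P = T_1 / P exactly.
ideal = {1: 64.0, 2: 32.0, 4: 16.0, 8: 8.0}
report = strong_scaling_report(ideal[1], ideal)
print(report[8])  # (8.0, 1.0)
```

With measured timings, the efficiency column typically drops below 1.0 as P grows; the range of P over which it stays near 1.0 is the range in which the algorithm scales strongly.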
Strassen’s algorithm to multiply two n×n matrices reduces the asymptotic operation count f...
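As background for the abstract above, the following is a minimal sketch of the classical sequential Strassen recursion, which replaces the eight half-size block products of the standard method with seven. It is not the method of the paper being summarized. It assumes square matrices whose size is a power of two and recurses all the way down to 1 × 1 blocks, which a practical implementation would not do; the helper names are hypothetical.

```python
def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(A):
    """Return the four quadrants A11, A12, A21, A22 of a square matrix."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])


def strassen(A, B):
    """Multiply two n x n matrices (n a power of two) with 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Stitch the quadrants back into one matrix.
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Because each level performs seven recursive products on half-size blocks, the operation count grows as n^(log2 7), roughly n^2.81, rather than n^3.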