Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen’s fast matrix multiplication and minimizes communication. The algorithm outperforms all known parallel matrix multiplication algorithms, classical and Strassen-based, both asymptotically and in practice. A critical bottleneck in parallelizing Strassen’s algorithm is the communication between the processors. Ballard, Demmel, Holtz, and Schwartz (SPAA’11) prove lower bounds on these communication costs, using expansion properties of the underlying computation graph. Our algorithm matches these lower bounds, and so is communication-optimal. It exhibits per...
We present a parallel method for matrix multiplication on distributed-memory MIMD architectu...
The current era of scientific computing and computational theory involves highly exhaustive data com...
This paper describes a novel parallel algorithm that implements a dense matrix multiplication operat...
Parallel matrix multiplication is one of the most studied fundamental problems in distributed and h...
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P...
Strassen’s algorithm to multiply two n×n matrices reduces the asymptotic operation count f...
We present a parallel method for matrix multiplication on distributed-memory MIMD architectures based...
We present lower bounds on the amount of communication that matrix multiplication algorithms must pe...
Dense linear algebra computations are essential to nearly every problem in scientific computing and ...
Multiplication of a sparse matrix with a dense matrix is a building block of an increasing number of...
The paper presents an analysis of matrix multiplication algorithms from the point of view of their effi...