Submitted for publication to IEEE TPDS. The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication and the more complex algorithms of Strassen and Winograd. While recursive layouts significantly outperform traditional layouts (reduc...
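Z-Morton ordering is one widely used recursive array layout. The abstract says only that "several" recursive layouts are examined, so this is an illustrative sketch rather than the paper's layout function. It assumes a square matrix whose side is a power of two, and the helper names `spread_bits` and `morton_offset` are hypothetical. Where row-major storage gives element (i, j) the offset `i * n + j`, the Morton offset interleaves the bits of i and j so that each quadrant, and recursively each sub-quadrant, occupies a contiguous run of memory.

```python
def spread_bits(x: int, bits: int) -> int:
    """Move bit b of x to bit position 2*b, for the low `bits` bits of x."""
    result = 0
    for b in range(bits):
        result |= ((x >> b) & 1) << (2 * b)
    return result


def morton_offset(i: int, j: int, bits: int) -> int:
    """Z-Morton offset of element (i, j) in a 2**bits x 2**bits matrix.

    Row bits occupy the odd positions and column bits the even ones, so at
    every level of the recursion the four quadrants (NW, NE, SW, SE) are
    stored as consecutive, equal-sized runs of memory.
    """
    return (spread_bits(i, bits) << 1) | spread_bits(j, bits)


if __name__ == "__main__":
    # Offset assigned to each element of a 4 x 4 matrix, one row per line.
    for i in range(4):
        print([morton_offset(i, j, 2) for j in range(4)])
    # [0, 1, 4, 5]
    # [2, 3, 6, 7]
    # [8, 9, 12, 13]
    # [10, 11, 14, 15]
```

The top-left 2 x 2 quadrant receives offsets 0 to 3, so a recursive algorithm that descends into that quadrant touches one contiguous block instead of two strided row fragments.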
This Master's thesis examines whether a matrix multiplication program that combines the two efficiency stra...
Programming languages that provide multidimensional arrays and a flat linear model of memory must im...
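A language of the kind this abstract describes has to map every multidimensional index to a position in flat memory. The abstract is cut off before its own proposal, so the sketch below shows only the two conventional mappings, row-major (C-style) and column-major (Fortran-style), expressed as stride vectors. The helper names are hypothetical.

```python
from typing import Sequence


def row_major_strides(shape: Sequence[int]) -> list[int]:
    """Element strides for a row-major layout: the last index varies fastest."""
    strides = [1] * len(shape)
    for k in range(len(shape) - 2, -1, -1):
        strides[k] = strides[k + 1] * shape[k + 1]
    return strides


def column_major_strides(shape: Sequence[int]) -> list[int]:
    """Element strides for a column-major layout: the first index varies fastest."""
    strides = [1] * len(shape)
    for k in range(1, len(shape)):
        strides[k] = strides[k - 1] * shape[k - 1]
    return strides


def linear_offset(index: Sequence[int], strides: Sequence[int]) -> int:
    """Map a multidimensional index to an element offset in flat memory."""
    return sum(i * s for i, s in zip(index, strides))


if __name__ == "__main__":
    shape = (3, 4)
    print(linear_offset((1, 2), row_major_strides(shape)))     # 6
    print(linear_offset((1, 2), column_major_strides(shape)))  # 7
```

Both mappings are affine in the index, which is what makes them cheap to compute; the price is that neighbors along the slow dimension sit a full stride apart.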
Via novel path-routing techniques we prove a lower bound on the I/O-complexity of all recursive matr...
Many fast algorithms in arithmetic complexity have hierarchical or recursive structures that make ef...
Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expe...
In this article, we present a fast algorithm for matrix multiplication optimized for recent ...
The paper presents an analysis of matrix multiplication algorithms from the point of view of their effi...
Strassen’s matrix multiplication reduces the computational cost of multiplying matrices of size n × ...
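The textbook form of Strassen's algorithm splits each operand into four quadrants and forms seven recursive products instead of eight. A minimal pure-Python sketch follows, assuming square matrices whose side is a power of two. The `cutoff` parameter, below which the classical triple loop takes over, is a hypothetical tuning knob; its default is arbitrary, not a measured crossover.

```python
def add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def sub(x, y):
    return [[p - q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def split(m):
    """Return the quadrants (11, 12, 21, 22) of a square matrix of even size."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def join(c11, c12, c21, c22):
    """Reassemble four quadrants into one matrix."""
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def classical(a, b):
    """Textbook triple-loop product, used at and below the cutoff."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def strassen(a, b, cutoff=64):
    """Strassen product of two n x n matrices (n a power of two)."""
    n = len(a)
    if n <= cutoff:
        return classical(a, b)
    if n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven half-size products instead of the classical eight.
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)  # M1 + M4 - M5 + M7
    c12 = add(m3, m5)                    # M3 + M5
    c21 = add(m2, m4)                    # M2 + M4
    c22 = add(add(sub(m1, m2), m3), m6)  # M1 - M2 + M3 + M6
    return join(c11, c12, c21, c22)
```

For example, `strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1)` returns `[[19, 22], [43, 50]]`, the same result as the classical product.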
In this work, the performance of basic and Strassen’s matrix multiplication algorithms ar...
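One ingredient of such a comparison is the raw scalar operation count. The sketch below counts operations for the basic method and for Strassen's method with a classical base case. It ignores memory traffic entirely, which several of the works listed here identify as the dominant factor, so any crossover it shows is an arithmetic crossover, not a measured one. The function names are hypothetical.

```python
def classical_ops(n: int) -> int:
    """Scalar operations for the classical n x n product:
    n**3 multiplications plus n**2 * (n - 1) additions."""
    return 2 * n**3 - n**2


def strassen_ops(n: int, cutoff: int) -> int:
    """Scalar operations for Strassen's algorithm recursing while n > cutoff
    (n a power of two). Each level performs 7 half-size products and
    18 half-size matrix additions or subtractions."""
    if n <= cutoff:
        return classical_ops(n)
    h = n // 2
    return 7 * strassen_ops(h, cutoff) + 18 * h * h


if __name__ == "__main__":
    for n in (8, 16):
        # One level of Strassen on top of a classical base case of size n/2.
        print(n, classical_ops(n), strassen_ops(n, n // 2))
```

This prints `8 960 1072` and `16 7936 7872`: a single Strassen level costs more operations than the basic method at n = 8 and fewer at n = 16.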
Matrix-Matrix Multiplication (MMM) is a highly important kernel in linear algebra algorithms and the...
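A common way to make this kernel cache-friendly is loop tiling (blocking): the iteration space is cut into `block` x `block` tiles so that the data a tile touches can stay in cache while it is reused. The sketch below shows only the loop structure; in pure Python the interpreter overhead hides any cache benefit, and the `block` default is a hypothetical value, not a tuned one.

```python
def blocked_matmul(a, b, block=32):
    """C = A B computed tile by tile (A is n x k, B is k x m)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # The outer three loops choose one tile of C, A, and B.
    for ii in range(0, n, block):
        for kk in range(0, k, block):
            for jj in range(0, m, block):
                # The inner three loops finish all work on those tiles
                # before moving on, so the tiles are reused while resident.
                for i in range(ii, min(ii + block, n)):
                    a_row, c_row = a[i], c[i]
                    for p in range(kk, min(kk + block, k)):
                        a_ip, b_row = a_row[p], b[p]
                        for j in range(jj, min(jj + block, m)):
                            c_row[j] += a_ip * b_row[j]
    return c
```

The tile size is the main design choice: in a compiled kernel it is chosen so that three tiles fit in the targeted cache level, and edge tiles are handled by the `min(...)` bounds.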
This paper examines the recursive multiplier and some potential enhancements for it. The delay of ...
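A functional model of a recursive multiplier is sketched below. It assumes the common construction in which an n-bit multiplier is assembled from four half-width multipliers whose partial products are shifted and added; the enhancements the abstract goes on to discuss are not reproduced, and the model says nothing about gate delay. The names `recursive_multiply` and `base_bits` are hypothetical.

```python
def recursive_multiply(x: int, y: int, n: int, base_bits: int = 4) -> int:
    """Product of two unsigned n-bit integers built from four half-width products.

    With x = xh * 2**k + xl and y = yh * 2**k + yl:
    x * y = (xh*yh) * 2**(2k) + (xh*yl + xl*yh) * 2**k + xl*yl.
    In hardware the four sub-multipliers work in parallel and an adder
    combines the shifted partial products; this models only the arithmetic.
    """
    if base_bits < 1:
        raise ValueError("base_bits must be at least 1")
    if n <= base_bits:
        return x * y  # stands in for a small base multiplier
    k = n // 2
    mask = (1 << k) - 1
    xh, xl = x >> k, x & mask
    yh, yl = y >> k, y & mask
    w = n - k  # width of the upper halves (w >= k), used for all four sub-multipliers
    hh = recursive_multiply(xh, yh, w, base_bits)
    hl = recursive_multiply(xh, yl, w, base_bits)
    lh = recursive_multiply(xl, yh, w, base_bits)
    ll = recursive_multiply(xl, yl, w, base_bits)
    return (hh << (2 * k)) + ((hl + lh) << k) + ll
```

For instance, `recursive_multiply(12345, 6789, 16)` agrees with `12345 * 6789`.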
The complexity of matrix multiplication (hereafter MM) has been intensively studied since 1969, when...
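The 1969 reference point is Strassen's algorithm, whose exponent follows from a standard recurrence. Counting scalar operations, with 7 half-size products and 18 half-size matrix additions per level and $T(1) = 1$, unrolling $k = \log_2 n$ levels gives the following (a worked derivation, not taken from the abstract):

```latex
\begin{aligned}
T(n) &= 7\,T\!\left(\tfrac{n}{2}\right) + 18\left(\tfrac{n}{2}\right)^{2}, \qquad T(1) = 1,\\
T(n) &= 7^{k}\,T\!\left(\tfrac{n}{2^{k}}\right) + \tfrac{9}{2}\,n^{2}\sum_{i=0}^{k-1}\left(\tfrac{7}{4}\right)^{i},\\
T(n) &= 7\,n^{\log_2 7} - 6\,n^{2} = \Theta\!\left(n^{\log_2 7}\right) \approx \Theta\!\left(n^{2.807}\right),\\
T_{\mathrm{classical}}(n) &= 8\,T_{\mathrm{classical}}\!\left(\tfrac{n}{2}\right) + 4\left(\tfrac{n}{2}\right)^{2} = 2n^{3} - n^{2} = \Theta\!\left(n^{3}\right).
\end{aligned}
```

As a check, $n = 2$ gives $T(2) = 7 \cdot 1 + 18 = 25 = 7 \cdot 7 - 6 \cdot 4$, against $2 \cdot 8 - 4 = 12$ for the classical method.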
We propose several new schedules for Strassen-Winograd's matrix multiplication...
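The Strassen-Winograd variant keeps seven recursive products but reduces the block additions from 18 to 15 by reusing intermediate sums. The sketch below uses the widely published formulation with operands S1 to S4 and T1 to T4; it is not one of the memory-saving schedules the abstract proposes, since it allocates fresh temporaries at every step. It assumes a power-of-two side length, and `cutoff` is a hypothetical switch to the classical product.

```python
def add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def sub(x, y):
    return [[p - q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def split(m):
    """Return the quadrants (11, 12, 21, 22) of a square matrix of even size."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def join(c11, c12, c21, c22):
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def classical(a, b):
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def winograd(a, b, cutoff=64):
    """Strassen-Winograd product of two n x n matrices (n a power of two)."""
    n = len(a)
    if n <= cutoff:
        return classical(a, b)
    if n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Eight pre-additions.
    s1 = add(a21, a22)
    s2 = sub(s1, a11)
    s3 = sub(a11, a21)
    s4 = sub(a12, s2)
    t1 = sub(b12, b11)
    t2 = sub(b22, t1)
    t3 = sub(b22, b12)
    t4 = sub(t2, b21)
    # Seven recursive products.
    p1 = winograd(a11, b11, cutoff)
    p2 = winograd(a12, b21, cutoff)
    p3 = winograd(s4, b22, cutoff)
    p4 = winograd(a22, t4, cutoff)
    p5 = winograd(s1, t1, cutoff)
    p6 = winograd(s2, t2, cutoff)
    p7 = winograd(s3, t3, cutoff)
    # Seven post-additions.
    u1 = add(p1, p2)  # C11
    u2 = add(p1, p6)
    u3 = add(u2, p7)
    u4 = add(u2, p5)
    u5 = add(u4, p3)  # C12
    u6 = sub(u3, p4)  # C21
    u7 = add(u3, p5)  # C22
    return join(u1, u5, u6, u7)
```

The reuse chain (U2 feeds U3 and U4, U3 feeds U6 and U7) is exactly what makes scheduling hard: intermediate blocks must stay alive until their last use, which is the memory cost that dedicated schedules try to minimize.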
A number of parallel formulations of dense matrix multiplication algorithms have been developed. For ...
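Cannon's algorithm is one classic parallel formulation; the truncated abstract does not say which formulations it analyzes, so this is an illustrative choice. The sketch simulates a q x q grid of processors serially: after an initial skew, each step multiplies the locally held blocks, then shifts A blocks one position left and B blocks one position up. Communication cost is not modeled, and `cannon` and `mac` are hypothetical names.

```python
def mac(c, x, y):
    """c += x y for blocks stored as lists of lists."""
    for i, x_row in enumerate(x):
        c_row = c[i]
        for p, x_ip in enumerate(x_row):
            for j, y_pj in enumerate(y[p]):
                c_row[j] += x_ip * y_pj


def cannon(a, b, q):
    """Serial simulation of Cannon's algorithm on a q x q processor grid."""
    n = len(a)
    if n % q:
        raise ValueError("n must be divisible by q")
    s = n // q

    def block(m, bi, bj):
        return [row[bj * s:(bj + 1) * s] for row in m[bi * s:(bi + 1) * s]]

    # Initial skew: processor (i, j) starts with A(i, i+j) and B(i+j, j).
    a_loc = [[block(a, i, (i + j) % q) for j in range(q)] for i in range(q)]
    b_loc = [[block(b, (i + j) % q, j) for j in range(q)] for i in range(q)]
    c_loc = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]

    for _ in range(q):
        # Every processor multiplies the pair of blocks it currently holds.
        for i in range(q):
            for j in range(q):
                mac(c_loc[i][j], a_loc[i][j], b_loc[i][j])
        # Shift A one step left along rows and B one step up along columns.
        a_loc = [[a_loc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_loc = [[b_loc[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    # Gather the distributed C blocks into one matrix.
    return [[c_loc[r // s][col // s][r % s][col % s] for col in range(n)]
            for r in range(n)]
```

After step t, processor (i, j) holds A(i, k) and B(k, j) with k = (i + j + t) mod q, so over q steps it accumulates every term of its block of C while each processor stores only one block of each operand.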
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been impleme...