Many fast algorithms in arithmetic complexity have hierarchical or recursive structures that make efficient implementation on high-performance computers with memory hierarchies non-trivial. In this paper, we present our findings on the efficient implementation of Strassen's algorithm [17] for the ubiquitous operation of matrix multiplication, as a model for a class of recursive algorithms. Compared with the conventional multiplication algorithm, Strassen's algorithm requires more storage and exhibits poorer data locality. Although recent years have seen better representations and implementations of the algorithm, the characterization of the optimization in implementation issues and hence the automatic optimization strateg...
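For reference, here is a minimal Python sketch of the recursive structure this abstract refers to. It implements the textbook Strassen recursion, which uses seven recursive products per level, and falls back to the conventional algorithm below a cutoff. It assumes square matrices stored as lists of lists, with a dimension that stays even until the cutoff is reached (for example, a power of two) and `cutoff >= 1`. The helper names and the `cutoff` parameter are illustrative and not taken from the paper. The sketch also ignores the memory-layout and locality optimizations the paper is concerned with.

```python
def add(X, Y):
    """Elementwise sum of two equally sized square matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Elementwise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def classical(A, B):
    """Conventional O(n^3) product, used at and below the recursion cutoff."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(M):
    """Split a 2h x 2h matrix into its four h x h quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    """Reassemble four quadrants (row-wise concatenation) into one matrix."""
    return [a + b for a, b in zip(C11, C12)] + [a + b for a, b in zip(C21, C22)]


def strassen(A, B, cutoff=2):
    """Strassen's algorithm for n x n matrices; n must stay even until n <= cutoff."""
    n = len(A)
    if n <= cutoff:
        return classical(A, B)
    if n % 2:
        raise ValueError("matrix dimension must be even above the cutoff")
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of the eight of plain block multiplication.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    return join(add(sub(add(M1, M4), M5), M7),   # C11 = M1 + M4 - M5 + M7
                add(M3, M5),                     # C12 = M3 + M5
                add(M2, M4),                     # C21 = M2 + M4
                add(add(sub(M1, M2), M3), M6))   # C22 = M1 - M2 + M3 + M6
```

The extra temporaries built by `add` and `sub` at every level illustrate the additional storage the abstract mentions. Raising `cutoff` is one common way to trade some of that overhead for conventional products on small blocks.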
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P...
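To make the definition concrete, this Python sketch reads "running time linear in 1/P" as T(P) = T(1)/P for a fixed problem size. Equivalently, the parallel efficiency E(P) = T(1) / (P · T(P)) equals 1. The function names and the 5% tolerance are illustrative choices, not part of the source.

```python
def efficiency(t1, tp, p):
    """Parallel efficiency E(P) = T(1) / (P * T(P)); E(P) = 1 means perfect strong scaling."""
    return t1 / (p * tp)


def scales_perfectly(timings, tol=0.05):
    """Return True if every measured E(P) is within `tol` of 1.

    `timings` maps a processor count P to a measured running time T(P)
    for a fixed problem size and must include the serial time T(1).
    """
    t1 = timings[1]
    return all(abs(efficiency(t1, tp, p) - 1.0) <= tol for p, tp in timings.items())
```

For example, the made-up timings `{1: 64.0, 2: 32.0, 4: 16.0, 8: 8.0}` pass the check. The timings `{1: 64.0, 2: 33.0, 4: 18.0, 8: 11.0}` fail it, because E(8) = 64 / 88 ≈ 0.73.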
In the current era, scientific computing and computational theory involve highly exhaustive data com...
Parallel matrix multiplication is one of the most studied fundamental problems in distributed and h...
Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expe...
Strassen’s matrix multiplication reduces the computational cost of multiplying matrices of size n × ...
The paper presents an analysis of matrix multiplication algorithms from the point of view of their effi...
Submitted for publication to IEEE TPDS. The performance of both serial and parallel implementations o...
Abstract: In this work, the performance of basic and Strassen’s matrix multiplication algorithms ar...
We propose several new schedules for Strassen-Winograd's matrix multiplication...
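For reference, the Strassen-Winograd variant uses 7 multiplications and 15 block additions per recursion level. The Python sketch below performs one level of that variant in its standard operation order: 8 pre-additions, then 7 products, then 7 post-additions. It does not reproduce the new schedules the abstract proposes, which concern how these temporaries are ordered and stored. Entries may be any values supporting `+` and `-`. The hypothetical `mul` parameter lets a caller substitute a recursive block product for ordinary multiplication.

```python
import operator


def winograd_step(A, B, mul=operator.mul):
    """One recursion level of the Strassen-Winograd variant.

    A and B are 2x2 nested tuples ((X11, X12), (X21, X22)) whose entries support
    + and -; `mul` multiplies an A-side entry by a B-side entry.
    """
    (A11, A12), (A21, A22) = A
    (B11, B12), (B21, B22) = B
    # 8 pre-additions
    S1 = A21 + A22
    S2 = S1 - A11
    S3 = A11 - A21
    S4 = A12 - S2
    T1 = B12 - B11
    T2 = B22 - T1
    T3 = B22 - B12
    T4 = T2 - B21
    # 7 multiplications (always A-side times B-side, so blocks need not commute)
    P1 = mul(A11, B11)
    P2 = mul(A12, B21)
    P3 = mul(S4, B22)
    P4 = mul(A22, T4)
    P5 = mul(S1, T1)
    P6 = mul(S2, T2)
    P7 = mul(S3, T3)
    # 7 post-additions
    U1 = P1 + P2  # C11
    U2 = P1 + P6
    U3 = U2 + P7
    U4 = U2 + P5
    U5 = U4 + P3  # C12
    U6 = U3 - P4  # C21
    U7 = U3 + P5  # C22
    return ((U1, U5), (U6, U7))
```

With plain numbers, `winograd_step(((1, 2), (3, 4)), ((5, 6), (7, 8)))` returns `((19, 22), (43, 50))`, the ordinary 2×2 product.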
A tight $\Omega\left((n/\sqrt{M})^{\log_2 7} M\right)$ lower bound is derived on the I/O complexity of Strassen’s algorithm to ...
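The bound $\Omega\left((n/\sqrt{M})^{\log_2 7} M\right)$ can be explored numerically up to its unknown constant factor. This sketch reads n as the matrix dimension and M as the fast-memory size in words; that reading is an assumption, since the sentence is truncated. The function name is illustrative, and only ratios between its values are meaningful.

```python
import math

# Exponent of Strassen's algorithm: log2(7) ≈ 2.807.
LOG2_7 = math.log2(7)


def strassen_io_bound(n, M):
    """(n / sqrt(M)) ** log2(7) * M, the lower bound without its hidden constant.

    n: matrix dimension; M: fast-memory size in words (assumed reading).
    """
    return (n / math.sqrt(M)) ** LOG2_7 * M


base = strassen_io_bound(1024, 1024)
print(strassen_io_bound(2048, 1024) / base)  # doubling n: about 7 (= 2**log2(7))
print(strassen_io_bound(1024, 4096) / base)  # quadrupling M: about 4/7
```

Doubling n multiplies the bound by 7, matching the 7 subproblems per recursion level. Quadrupling M shrinks it only by a factor of 7/4.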
Strassen's algorithm is a divide-and-conquer matrix multiplication method that is mostly of theoreti...
Abstract: Strassen’s algorithm to multiply two n×n matrices reduces the asymptotic operation count f...
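The reduction in operation count can be checked by counting directly. This Python sketch makes three assumptions:

- Strassen's original formulation uses 18 block additions or subtractions per level (the Winograd variant needs 15).
- n equals the cutoff times a power of two.
- The conventional algorithm performs n^3 multiplications plus n^2(n - 1) additions.

With full recursion (cutoff 1) and n = 2^k, the count is 7·7^k − 6n^2, which is O(n^{log₂7}) ≈ O(n^{2.807}). The function names are illustrative.

```python
def classical_flops(n):
    """Conventional algorithm: n^3 multiplications plus n^2 * (n - 1) additions."""
    return 2 * n**3 - n**2


def strassen_flops(n, cutoff=1):
    """Arithmetic operations of Strassen's algorithm on n x n matrices.

    Recurses while n > cutoff (n is assumed to be cutoff * 2^k), then switches
    to the conventional algorithm. Each level costs 7 half-size products plus
    18 additions/subtractions of (n/2) x (n/2) blocks.
    """
    if n <= cutoff:
        return classical_flops(n)
    h = n // 2
    return 7 * strassen_flops(h, cutoff) + 18 * h * h


# Smallest power of two at which full recursion needs fewer operations.
n = 1
while strassen_flops(n) >= classical_flops(n):
    n *= 2
print(n)
```

In this flop-only model, full recursion first beats the conventional count at n = 1024. A single level of recursion (cutoff n/2) already wins at n = 16, with 7,872 versus 7,936 operations. Memory traffic, which this count ignores, shifts crossover points in practice.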
This paper examines how to write code to gain high performance on modern computers as well as the im...
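One standard technique for writing high-performance matrix code on cached machines is loop tiling (cache blocking). It is used here only as an example; the truncated text does not say which techniques the paper covers. The Python sketch shows just the loop structure. The `tile` size is a hypothetical tuning parameter, normally chosen so that the active tiles of A, B, and C fit in cache. Any real speedup appears only in compiled code, because interpreter overhead dominates in Python.

```python
def blocked_matmul(A, B, tile=32):
    """C = A * B for n x n lists of lists, computed tile by tile.

    Each (i, k, j) index triple is visited exactly once, so the result equals
    the conventional product; only the order of the updates changes.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one tile-sized block of C, A, and B at a time.
                for i in range(ii, min(ii + tile, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + tile, n)):
                        a, Bk = Ai[k], B[k]
                        # Innermost loop walks rows of B and C contiguously.
                        for j in range(jj, min(jj + tile, n)):
                            Ci[j] += a * Bk[j]
    return C
```

The `min(...)` bounds let n be any size, not just a multiple of `tile`.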
Matrix multiplication is one of the most widely used operations in all computational fields of linea...