Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expense of reduced locality of reference, which makes it challenging to implement the algorithm efficiently on a modern machine with a hierarchical memory system. We report on an implementation of this algorithm that uses several unconventional techniques to make the algorithm memory-friendly. First, the algorithm internally uses a non-standard array layout known as Morton order that is based on a quad-tree decomposition of the matrix. Second, we dynamically select the recursion truncation point to minimize padding without affecting the performance of the algorithm, which we can do by virtue of the cache behavior of the Morton ordering. Each te...
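As a rough illustration of the Morton order mentioned in the abstract above, the following minimal sketch maps a (row, column) pair to its position in a quad-tree (Z-order) layout by interleaving index bits. The function names `morton_index` and `to_morton` are hypothetical, the choice of putting row bits in the odd positions is an assumption, and the sketch only handles square matrices whose size is a power of two. It is not the implementation described in the paper.

```python
def morton_index(row: int, col: int, bits: int = 16) -> int:
    """Interleave the bits of row and col into one Z-order index.

    Assumption: column bits go to even positions, row bits to odd ones.
    """
    z = 0
    for b in range(bits):
        z |= ((col >> b) & 1) << (2 * b)
        z |= ((row >> b) & 1) << (2 * b + 1)
    return z


def to_morton(matrix):
    """Copy a square, power-of-two-sized matrix into a flat Morton-order list."""
    n = len(matrix)
    flat = [None] * (n * n)
    for r in range(n):
        for c in range(n):
            flat[morton_index(r, c)] = matrix[r][c]
    return flat


if __name__ == "__main__":
    # 4x4 matrix whose entries are their own row-major positions.
    m = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(to_morton(m))
```

In this layout each quadrant of the matrix occupies one contiguous run of the flat list, so a recursive algorithm that descends into quadrants touches contiguous memory at every level of the recursion.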
In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplications. T...
The current era of scientific computing and computational theory involves highly exhaustive data com...
This Master Thesis examines if a matrix multiplication program that combines the two efficiency stra...
Many fast algorithms in arithmetic complexity have hierarchical or recursive structures that make ef...
Strassen’s matrix multiplication reduces the computational cost of multiplying matrices of size n × ...
Submitted for publication to IEEE TPDS. The performance of both serial and parallel implementations o...
This paper examines how to write code to gain high performance on modern computers as well as the im...
A proof of concept is offered for the uniform representation of matrices serially in Morton-order (o...
Abstract: Strassen’s algorithm to multiply two n×n matrices reduces the asymptotic operation count f...
Abstract-- In this work, the performance of basic and Strassen’s matrix multiplication algorithms ar...
We propose several new schedules for Strassen-Winograd's matrix multiplication...
During the last half-decade, a number of research efforts have centered around developing software f...
Abstract. Strassen's algorithm for fast matrix-matrix multiplication has been implemented for m...
The paper presents analysis of matrix multiplication algorithms from the point of view of their effi...
We present a parallel method for matrix multiplication on distributed-memory MIMD architectures based...