The Fortran-90 standard requires an intrinsic function, matmul, that multiplies two matrices to produce a third. However, the standard does not specify which algorithm to use. We consider an extension to the matmul syntax that allows a Winograd variant of Strassen's algorithm to be added. We discuss an implementation that is available in a commercial Fortran-90 offering.

Key words. BLAS, matrix multiplication, Winograd's variant of Strassen's algorithm, multilevel algorithms

AMS(MOS) subject classification. Numerical Analysis: Numerical Linear Algebra

1 Introduction

Fortran-90 [1] has a rich set of intrinsic functions built into it that operate on large objects such as vectors and matrices. We would like ...
We provide efficient single- and double-precision GPU (Graphics Processing Unit) implementations of...
Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expe...
We look at how both logical restructuring and improvements available from successive versions of For...
Abstract. Matrix-matrix multiplication is normally computed using one of the BLAS or a reinvention o...
Strassen's algorithm is a divide and conquer matrix multiplication method that is mostly of theoreti...
This report presents two new ASSEMBLER-subroutines MULR8 and MULC16 for fast multiplication of espec...
Strassen’s matrix multiplication reduces the computational cost of multiplying matrices of size n × ...
A package of 38 low-level subprograms for many of the basic operations of numerical linear algebra i...
Abstract. Strassen's algorithm for fast matrix-matrix multiplication has been implemented for m...
The Level 3 BLAS (BLAS3) are a set of specifications of Fortran 77 subprograms for carrying out mat...
The Level 3 BLAS (BLAS3) are a set of specifications of FORTRAN 77 subprograms for carrying out matr...
Abstract. Recent advances in computing allow taking a new look at matrix multiplication, where the ke...
Many fast algorithms in arithmetic complexity have hierarchical or recursive structures that make ef...
The paper presents analysis of matrix multiplication algorithms from the point of view of their effi...
Abstract: Strassen’s algorithm to multiply two n×n matrices reduces the asymptotic operation count f...