The Level 3 BLAS (BLAS3) are a set of specifications of Fortran 77 subprograms for carrying out matrix multiplications and the solution of triangular systems with multiple right-hand sides. They are intended to provide efficient and portable building blocks for linear algebra algorithms on high performance computers. We describe algorithms for the BLAS3 operations that are asymptotically faster than the conventional ones. These algorithms are based on Strassen's method for fast matrix multiplication, which is now recognized to be a practically useful technique once matrix dimensions exceed about 100. We pay particular attention to the numerical stability of these "fast BLAS3". Error bounds are given and their significance is explai...
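The abstract above rests on Strassen's method, which replaces the eight half-size block products of conventional 2x2 block multiplication with seven. The following is a minimal sketch of that recursion in Python, not the authors' Fortran 77 implementation. The function names (`strassen`, `matmul_conventional`, `split`, `add`, `sub`) and the default `cutoff` of 128 are assumptions chosen for illustration; the cutoff only loosely echoes the abstract's remark that the method pays off once dimensions exceed about 100. The sketch handles square matrices only and falls back to conventional multiplication whenever the dimension is odd or at most `cutoff`.

```python
def matmul_conventional(A, B):
    """Conventional triple-loop product of two matrices given as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def add(A, B):
    """Elementwise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Elementwise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(A):
    """Split a square matrix of even dimension into four quadrants."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])


def strassen(A, B, cutoff=128):
    """Multiply square matrices A and B with Strassen's recursion.

    The cutoff value is an illustrative assumption, not a tuned constant.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return matmul_conventional(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven recursive products instead of the conventional eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)

    # Recombine the products into the four quadrants of C = A * B.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

With integer entries the two routines agree exactly. In floating point the results generally differ in rounding, which is why the abstract gives the numerical stability of such methods particular attention.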
In this thesis it is shown how an \(O(n^{4-\epsilon})\) algorithm for the cube multiplication probl...
We perform forward error analysis for a large class of recursive matrix multiplication algorithms in...
This paper proposes a set of Level 3 Basic Linear Algebra Subprograms and associated kernels for sp...
Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar ope...
Matrix-matrix multiplication is normally computed using one of the BLAS or a reinvention o...
Block algorithms are becoming increasingly popular in matrix computations. Since their basic unit of...
A scalable parallel algorithm for matrix multiplication on SISAMD computers is presented. Our method...
The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply ...
The Fortran 90 standard requires an intrinsic function matmul which multiplies two matrices togethe...
A simple but highly effective approach for transforming high-performance implementations on cachebas...
The main purpose of this paper is to present a fast matrix multiplication algorithm taken fr...
Strassen's algorithm for fast matrix-matrix multiplication has been implemented for m...