The Level 3 BLAS (BLAS3) are a set of specifications of FORTRAN 77 subprograms for carrying out matrix multiplications and the solution of triangular systems with multiple right-hand sides. They are intended to provide efficient and portable building blocks for linear algebra algorithms on high-performance computers. We describe algorithms for the BLAS3 operations that are asymptotically faster than the conventional ones. These algorithms are based on Strassen's method for fast matrix multiplication, which is now recognized to be a practically useful technique once matrix dimensions exceed about 100. We pay particular attention to the numerical stability of these “fast BLAS3.” Error bounds are given and their significance is explained and i...
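As a rough illustration of the Strassen recursion this abstract refers to, here is a minimal sketch in pure Python. It is not the paper's implementation: the function names, the restriction to square matrices stored as lists of lists, and the default `cutoff` of 128 (chosen only to echo the "about 100" crossover mentioned above) are assumptions made for this example. Below the cutoff, or when the dimension is odd, it falls back to conventional multiplication.

```python
def matmul_conventional(A, B):
    """Conventional O(n^3) product of two matrices given as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def _add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def _sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def _split(M):
    """Split an even-dimension square matrix into four equal quadrants."""
    h = len(M) // 2
    top, bottom = M[:h], M[h:]
    return ([r[:h] for r in top], [r[h:] for r in top],
            [r[:h] for r in bottom], [r[h:] for r in bottom])


def strassen(A, B, cutoff=128):
    """Strassen product of square matrices A and B (hypothetical helper).

    cutoff is an assumed tuning parameter: at or below this dimension,
    or when the dimension is odd, the conventional product is used.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return matmul_conventional(A, B)

    A11, A12, A21, A22 = _split(A)
    B11, B12, B21, B22 = _split(B)

    # Seven recursive products instead of the conventional eight.
    M1 = strassen(_add(A11, A22), _add(B11, B22), cutoff)
    M2 = strassen(_add(A21, A22), B11, cutoff)
    M3 = strassen(A11, _sub(B12, B22), cutoff)
    M4 = strassen(A22, _sub(B21, B11), cutoff)
    M5 = strassen(_add(A11, A12), B22, cutoff)
    M6 = strassen(_sub(A21, A11), _add(B11, B12), cutoff)
    M7 = strassen(_sub(A12, A22), _add(B21, B22), cutoff)

    C11 = _add(_sub(_add(M1, M4), M5), M7)
    C12 = _add(M3, M5)
    C21 = _add(M2, M4)
    C22 = _add(_add(_sub(M1, M2), M3), M6)

    # Reassemble the four quadrants into one matrix.
    return ([l + r for l, r in zip(C11, C12)] +
            [l + r for l, r in zip(C21, C22)])
```

Calling `strassen(A, B, cutoff=1)` forces the recursion all the way down to 1-by-1 blocks, which is useful for checking the seven-product formulas against `matmul_conventional` on small integer matrices. In floating-point arithmetic the two results generally differ by rounding error, which is the stability question the abstract raises.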
Strassen's algorithm for fast matrix-matrix multiplication has been implemented for m...
Basic Linear Algebra Subroutines (BLAS-3) [1] are building blocks to solve a lot of numerical prob...
In this thesis it is shown how an \(O(n^{4-\epsilon})\) algorithm for the cube multiplication probl...
Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar ope...
Matrix-matrix multiplication is normally computed using one of the BLAS or a reinvention o...
Block algorithms are becoming increasingly popular in matrix computations. Since their basic unit of...
A scalable parallel algorithm for matrix multiplication on SISAMD computers is presented. Our method...
We perform forward error analysis for a large class of recursive matrix multiplication algorithms in...
The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply ...
The main purpose of this paper is to present a fast matrix multiplication algorithm taken fr...
A simple but highly effective approach for transforming high-performance implementations on cachebas...
The Fortran 90 standard requires an intrinsic function matmul which multiplies two matrices togethe...