We provide efficient single- and double-precision GPU (Graphics Processing Unit) implementations of Strassen’s matrix multiplication algorithm as well as of Winograd’s variant of this algorithm. The single-precision implementations of these two algorithms are compared analytically using three metrics: arithmetic count, device-memory transactions, and device-memory-to-multiprocessor data volume. Our analysis indicates that, for 16384 × 16384 matrices, our single-precision implementation of Strassen’s algorithm limited to four levels of recursion reduces the number of arithmetic operations by 41.3%, the number of transactions by 33.7%, and the data volume by 29.2% relative to the best known GPU implementation of the classical n^3 matrix multiplication algorit...
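The arithmetic-count figure in the abstract above can be sanity-checked with a short calculation. The sketch below is not the authors' analysis. It assumes the textbook operation counts: 2n^3 - n^2 operations for the classical algorithm, and 7 half-size products per recursion level plus 18 half-size matrix additions for Strassen's algorithm (15 for Winograd's variant). It also assumes n is divisible by 2^levels. The function names are illustrative only.

```python
def classical_ops(n):
    """Classical algorithm: n^3 multiplications plus n^3 - n^2 additions."""
    return 2 * n**3 - n**2


def recursive_ops(n, levels, adds_per_level):
    """Operation count for a Strassen-like scheme with a fixed recursion depth.

    Each level performs 7 products of (n/2) x (n/2) blocks and
    `adds_per_level` additions/subtractions of (n/2) x (n/2) blocks.
    At depth 0 the classical algorithm is used (an assumption of this sketch).
    """
    if levels == 0:
        return classical_ops(n)
    half = n // 2
    return 7 * recursive_ops(half, levels - 1, adds_per_level) + adds_per_level * half * half


def strassen_ops(n, levels):
    return recursive_ops(n, levels, 18)  # 18 block additions per level


def winograd_ops(n, levels):
    return recursive_ops(n, levels, 15)  # Winograd's variant needs 15


def reduction(n, levels, count=strassen_ops):
    """Fraction of arithmetic operations saved relative to the classical algorithm."""
    return 1 - count(n, levels) / classical_ops(n)


if __name__ == "__main__":
    n, levels = 16384, 4
    print(f"Strassen: {100 * reduction(n, levels):.1f}% fewer operations")
    print(f"Winograd: {100 * reduction(n, levels, winograd_ops):.1f}% fewer operations")
```

Under these assumptions the Strassen count for n = 16384 and four levels comes out about 41.3% below the classical count, which agrees with the figure quoted in the abstract. The transaction and data-volume metrics depend on kernel and hardware details that this model does not capture.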
Due to non-associativity of floating-point operations and dynamic scheduling on parallel architectur...
Today’s hardware platforms have parallel processing capabilities and many parallel programming model...
We present several algorithms to compute the solution of a linear system of equations on a graphics ...
This paper presents initial experiments in implementing two notable matrix multiplication algorithms...
Strassen’s matrix multiplication reduces the computational cost of multiplying matrices of size n × ...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
In this paper we discuss our experiences in improving the performance of two key algorithms: t...
General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEM...
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized fo...
Matrix multiplication is at the core of high-performance numerical computation. Software methods of ...
The current era of scientific computing and computational theory involves highly exhaustive data com...
Strassen’s matrix multiplication reduces the computational cost of multiplying matrices of size n × ...
Strassen’s matrix multiplication reduces the computational cost of multiplying matrices of size n × ...
Modern graphics processing units (GPUs) have been at the leading edge of increasing chip-level para...
Strassen’s algorithm to multiply two n×n matrices reduces the asymptotic operation count f...