A challenging class of problems arising in many GPU applications, called batched problems, involves linear algebra operations on many small-sized matrices. We designed batched BLAS (Basic Linear Algebra Subprograms) routines, in particular the Level-2 BLAS GEMV and the Level-3 BLAS GEMM routines, to solve them. Our batched BLAS design employs device functions and big-tile settings, and we adopted auto-tuning to optimize the different instances of the GEMV routines. We illustrate our batched BLAS approach by progressively optimizing a batched bi-diagonalization on a K40c GPU. The optimization techniques in this paper are applicable to the other two-sided factorizations as well.
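As a concrete illustration of the device-function and auto-tuning ideas above, below is a minimal CUDA sketch of a batched GEMV in the spirit of this design. It is a sketch under assumptions, not the actual MAGMA/batched-BLAS API: it assumes uniform sizes across the batch and column-major storage, the names gemv_device and gemv_batched_kernel and the blocking parameter TX are illustrative, and the big-tile shared-memory caching is omitted for brevity.

#include <cuda_runtime.h>

// Device function: one thread block computes y = alpha*A*x + beta*y for a
// single m-by-n column-major matrix. Packaging the computation as a device
// function (rather than a standalone kernel) lets it be reused inside larger
// fused kernels, e.g., the panel stage of a batched bidiagonalization.
template <int TX>
__device__ void gemv_device(int m, int n, double alpha,
                            const double* A, int lda,
                            const double* x, double beta, double* y)
{
    // Each thread accumulates full dot products for rows i, i+TX, i+2*TX, ...
    for (int i = threadIdx.x; i < m; i += TX) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += A[i + j * lda] * x[j];
        y[i] = alpha * sum + beta * y[i];
    }
}

// Batched kernel: blockIdx.x selects the matrix within the batch. TX (the
// thread-block width) is the kind of parameter an auto-tuner sweeps per
// (m, n) instance to select the fastest configuration.
template <int TX>
__global__ void gemv_batched_kernel(int m, int n, double alpha,
                                    const double* const* dA, int lda,
                                    const double* const* dx,
                                    double beta, double* const* dy)
{
    int bid = blockIdx.x;
    gemv_device<TX>(m, n, alpha, dA[bid], lda, dx[bid], beta, dy[bid]);
}

// Host-side launcher: one thread block per problem in the batch.
void gemv_batched(int m, int n, double alpha,
                  const double* const* dA, int lda,
                  const double* const* dx, double beta,
                  double* const* dy, int batch_count)
{
    const int TX = 128;  // in practice this value would come from auto-tuning
    gemv_batched_kernel<TX><<<batch_count, TX>>>(m, n, alpha, dA, lda,
                                                 dx, beta, dy);
}

Assigning one block per matrix keeps each small problem resident on a single multiprocessor, which is the setting in which big-tile caching of the whole matrix pays off at these sizes.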
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized fo...
We study tiled algorithms for going from a "full" matrix to a condensed "ba...
In this thesis, we studied linear algebra systems with small matrices (typically from 2x2 to 5x5...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
The emergence of multicore and heterogeneous architectures requires many linear algebra algorithms t...
Solving a large number of relatively small linear systems has recently drawn more attention ...
Most methods for calculating the SVD (singular value decomposition) require first bidiagonalizing t...
Currently, state-of-the-art libraries, like MAGMA, focus on very large linear algebra probl...
Sparse matrix–vector multiplication (SpMV) is of singular importance in sparse linear algebra, which...
The singular values of a matrix are conventionally computed using either the bidiagonalization algo...
This special issue of Parallel Computing contains nine articles, selected afte...
This paper presents a novel, high-performance, graphics processing unit-based algorithm for efficie...
Low-rank matrix factorization is an important step in many high-dimensional machine learning algorit...
Approximation of matrices using the Singular Value Decomposition (SVD) plays a central ro...