We describe an interface and an implementation for performing Kronecker product actions on NVIDIA GPUs for multiple small 2-D matrices and 3-D arrays processed in parallel as a batch. This method is suited to cases where the Kronecker product component matrices are identical but the operands in a matrix-free application vary in the batch. Any batched GEMM (General Matrix Multiply) implementation, for example ours [1] or the one in cuBLAS [2], can also be used for performing batched Kronecker products on GPUs. However, the specialized implementation presented here is faster and uses less memory. Partly this is because a simple GEMM based approach would require extra copies to and from main memory. We focus on matrix sizes less than or equal...
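The abstract above rests on the standard Kronecker identity (A ⊗ B) vec(X) = vec(B X Aᵀ), which lets a batch of operands be processed with two small batched GEMMs instead of ever forming A ⊗ B. The sketch below, a hypothetical NumPy illustration (not the paper's GPU code), checks this identity for a batch of operands sharing the same component matrices, matching the setting the abstract describes:

```python
import numpy as np

# Illustrative sketch (assumed shapes/names, not the paper's implementation):
# (A ⊗ B) vec(X) == vec(B X A^T), so a batch of varying operands X_i can be
# handled with two batched small GEMMs instead of storing the (mn x mn) matrix.
rng = np.random.default_rng(0)
m, n, batch = 4, 3, 8                    # small sizes, as in the paper's regime
A = rng.standard_normal((m, m))          # Kronecker components shared by the batch
B = rng.standard_normal((n, n))
X = rng.standard_normal((batch, n, m))   # per-item operands that vary in the batch

# Matrix-free batched action: two small GEMMs per batch item.
Y = B @ X @ A.T                          # shape (batch, n, m)

# Reference: explicit Kronecker matrix applied to column-major vec(X_i).
K = np.kron(A, B)                                  # (m*n, m*n) -- what we avoid
vecX = X.transpose(0, 2, 1).reshape(batch, -1)     # column-major vec of each X_i
vecY = (K @ vecX.T).T
Y_ref = vecY.reshape(batch, m, n).transpose(0, 2, 1)

assert np.allclose(Y, Y_ref)
```

The memory argument in the abstract follows directly: the explicit route stores and multiplies by an (mn × mn) matrix, while the matrix-free route touches only the m×m and n×n components and the batch operands.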
In this work, we address the efficient realization of block-Jacobi preconditioning on graphics proce...
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized fo...
Solving a large number of relatively small linear systems has recently drawn more attention ...
We present an interface and an implementation of the General Matrix Multiply (GEMM) routine for mult...
The solving of tridiagonal systems is one of the most computationally expensive parts in many applic...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
We provide efficient single- and double-precision GPU (Graphics Processing Unit) implementations of...
One of the most important and commonly used operations in many linear algebra functions is matrix-ma...
Multiphysics systems are used to simulate various physics phenomena given by Partial Differential Equ...
A challenging class of problems arising in many GPU applications, called batched problems, involves ...
Modern graphics processing units (GPUs) have been at the leading edge of increasing chip-level para...
The Kronecker product has a rich and very pleasing algebra that supports a wide range of fas...
Background: In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained ...
Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architec...
We present an interface to the graphics processing unit (GPU) from MATLAB, and four algorit...