We describe an interface and an implementation for performing Kronecker product actions on NVIDIA GPUs for multiple small 2-D matrices and 3-D arrays processed in parallel as a batch. This method is suited to cases where the Kronecker product component matrices are identical but the operands in a matrix-free application vary in the batch. Any batched GEMM (General Matrix Multiply) implementation, for example ours [1] or the one in cuBLAS [2], can also be used for performing batched Kronecker products on GPUs. However, the specialized implementation presented here is faster and uses less memory. Partly this is because a simple GEMM based approach would require extra copies to and from main memory. We focus on matrix sizes less than or equal...
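The abstract above rests on the standard Kronecker identity (A ⊗ B) vec(X) = vec(B X Aᵀ), which lets a batch of operands be processed with two small batched GEMMs instead of ever forming A ⊗ B. The sketch below, a hypothetical NumPy illustration (not the paper's GPU code), checks this identity for a batch of operands sharing the same component matrices, matching the setting the abstract describes:

```python
import numpy as np

# Illustrative sketch (assumed shapes/names, not the paper's implementation):
# (A ⊗ B) vec(X) == vec(B X A^T), so a batch of varying operands X_i can be
# handled with two batched small GEMMs instead of storing the (mn x mn) matrix.
rng = np.random.default_rng(0)
m, n, batch = 4, 3, 8                    # small sizes, as in the paper's regime
A = rng.standard_normal((m, m))          # Kronecker components shared by the batch
B = rng.standard_normal((n, n))
X = rng.standard_normal((batch, n, m))   # per-item operands that vary in the batch

# Matrix-free batched action: two small GEMMs per batch item.
Y = B @ X @ A.T                          # shape (batch, n, m)

# Reference: explicit Kronecker matrix applied to column-major vec(X_i).
K = np.kron(A, B)                                  # (m*n, m*n) -- what we avoid
vecX = X.transpose(0, 2, 1).reshape(batch, -1)     # column-major vec of each X_i
vecY = (K @ vecX.T).T
Y_ref = vecY.reshape(batch, m, n).transpose(0, 2, 1)

assert np.allclose(Y, Y_ref)
```

The memory argument in the abstract follows directly: the explicit route stores and multiplies by an (mn × mn) matrix, while the matrix-free route touches only the m×m and n×n components and the batch operands.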
In this work, we address the efficient realization of block-Jacobi preconditioning on graphics proce...
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized fo...
Solving a large number of relatively small linear systems has recently drawn more attention ...
We present an interface and an implementation of the General Matrix Multiply (GEMM) routine for mult...
The solving of tridiagonal systems is one of the most computationally expensive parts in many applic...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
We provide efficient single- and double-precision GPU (Graphics Processing Unit) implementations of...
One of the most important and commonly used operations in many linear algebra functions is matrix-ma...
Multiphysics systems are used to simulate various physics phenomena given by Partial Differential Equ...
A challenging class of problems arising in many GPU applications, called batched problems, involves ...
Modern graphics processing units (GPUs) have been at the leading edge of increasing chip-level para...
The Kronecker product has a rich and very pleasing algebra that supports a wide range of fas...
Background: In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained ...
Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architec...
We present an interface to the graphics processing unit (GPU) from MATLAB, and four algorit...