Matrix-matrix multiplication (GEMM) is one of the most important and commonly used operations in linear algebra, and a key component in the performance of many scientific codes. It requires O(n^3) operations, and this high computational intensity makes it well suited to acceleration on GPUs. Today, many research problems require computing a very large number of relatively small GEMM operations, each of which is too small to utilise the entire GPU. To overcome this bottleneck, special functions have been developed that pack several GEMM operations into one call and compute them simultaneously on a GPU; this is called a batch operation. In this research work, we have pro...
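As a rough illustration of the batch idea described above, the following is a minimal CPU reference sketch in plain Python, not the implementation studied in this work. The names `gemm`, `batched_gemm` and `gemm_flops` are hypothetical and chosen for illustration only. It shows that every small product in a batch is independent of the others, which is the property a GPU batch routine can exploit to process them simultaneously.

```python
def gemm(a, b):
    """Reference GEMM: return the product a * b for row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    # Triple loop over (i, j, p): the source of the O(n^3) operation count.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def batched_gemm(a_batch, b_batch):
    """Compute C_i = A_i * B_i for every pair in the batch.

    The products do not depend on each other, so a GPU batch routine is
    free to run them concurrently; here they simply run one after another.
    """
    if len(a_batch) != len(b_batch):
        raise ValueError("batches must have the same length")
    return [gemm(a, b) for a, b in zip(a_batch, b_batch)]


def gemm_flops(m, n, k):
    """Multiply-add count for one (m x k) by (k x n) product."""
    return 2 * m * n * k
```

For square matrices of order n, `gemm_flops(n, n, n)` gives 2n^3 operations on only 3n^2 matrix entries, which is why a single small product offers too little work to occupy a whole GPU while a large batch of them does.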
Multiphysics systems are used to simulate various physics phenomena given by Partial Differential Equ...
We present an interface to the graphics processing unit (GPU) from MATLAB, and four algorit...
We describe an interface and an implementation for performing Kronecker product actions on NVIDIA GP...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
For the past decade, power/energy consumption has become a limiting factor for large-scale and embed...
For the past decade, power/energy consumption has become a limiting factor for large-scale and embed...
We present an interface and an implementation of the General Matrix Multiply (GEMM) routine for mult...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
Sparse matrix-matrix multiplication (SpMM) is a key operation in numerous areas from information ...
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016...
As users and developers, we are witnessing the opening of a new computing scenario: the introduction...
The emergence of multicore and heterogeneous architectures requires many linear algebra algorithms t...
GPU matrix chain multiplication serves as a basis for a wide range of scientif...
The race for Exascale computing has naturally led the current technologies to ...
Sparse matrix vector multiplication (SpMV) is the dominant kernel in scientific simulations....