In the last ten years, GPUs have dominated the market on the computing/power metric, and numerous research works have provided GPU-accelerated implementations of the Basic Linear Algebra Subprograms (BLAS). Several software libraries have been developed to exploit the performance of systems with accelerators, but the achieved performance may be far from the platform's peak. This paper presents XKBlas, which aims to improve the performance of BLAS-3 kernels on multi-GPU systems. At the low level, we model computation as a set of tasks accessing data on different resources. At the high level, the API design favors non-blocking calls as a uniform concept to overlap latency, even for fine-grain computation. Unit benchmark of BLAS-3 ker...
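The abstract above hinges on non-blocking BLAS-3 calls that overlap latency across several GPUs. As a minimal sketch of that general idea (not XKBlas's actual API), the fragment below enqueues one DGEMM tile per device through cuBLAS streams and only synchronizes when the results are needed; the function name, tile layout, and preallocated device arrays dA/dB/dC are illustrative assumptions.

    /* Sketch only: illustrates non-blocking BLAS-3 submission across GPUs
     * using the standard cuBLAS/CUDA runtime APIs; it is NOT the XKBlas API. */
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    /* One n-by-n DGEMM tile per GPU, enqueued without blocking the host.
     * dA[g], dB[g], dC[g] are assumed to be device buffers on GPU g. */
    void tiled_dgemm_multi_gpu(int ngpus, int n,
                               double *const *dA, double *const *dB,
                               double *const *dC)
    {
        const double alpha = 1.0, beta = 0.0;

        for (int g = 0; g < ngpus; ++g) {
            cudaSetDevice(g);                 /* select device g              */
            cublasHandle_t handle;
            cublasCreate(&handle);            /* per-device BLAS context      */
            cudaStream_t stream;
            cudaStreamCreate(&stream);
            cublasSetStream(handle, stream);  /* make the GEMM call async     */

            /* C_g = alpha * A_g * B_g + beta * C_g; returns immediately */
            cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                        n, n, n, &alpha,
                        dA[g], n, dB[g], n, &beta, dC[g], n);
            /* handle/stream intentionally not destroyed in this sketch;
             * a real library would cache and reuse them */
        }

        /* Host work can proceed here, overlapping with the GPU kernels. */
        for (int g = 0; g < ngpus; ++g) {
            cudaSetDevice(g);
            cudaDeviceSynchronize();          /* block only when results are needed */
        }
    }

Submitting each tile on its own stream keeps the host free to enqueue further work or transfers, which is the same latency-hiding principle the abstract attributes to non-blocking fine-grain calls.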
In this chapter, we present a hybridization methodology for the development of linear algebra softwa...
This dataset contains the execution time of four BLAS Level 1 operations - ASUM, DOT, SCAL and AXPY ...
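For reference, the four Level 1 operations named in this dataset are the standard BLAS vector kernels; written here for real vectors of length n:

\[
\mathrm{ASUM}(x) = \sum_{i=1}^{n} |x_i|, \qquad
\mathrm{DOT}(x, y) = \sum_{i=1}^{n} x_i y_i, \qquad
\mathrm{SCAL}:\; x \leftarrow \alpha x, \qquad
\mathrm{AXPY}:\; y \leftarrow \alpha x + y .
\]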
The trend of using co-processors as accelerators to perform certain tasks is rising in the parallel...
Nowadays, GPUs have dominated the market considering the computing/power metric...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
Scientific applications are some of the most computationally demanding software pieces. Their core i...
This work reviews the experience of implementing different versions of the SSPR rank-one update oper...
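SSPR is the single-precision symmetric packed rank-one update from BLAS Level 2: for a symmetric matrix A stored in packed form, a vector x, and a scalar alpha, it computes

\[
A \leftarrow \alpha\, x x^{T} + A .
\]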
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
The increase in performance of the last generations of graphics processors (GPUs) has made this clas...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear ...
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra l...
Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. ...
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building bloc...
We propose two high-level application programming interfaces (APIs) to use a graphics processing uni...