Performance Portable GPU Code Generation for Matrix Multiplication

Remmelg, Toomas
Lutz, Thibaut
Steuwer, Michel
Dubach, Christophe

Publication date

March 2016

DOI

10.1145/2884045.2884046

Publisher

Association for Computing Machinery (ACM)

Language

English

Abstract

Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left for ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to address this issue. However, they rely on device-specific heuristics or hard-coded library implementations to achieve good performance, resulting in non-portable solutions that need to be re-optimized for every new device. Achieving performance portability is the holy grail of high-performance computing and has so far remained an open problem, even for well-studied applications like matrix multiplication. We argue that what is needed is a way to descri...
