The general matrix-matrix multiplication (GEMM) kernel is a fundamental building block of many scientific applications. Many libraries such as Intel MKL and BLIS provide highly optimized sequential and parallel versions of this kernel. The parallel implementations of the GEMM kernel rely on the well-known fork-join execution model to exploit multi-core systems efficiently. However, these implementations are not well suited for task-based applications as they break the data-flow execution model. In this paper, we present a task-based implementation of the GEMM kernel that can be seamlessly leveraged by task-based applications while providing better performance than the fork-join version. Our implementation leverages several advanced features...
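The data-flow style described above can be illustrated with a minimal sketch (this is an illustration of the general technique, not the paper's actual implementation): a tiled GEMM in which each tile update is an OpenMP task, and an `inout` dependence on the output tile lets the runtime serialize only the updates that touch the same tile of C while unrelated tiles proceed concurrently. Matrix size `N` and block size `BS` are arbitrary illustrative choices.

```c
#define N  64          /* matrix dimension (assumed divisible by BS) */
#define BS 16          /* tile (block) size */

/* C[ii..ii+BS, jj..jj+BS] += A[ii.., kk..] * B[kk.., jj..] */
static void gemm_tile(const double *A, const double *B, double *C,
                      int ii, int jj, int kk)
{
    for (int i = ii; i < ii + BS; i++)
        for (int k = kk; k < kk + BS; k++)
            for (int j = jj; j < jj + BS; j++)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

/* Task-based C = C + A*B over BS x BS tiles. */
void task_gemm(const double *A, const double *B, double *C)
{
    #pragma omp parallel
    #pragma omp single
    for (int ii = 0; ii < N; ii += BS)
        for (int jj = 0; jj < N; jj += BS)
            for (int kk = 0; kk < N; kk += BS)
                /* The inout dependence on the first element of the C tile
                   orders the kk updates of that tile; tasks writing
                   different tiles carry no dependence and may overlap. */
                #pragma omp task depend(inout: C[ii * N + jj]) \
                                 firstprivate(ii, jj, kk)
                gemm_tile(A, B, C, ii, jj, kk);
}
```

Because the dependences are expressed per tile rather than through a global fork-join barrier, such tasks can interleave with the tasks of a surrounding task-based application, which is the property the abstract argues fork-join GEMM implementations lack. Compiled without OpenMP support the pragmas are ignored and the code runs serially, producing the same result.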
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
We take advantage of the new tasking features in OpenMP to propose advanced task-parallel algorithms...
OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP specifications extend...
The high performance computing (HPC) community is obsessed over the general matrix-matrix multiply (...
As chip multi-processors (CMPs) are becoming more and more complex, software solutions such as paral...
We extend a two-level task partitioning previously applied to the inversion of dense matrices via Ga...
Task-based programming models have succeeded in gaining the interest of the hi...
General Matrix Multiplication or GEMM kernels take centre place in high performance computing and ma...
Computing platforms are now extremely complex, providing an increasing number o...
The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how...
BLIS is a new framework for rapid instantiation of the BLAS. We describe how BLIS extends the “GotoB...
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using Ope...
Along with the popularity of multicore and manycore, task-based dataflow programming models obtain g...
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in are...