A current trend in high-performance computing is to decompose a large linear algebra problem into batches containing thousands of smaller problems that can be solved independently before the results are collated. To standardize the interface to these routines, the community is developing an extension to the BLAS standard (the batched BLAS), enabling users to perform thousands of small BLAS operations in parallel whilst making efficient use of their hardware. We discuss the benefits and drawbacks of the current batched BLAS proposals and perform a number of experiments, focusing on general matrix-matrix multiplication (GEMM), to explore their effect on performance. In particular, we analyze the effect of novel data layouts which, for example, ...
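To make the batched-GEMM pattern above concrete, the following C sketch shows the baseline formulation as a parallel loop over independent GEMMs on top of a standard CBLAS dgemm, together with one possible interleaved indexing scheme for the batch. The function names batched_dgemm_naive and get_interleaved, and the particular interleaving formula, are illustrative assumptions rather than the interface or data layouts proposed by the batched BLAS effort; the truncated abstract does not spell out the layouts studied, so interleaving is shown here purely as an example of the idea.

/* A minimal sketch (not the proposed batched BLAS interface) of the baseline
 * "loop over independent GEMMs" formulation: every problem in the batch is a
 * column-major DGEMM of the same size, dispatched in parallel with OpenMP on
 * top of a standard CBLAS call. */
#include <stddef.h>
#include <cblas.h>

/* C[i] = alpha * A[i] * B[i] + beta * C[i] for i = 0 .. batch_count-1,
 * where each A[i] is m x k, each B[i] is k x n and each C[i] is m x n. */
void batched_dgemm_naive(int m, int n, int k,
                         double alpha, const double **A, int lda,
                         const double **B, int ldb,
                         double beta, double **C, int ldc,
                         int batch_count)
{
    #pragma omp parallel for
    for (int i = 0; i < batch_count; i++) {
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, alpha, A[i], lda, B[i], ldb,
                    beta, C[i], ldc);
    }
}

/* One hypothetical interleaved layout for the batch: entry (r, c) of matrix i
 * is stored at index (c*m + r) * batch_count + i, so the same entry of
 * consecutive matrices occupies adjacent memory and can be read with unit
 * stride by a vectorized kernel. */
static inline double get_interleaved(const double *A_int, int m,
                                     int batch_count, int i, int r, int c)
{
    return A_int[((size_t)c * m + r) * (size_t)batch_count + (size_t)i];
}

A tuned batched BLAS implementation would replace the loop body with a fused kernel; the appeal of an interleaved layout is that, for thousands of small fixed-size matrices, contiguous access across the batch is generally easier to vectorize and prefetch than thousands of separate column-major blocks.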
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...
Mr. Goto wrote code that greatly improved GEMM and was at one time the fastest implementation in the world. In...
Solving a large number of relatively small linear systems has recently drawn more attention ...
A challenging class of problems arising in many GPU applications, called batched problems, involves ...
One trend in modern high performance computing (HPC) is to decompose a large linear algebra problem ...
The emergence of multicore and heterogeneous architectures requires many linear algebra algorithms t...
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized fo...
The high performance computing (HPC) community is obsessed over the general matrix-matrix multiply (...
In the last ten years, GPUs have dominated the market considering the computin...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
BLIS is a new framework for rapid instantiation of the BLAS. We describe how BLIS extends the “GotoB...
General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEM...
In this paper we propose a set of optimizations for the BLAS-3 routines of LASs library (Linear Alge...
The Basic Linear Algebra Subprograms, BLAS, are the basic computational kernels in most ap...