One trend in modern high performance computing (HPC) is to decompose a large linear algebra problem into thousands of small problems that can be solved independently. There is a clear need for a batched BLAS standard that allows users to perform thousands of small BLAS operations in parallel while making efficient use of their hardware. The BLAS standard can be extended for batch operations in many possible ways. We discuss many of these possible designs, giving benefits and criticisms of each, along with a number of experiments designed to determine how the API may affect performance on modern HPC systems. Related issues that influence API design, such as the effect of memory layout on performance, are also discussed.
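To make the design question concrete, the following is a minimal sketch of two ways a batched matrix multiply could be exposed. It is an illustration only: the function names (`gemm`, `gemm_batch_pointers`, `gemm_batch_strided`), the row-major flat-list storage, and the restriction to a fixed problem size are all assumptions made here, not designs taken from the paper. The two variants loosely resemble the pointer-array and strided batched interfaces found in some vendor libraries, and they differ only in how the batch is laid out in memory.

```python
# Hypothetical sketch: two candidate batched GEMM interfaces.
# All names and the storage convention are illustrative assumptions.

def gemm(a, b, m, n, k):
    """Naive C = A * B on row-major flat lists; A is m x k, B is k x n."""
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = acc
    return c


def gemm_batch_pointers(a_list, b_list, m, n, k):
    """'Array of pointers' style: each problem lives in its own buffer."""
    return [gemm(a, b, m, n, k) for a, b in zip(a_list, b_list)]


def gemm_batch_strided(a_flat, b_flat, m, n, k, batch_count):
    """'Strided' style: all matrices are packed back to back in one buffer."""
    stride_a, stride_b = m * k, k * n
    c_flat = []
    for s in range(batch_count):
        a = a_flat[s * stride_a:(s + 1) * stride_a]
        b = b_flat[s * stride_b:(s + 1) * stride_b]
        c_flat.extend(gemm(a, b, m, n, k))
    return c_flat


# A batch of three independent 2 x 2 problems.
m = n = k = 2
a_list = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 0.0], [2.0, 0.0, 0.0, 2.0]]
b_list = [[1.0, 0.0, 0.0, 1.0], [5.0, 6.0, 7.0, 8.0], [1.0, 1.0, 1.0, 1.0]]

c_pointers = gemm_batch_pointers(a_list, b_list, m, n, k)

# The same data packed contiguously for the strided interface.
a_flat = [x for a in a_list for x in a]
b_flat = [x for b in b_list for x in b]
c_strided = gemm_batch_strided(a_flat, b_flat, m, n, k, len(a_list))
```

Both calls compute the same products. The pointer-style interface can accept matrices scattered anywhere in memory, whereas the strided interface requires one contiguous allocation but needs no per-problem pointer table. A real implementation would execute the independent problems in parallel rather than in a Python loop.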
The efficiency of numerical libraries for a given computation is highly dependent on the size of the...
BLIS is a new framework for rapid instantiation of the BLAS. We describe how BLIS extends the “GotoB...
This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprogram (BLAS-3) li...
A current trend in high-performance computing is to decompose a large linear algebra prob- lem into ...
This paper proposes an API for Batched Basic Linear Algebra Subprograms (Batched BLAS). We focus on...
This report summarises the main points raised on a recent workshop discussing various extensions to ...
Abstract The Basic Linear Algebra Subprograms, BLAS, are the basic computa-tional kernels in most ap...
Scientific applications are some of the most computationally demanding software pieces. Their core i...
Combinatorial algorithms such as those that arise in graph analysis, modeling of discrete systems, b...
We provide timing results for common linear algebra subroutines across BLAS (Basic Lin-ear Algebra S...
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building bloc...
The aim of this project was to encapsulate the needs of computational science applications. Performa...
This article describes the design rationale, a C implementation, and conformance testing of a subse...
Many scientific applications are in need to solve a high number of small-size independent problems. ...
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...