This report summarises the main points raised at a recent workshop discussing various extensions to the BLAS standard, held at the University of Tennessee in May 2016. In particular, the discussions focused on batched, reproducible, and reduced-precision BLAS extensions. Various members of the linear algebra community and representatives from Intel, NVIDIA, and ARM were present to generate and evaluate ideas in each of these areas.
This working note examines different Fortran implementations of the Basic Linear Algebra Subprograms...
One trend in modern high performance computing (HPC) is to decompose a large linear algebra problem ...
Timing results for BLAS (Basic Linear Algebra Subprograms) libraries in R on diverse CPUs and GPUs. ...
This article describes the design rationale, a C implementation, and conformance testing of a subset...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
This paper summarizes the BLAS Technical Forum Standard, a specification of a set of kernel routine...
The BLAS-like Library Instantiation Software (BLIS) is a framework for the rapid instantiation of ba...
Due to non-associativity of floating-point operations and dynamic scheduling on par...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
This paper proposes an API for Batched Basic Linear Algebra Subprograms (Batched BLAS). We focus on...
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra l...