This article describes the design rationale, a C implementation, and conformance testing of a subset of the new Standard for the BLAS (Basic Linear Algebra Subroutines): Extended and Mixed Precision BLAS. Permitting higher internal precision and mixed input/output types and precisions allows us to implement some algorithms that are simpler, more accurate, and sometimes faster than possible without these features. The new BLAS are challenging to implement and test because there are many more subroutines than in the existing Standard, and because we must be able to assess whether a higher precision is used for internal computations than is used for either input or output variables. We have therefore developed an automated process of generatin...
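The idea of using a higher internal precision than the input and output types can be sketched with a dot product. The following is a minimal illustration, not the interface of the library described above: the function names `dot_float_acc` and `dot_double_acc` are hypothetical, and promoting `float` to `double` stands in for whatever extended precision an implementation actually uses.

```c
#include <stddef.h>

/* Baseline: float inputs, float accumulator.
 * Every partial sum is rounded to single precision. */
float dot_float_acc(size_t n, const float *x, const float *y)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

/* Higher internal precision: float inputs and float result, but the
 * products and the running sum are kept in double and rounded to
 * float only once, at the end. (Hypothetical name, for illustration.) */
float dot_double_acc(size_t n, const float *x, const float *y)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += (double)x[i] * (double)y[i];
    return (float)acc;
}
```

With inputs such as `x = {1e8f, 1.0f, -1e8f}` and `y = {1.0f, 1.0f, 1.0f}`, the double accumulator keeps the small term that cancellation would otherwise expose as an error, so `dot_double_acc` returns exactly `1.0f`. The result of the single-precision version depends on how the compiler evaluates `float` expressions, which hints at why testing whether extra internal precision was really used is hard.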
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
As scientific computation continues to scale, it is crucial to use floating-point arithmetic process...
The BLAS-like Library Instantiation Software (BLIS) is a framework for the rapid instantiation of ba...
This paper describes a C implementation of the proposed new BLAS Standard. Permitting mixtures of i...
This report summarises the main points raised at a recent workshop discussing various extensions to ...
Due to non-associativity of floating-point operations and dynamic scheduling on par...
Today's floating-point arithmetic landscape is broader than ever. While scientific computing has tra...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
This working note examines different Fortran implementations of the Basic Linear Algebra Subprograms...
This paper summarizes the BLAS Technical Forum Standard, a specification of a set of kernel routine...
This dataset contains the execution time of four BLAS Level 1 operations - ASUM, DOT, SCAL and AXPY ...
Numerical reproducibility failures appear in massively parallel floating-poin...
We look at how both logical restructuring and improvements available from successive versions of For...