This article describes the design rationale, a C implementation, and conformance testing of a subset of the new Standard for the BLAS (Basic Linear Algebra Subroutines): Extended and Mixed Precision BLAS. Permitting higher internal precision and mixed input/output types and precisions allows us to implement some algorithms that are simpler, more accurate, and sometimes faster than possible without these features. The new BLAS are challenging to implement and test because there are many more subroutines than in the existing Standard, and because we must be able to assess whether a higher precision is used for internal computations than is used for either input or output variables. We have therefore developed an automated process of ge...
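To make the benefit of higher internal precision concrete, here is a minimal sketch of a dot product whose inputs and output are double precision but whose accumulation can be carried out more precisely. It is not the article's C implementation: the function names `dot_working_precision` and `dot_extended` are hypothetical, and exact rational arithmetic from Python's standard `fractions` module stands in for whatever extended format a real library would use.

```python
from fractions import Fraction


def dot_working_precision(x, y):
    """Dot product accumulated in ordinary double precision."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi  # every product and partial sum is rounded to double
    return acc


def dot_extended(x, y):
    """Same double inputs and output, but exact accumulation internally.

    Assumption: exact rationals are used here as a stand-in for a
    higher internal precision; the result is rounded to double once.
    """
    acc = Fraction(0)
    for xi, yi in zip(x, y):
        acc += Fraction(xi) * Fraction(yi)  # no rounding inside the loop
    return float(acc)


# A cancellation-prone input: the small term is lost in working precision.
x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
print(dot_working_precision(x, y))  # 0.0
print(dot_extended(x, y))           # 1.0
```

The two functions share the same interface, which is also why testing is hard: a caller can only tell that higher internal precision was used by choosing inputs, like the one above, whose correct answer depends on it.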
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
The BLAS-like Library Instantiation Software (BLIS) is a framework for the rapid instantiation of ba...
This dataset contains the execution time of four BLAS Level 1 operations - ASUM, DOT, SCAL and AXPY ...
This paper describes a C implementation of the proposed new BLAS Standard. Permitting mixtures of i...
This report summarises the main points raised at a recent workshop discussing various extensions to ...
Due to non-associativity of floating-point operations and dynamic scheduling on par...
This paper summarizes the BLAS Technical Forum Standard, a specification of a set of kernel routine...
This working note examines different Fortran implementations of the Basic Linear Algebra Subprograms...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
Today's floating-point arithmetic landscape is broader than ever. While scientific computing has tra...
We look at how both logical restructuring and improvements available from successive versions of For...
Numerical reproducibility failures appear in massively parallel floating-poin...