National audience. Due to the non-associativity of floating-point operations and dynamic scheduling on parallel architectures, obtaining a bitwise-reproducible floating-point result across multiple executions of the same code on different, or even similar, parallel architectures is challenging. We address the problem of reproducibility in the context of fundamental linear algebra operations, like the ones included in the Basic Linear Algebra Subprograms (BLAS) library, and propose algorithms that yield both reproducible and accurate (rounding to the nearest) results. We present implementations of these reproducible and accurate algorithms for the BLAS routines in parallel environments such as Intel server CPUs, Intel Xeon Phi, and both NVIDIA and AMD...
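The effect described in the abstract can be seen in a few lines. The sketch below is only an illustration and is not the authors' algorithm: it uses Python's standard `math.fsum` as a stand-in for a correctly rounded summation, and the helper `naive_sum` and the sample data are hypothetical names and values chosen for this example. It shows that a plain left-to-right sum changes when the operands are reordered (as a dynamic scheduler might reorder them), while a correctly rounded sum does not.

```python
import math

def naive_sum(values):
    """Plain left-to-right floating-point sum; the result depends on operand order."""
    total = 0.0
    for v in values:
        total += v
    return total

# Non-associativity: the same three numbers, grouped two ways.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# The same four numbers in two orders, as two different schedules might add them.
values = [1e16, 1.0, -1e16, 1.0]
permuted = [1e16, -1e16, 1.0, 1.0]

naive_a = naive_sum(values)      # the first 1.0 is absorbed by 1e16
naive_b = naive_sum(permuted)    # cancellation happens first, so nothing is lost

# math.fsum tracks exact partial sums and rounds once at the end,
# so its result does not depend on the order of the operands.
exact_a = math.fsum(values)
exact_b = math.fsum(permuted)

print(left == right)             # False
print(naive_a, naive_b)          # 1.0 2.0
print(exact_a, exact_b)          # 2.0 2.0
```

Rounding once, after an exact accumulation, is what makes the result both order-independent and accurate. Doing this efficiently across many threads and devices is the harder problem the abstract refers to.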
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building bloc...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
This working note examines different Fortran implementations of the Basic Linear Algebra Subprograms...
National audience. Due to non-associativity of floating-point operations and dynamic scheduling on par...
International audience. Due to non-associativity of floating-point operations and dynamic schedu...
Due to non-associativity of floating-point operations and dynamic scheduling on parallel architectur...
This article describes the design rationale, a C implementation, and conformance testing of a subse...
This article describes the design rationale, a C implementation, and conformance testing of a subset...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
International audience. Numerical reproducibility failures appear in massively parallel floating-poin...
The problem of numerical non-reproducibility arises in parallel computations mainly due to...
BLAS benchmark results for: CPUs: Intel Core i7-4790K, Intel Core i5-4590, Intel Core i5-4590, Inte...
International audience. Modern high-performance computation (HPC) performs a huge amount of floating p...
This report summarises the main points raised at a recent workshop discussing various extensions to ...
International audience. The process of finding the solution of a linear system of equations is often t...