Due to the non-associativity of floating-point operations and dynamic scheduling on parallel architectures, obtaining a bitwise reproducible floating-point result across multiple executions of the same code on different, or even similar, parallel architectures is challenging. We address the problem of reproducibility in the context of fundamental linear algebra operations, such as those included in the Basic Linear Algebra Subprograms (BLAS) library, and propose algorithms that yield results that are both reproducible and accurate (rounded to the nearest). We present implementations of these reproducible and accurate algorithms for the BLAS routines in parallel environments such as Intel server CPUs, Intel Xeon Phi, and both NVIDIA and AMD GPUs.
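
For illustration only, a minimal C sketch of the underlying issue: the same three summands, added in two orders (as a dynamic scheduler might reorder a parallel reduction), produce different rounded results. The values chosen here are hypothetical and simply make the rounding effect visible.

```c
#include <stdio.h>

int main(void) {
    /* Floating-point addition is not associative: the evaluation order
       chosen at run time changes the rounded result. */
    double a = 1e16, b = -1e16, c = 1.0;

    double left  = (a + b) + c;  /* a and b cancel exactly, then c is added: 1.0 */
    double right = a + (b + c);  /* c is absorbed when added to b, then cancellation: 0.0 */

    printf("(a+b)+c = %.1f\n", left);
    printf("a+(b+c) = %.1f\n", right);
    return 0;
}
```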