This dataset contains the execution times of four BLAS Level 1 operations (ASUM, DOT, SCAL and AXPY) implemented using multiple-precision software for central processing units (CPUs) and CUDA-compatible graphics processing units (GPUs). Each log file contains the results of three test runs, along with details of the experimental setup. The following software packages are considered:
1. For CPUs:
• MPFR: A C library for multiple-precision floating-point computations with correct rounding (https://www.mpfr.org)
• ARPREC: An arbitrary precision package for Fortran and C++ (https://www.davidhbailey.com/dhbsoftware)
• MPDECIMAL: A package for correctly-rounded arbitrary precision decimal floating point (https://www.bytereef.org/mpde...
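For readers unfamiliar with the four Level 1 operations named in the dataset description, the following is a minimal sketch of what each one computes. It is not taken from the dataset or from any of the listed packages: it uses Python's standard `decimal` module as a stand-in for a multiple-precision backend, the 50-digit precision is an arbitrary choice, and the function names are illustrative. Reference BLAS routines update their output vector in place, whereas this sketch returns new lists for clarity.

```python
from decimal import Decimal, getcontext

# Assumption for illustration: 50 significant decimal digits.
getcontext().prec = 50

def asum(x):
    """ASUM: sum of absolute values of the elements of x."""
    return sum((abs(v) for v in x), Decimal(0))

def dot(x, y):
    """DOT: inner product of x and y."""
    return sum((a * b for a, b in zip(x, y)), Decimal(0))

def scal(alpha, x):
    """SCAL: scale x by alpha (returns a new list instead of overwriting x)."""
    return [alpha * v for v in x]

def axpy(alpha, x, y):
    """AXPY: alpha * x + y (returns a new list instead of overwriting y)."""
    return [alpha * a + b for a, b in zip(x, y)]

# Small hypothetical input vectors.
x = [Decimal("1.5"), Decimal("-2.25"), Decimal("3")]
y = [Decimal("4"), Decimal("0.5"), Decimal("-1")]
alpha = Decimal(2)

print(asum(x))
print(dot(x, y))
print(scal(alpha, x))
print(axpy(alpha, x, y))
```

Each operation makes a single pass over vectors of length n, which is why Level 1 timings like those in the dataset are usually reported as a function of vector length and working precision.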
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...
Graphic processors are becoming faster and faster. Computational power within graphic processing uni...
This paper describes a C implementation of the proposed new BLAS Standard. Permitting mixtures of i...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
BLAS benchmark results for: CPUs: Intel Core i7-4790K, Intel Core i5-4590, Intel Core i5-4590, Inte...
In the last ten years, GPUs have dominated the market considering the computin...
Timing results for BLAS (Basic Linear Algebra Subprograms) libraries in R on diverse CPUs and GPUs. ...
This work reviews the experience of implementing different versions of the SSPR rank-one update oper...
This article describes the design rationale, a C implementation, and conformance testing of a subset...
Due to non-associativity of floating-point operations and dynamic scheduling on par...
The first, second, third, and fourth columns show the name of each program, the number of GPUs us...
Automated code generation and performance tuning techniques for concurrent architectures such as GP...
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building bloc...