Basic Linear Algebra Subprograms (BLAS) and the Linear Algebra Package (LAPACK) are basic building blocks of many High Performance Computing (HPC) applications and hence dictate the performance of those applications. Such tuned packages attain their performance through tuning of several algorithmic and architectural parameters, such as the number of parallel operations in the Directed Acyclic Graph of the BLAS/LAPACK routines, the sizes of the memories in the memory hierarchy of the underlying platform, the memory bandwidth, and the structure of the compute resources in the underlying platform. In this paper, we closely investigate the impact of the Floating Point Unit (FPU) micro-architecture on performance tuning of BLAS and LAPACK. We present th...
On modern architectures, the performance of 32-bit operations is often at leas...
This paper presents an overview of the LAPACK library, a portable, public-domain library to solve th...
Recently, high-end computing systems have been introduced that employ reconfigurable har...
This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprogram (BLAS-3) li...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
This paper presents an approach to increasing the capability of scientific computing through the use...
We present two designs (I and II) for IEEE 754 double precision floating point matrix multiplication...
Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. ...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
General-purpose microprocessor-based computers usually speed their arithmetic processing performanc...
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
Achieving high computation efficiency, in terms of Cycles per Instruction (CPI), for high-performanc...
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...