In the last ten years, GPUs have dominated the market on the computing/power metric, and numerous research works have provided GPU-accelerated implementations of the Basic Linear Algebra Subprograms (BLAS). Several software libraries have been developed to exploit the performance of systems with accelerators, but the achieved performance may be far from the platform's peak. This paper presents XKBlas, which aims to improve the performance of BLAS-3 kernels on multi-GPU systems. At the low level, we model computation as a set of tasks accessing data on different resources. At the high level, the API design favors non-blocking calls as a uniform concept to overlap latency, even for fine-grain computation. Unit benchmark of BLAS-3 ker...
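The abstract above hinges on non-blocking BLAS-3 calls that overlap latency across several GPUs. As a minimal sketch of that general idea (not XKBlas's actual API), the fragment below enqueues one DGEMM tile per device through cuBLAS streams and only synchronizes when the results are needed; the function name, tile layout, and preallocated device arrays dA/dB/dC are illustrative assumptions.

    /* Sketch only: illustrates non-blocking BLAS-3 submission across GPUs
     * using the standard cuBLAS/CUDA runtime APIs; it is NOT the XKBlas API. */
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    /* One n-by-n DGEMM tile per GPU, enqueued without blocking the host.
     * dA[g], dB[g], dC[g] are assumed to be device buffers on GPU g. */
    void tiled_dgemm_multi_gpu(int ngpus, int n,
                               double *const *dA, double *const *dB,
                               double *const *dC)
    {
        const double alpha = 1.0, beta = 0.0;

        for (int g = 0; g < ngpus; ++g) {
            cudaSetDevice(g);                 /* select device g              */
            cublasHandle_t handle;
            cublasCreate(&handle);            /* per-device BLAS context      */
            cudaStream_t stream;
            cudaStreamCreate(&stream);
            cublasSetStream(handle, stream);  /* make the GEMM call async     */

            /* C_g = alpha * A_g * B_g + beta * C_g; returns immediately */
            cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                        n, n, n, &alpha,
                        dA[g], n, dB[g], n, &beta, dC[g], n);
            /* handle/stream intentionally not destroyed in this sketch;
             * a real library would cache and reuse them */
        }

        /* Host work can proceed here, overlapping with the GPU kernels. */
        for (int g = 0; g < ngpus; ++g) {
            cudaSetDevice(g);
            cudaDeviceSynchronize();          /* block only when results are needed */
        }
    }

Submitting each tile on its own stream keeps the host free to enqueue further work or transfers, which is the same latency-hiding principle the abstract attributes to non-blocking fine-grain calls.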
In this chapter, we present a hybridization methodology for the development of linear algebra softwa...
This dataset contains the execution time of four BLAS Level 1 operations - ASUM, DOT, SCAL and AXPY ...
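For reference, the four Level 1 operations named in this dataset are the standard BLAS vector kernels; written here for real vectors of length n:

\[
\mathrm{ASUM}(x) = \sum_{i=1}^{n} |x_i|, \qquad
\mathrm{DOT}(x, y) = \sum_{i=1}^{n} x_i y_i, \qquad
\mathrm{SCAL}:\; x \leftarrow \alpha x, \qquad
\mathrm{AXPY}:\; y \leftarrow \alpha x + y .
\]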
The trend of using co-processors as accelerators to perform certain tasks is rising in the parallel...
Nowadays, GPUs have dominated the market considering the computing/power metric...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
Scientific applications are some of the most computationally demanding software pieces. Their core i...
This work reviews the experience of implementing different versions of the SSPR rank-one update oper...
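SSPR is the single-precision symmetric packed rank-one update from BLAS Level 2: for a symmetric matrix A stored in packed form, a vector x, and a scalar alpha, it computes

\[
A \leftarrow \alpha\, x x^{T} + A .
\]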
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
The increase in performance of the last generations of graphics processors (GPUs) has made this clas...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear ...
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra l...
Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. ...
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building bloc...
We propose two high-level application programming interfaces (APIs) to use a graphics processing uni...