GPUs now dominate the market in terms of the computing/power metric, and numerous research works have provided Basic Linear Algebra Subprograms (BLAS) implementations accelerated on GPUs. Several software libraries have been developed to exploit the performance of systems with accelerators, but on multiple GPUs the achieved performance may be far from the platform's peak performance. This paper presents two runtime heuristics to improve performance when task-based programs are executed on heterogeneous architectures such as multi-GPU systems. The first is a topology-aware policy that takes into account the heterogeneity of the high-speed links that interconnect GPUs. The second is an optimistic heuristic that favors commun...
The relentless demand for improvements in compute throughput and energy efficiency has driven...
Scientific applications are some of the most computationally demanding software pieces. Their core i...
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra l...
In the last ten years, GPUs have dominated the market considering the computin...
Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. ...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
In this chapter, we present a hybridization methodology for the development of linear algebra softwa...
While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear ...
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
We propose two high-level application programming interfaces (APIs) to use a graphics processing uni...
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on ...
The increase in performance of the last generations of graphics processors (GPUs) has made this clas...
Parallel accelerators are playing an increasingly important role in scientific computing. However, i...
With the massive multithreading execution feature, graphics processing units (GPUs) have b...