We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and multi-GPU systems to support dense matrix computations efficiently. The main idea is that we treat a heterogeneous system as a distributed-memory machine, and use a heterogeneous multi-level block cyclic distribution method to allocate data to the host and multiple GPUs to minimize communication. We design heterogeneous algorithms with hybrid tiles to accommodate the processor heterogeneity, and introduce an auto-tuning method to determine the hybrid tile sizes to attain both high performance and load balancing. We have also implemented a new runtime system and applied it to the Cholesky and QR factorizations. Our approach is designed for achi...
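As a minimal sketch of weighted cyclic data placement, assume each device (the CPU host plus GPUs) should own a share of tile columns proportional to a hypothetical integer relative speed. The code builds one repeating cycle with smooth weighted round-robin and maps tile columns onto it. This is a single-level simplification, not the multi-level distribution or the auto-tuned hybrid tiles described in the abstract; `weighted_cyclic_pattern`, `owner_of_tile_column`, and the speeds `[1, 4, 4]` are illustrative assumptions.

```python
from collections import Counter
from typing import List


def weighted_cyclic_pattern(speeds: List[int]) -> List[int]:
    """Build one cycle of device ids in which device i appears speeds[i] times.

    Smooth weighted round-robin spreads a fast device's slots across the cycle
    instead of clustering them. Single-level sketch only (an assumption, not
    the multi-level scheme from the abstract).
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be positive integers")
    total = sum(speeds)
    credit = [0] * len(speeds)
    pattern = []
    for _ in range(total):
        # Every device accumulates credit proportional to its speed...
        for i, s in enumerate(speeds):
            credit[i] += s
        # ...and the device with the most credit takes the next slot.
        best = max(range(len(speeds)), key=lambda i: credit[i])
        credit[best] -= total
        pattern.append(best)
    return pattern


def owner_of_tile_column(j: int, pattern: List[int]) -> int:
    """Device that owns tile column j when the cycle repeats across the matrix."""
    return pattern[j % len(pattern)]


if __name__ == "__main__":
    # Hypothetical relative speeds: device 0 = CPU host, devices 1 and 2 = GPUs.
    speeds = [1, 4, 4]
    pattern = weighted_cyclic_pattern(speeds)
    print("cycle:", pattern)
    owners = [owner_of_tile_column(j, pattern) for j in range(18)]
    print("tile columns per device:", dict(Counter(owners)))
```

One reason to interleave a fast device's slots rather than give it a contiguous run is that, in a factorization, the active trailing submatrix shrinks as the algorithm proceeds; a cyclic layout keeps every device holding part of the remaining work.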
As users and developers, we are witnessing the opening of a new computing scenario: the introduction...
We study the implementation of dense linear algebra computations, such as matr...
In this paper, we consider the problem of partitioning a square into a set of zones of prescribed ar...
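The abstract above is truncated, so the sketch below does not reproduce its method. To illustrate the underlying problem, it implements a simple column-based partitioning, assuming the prescribed areas are normalized to sum to 1 and that the goal is to minimize the sum of rectangle half-perimeters (a common proxy for communication volume when such a partition drives heterogeneous matrix multiplication). A column of width w holding k rectangles stacks heights a / w that sum to 1, so its total half-perimeter is k * w + 1. The name `column_partition` is hypothetical, and restricting each column to a contiguous run of the sorted areas is a simplifying assumption that keeps the dynamic program short.

```python
from typing import List, Tuple


def column_partition(areas: List[float]) -> Tuple[float, List[List[float]]]:
    """Split the unit square into columns of rectangles with the given areas.

    Returns (sum of half-perimeters, columns), where each column lists the areas
    stacked in it. Assumes areas are positive and sum to 1; columns are
    contiguous runs of the sorted areas (a simplifying assumption).
    """
    if abs(sum(areas) - 1.0) > 1e-9 or any(x <= 0 for x in areas):
        raise ValueError("areas must be positive and sum to 1")
    a = sorted(areas)
    n = len(a)
    prefix = [0.0]
    for x in a:
        prefix.append(prefix[-1] + x)

    # best[j]: minimal cost for the j smallest areas; cut[j]: start of the last column.
    best = [0.0] + [float("inf")] * n
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            width = prefix[j] - prefix[i]
            # Column a[i:j] has width `width` and j - i rectangles: cost k * w + 1.
            cost = best[i] + (j - i) * width + 1.0
            if cost < best[j]:
                best[j], cut[j] = cost, i

    # Walk the cut points backwards to recover the columns.
    columns = []
    j = n
    while j > 0:
        i = cut[j]
        columns.append(a[i:j])
        j = i
    columns.reverse()
    return best[n], columns


if __name__ == "__main__":
    # Four equal areas: the best column layout is the 2 x 2 grid.
    print(column_partition([0.25, 0.25, 0.25, 0.25]))
```

For four equal areas the sketch returns a cost of 4.0 with two columns of two rectangles each, matching the 2 x 2 grid in which each of the four squares has half-perimeter 1.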
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on ...
GPU-based heterogeneous clusters continue to draw attention from vendors and HPC users due to their...
Matrix Factorization (MF) has been widely applied in machine learning and data mining. Due to the la...
Most supercomputers are shipped with both a CPU and a GPU. With the powerful parallel computing capa...
One-sided dense matrix factorizations are important computational kernels in many scientific...
In this paper, we address the issue of implementing matrix multiplication on heterogeneous ...
Hybrid GPU/CPU clusters are becoming very popular in the scientific computing community, as attested...
In this document, we describe two strategies of distribution of computations that can be used to imp...
In this paper, we address the issue of implementing matrix-matrix multiplication on heterogene...
The Graphics Processing Unit (GPU) is present in almost every modern day personal computer. Despite...
A recent trend in modern high-performance computing environments is the introduction of powerful, en...
We consider the problem of data allocation when performing matrix multiplication on a heterogeneous ...