Hybrid computing platforms are now commonplace, featuring a large number of CPU cores and accelerators. This trend makes balancing computations between these heterogeneous resources performance-critical. In this paper, we propose aggregating several CPU cores in order to execute larger parallel tasks and thus improve the load balance between CPUs and accelerators. Additionally, we present our approach to exploiting internal parallelism within tasks. This is done by combining two runtime systems: one to handle the task graph and another to manage the internal parallelism. We demonstrate the relevance of our approach in the context of the dense Cholesky factorization kernel implemented on top of the StarPU...
Heterogeneous platforms are mixes of different processing units in a compute node (e.g., CPUs+GPUs, ...
To face the advent of multicore processors and the ever increasing complexity...
As many-core accelerators keep integrating more processing units, it becomes increasingly more diffi...
This paper is submitted for review to the Parallel Computing special issue for HCW and HeteroPar 16 ...
Computing platforms are now extremely complex, providing an increasing number o...
In this paper, we consider task-based dense linear algebra applications on a single heterogeneous no...
In this paper we introduce a model for the total energy consumption of the Cholesky factorization on...
Load balancing increases the efficient usage of existing resources for parallel and distributed appl...
Enabling HPC applications to perform efficiently when invoking multiple parall...
Until recent years most parallel machines have been made up of closely-coupled microprocessor-based ...
We consider the problem of allocating and scheduling dense linear application on fully heterogeneous...
The task-based approach is a parallelization paradigm in which an algorithm is...
In a general-purpose computing system, several parallel applications run simultaneously on the same ...
Heterogeneous multiprocessing (HMP) is an emerging technology for high-performance and energy-effici...