Nowadays, many clusters integrate GPU accelerators into their architectures, providing a huge number of computational units that are rarely fully exploited. We present in this talk how tile algorithms and DAG schedulers such as PaRSEC or StarPU allow the programmer to integrate GPUs into their algorithms. We will present dense linear algebra algorithms such as the Cholesky and LU factorizations that exploit distributed architectures equipped with GPUs.
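To make the tile-algorithm idea concrete, the following is a minimal sketch of the tiled Cholesky loop nest (lower-triangular case) written sequentially with LAPACKE/CBLAS calls. It is not taken from the talk: the matrix size N, tile size NB, and the tile() helper are illustrative assumptions, and the StarPU/PaRSEC task-insertion calls are deliberately omitted. Each kernel call names the tiles it reads and writes; that access information is exactly what a DAG runtime uses to build task dependencies and to offload the GEMM-heavy trailing-matrix updates to GPUs.

```c
/*
 * Sketch of a tiled Cholesky factorization (lower triangular), sequential
 * version. The four kernels (POTRF, TRSM, SYRK, GEMM) and the tiles they
 * touch define the task DAG that a runtime such as StarPU or PaRSEC would
 * schedule over CPU cores and GPUs; the runtime API itself is not shown.
 */
#include <stdlib.h>
#include <cblas.h>
#include <lapacke.h>

#define N  1024          /* matrix order (illustrative)      */
#define NB 256           /* tile size, assuming N % NB == 0  */
#define T  (N / NB)      /* number of tile rows/columns      */

/* Pointer to tile (i, j) of a column-major N x N array (leading dim N). */
static double *tile(double *A, int i, int j)
{
    return A + (size_t)j * NB * N + (size_t)i * NB;
}

void tiled_cholesky(double *A)
{
    for (int k = 0; k < T; k++) {
        /* POTRF: factor the diagonal tile A[k][k] (read-write). */
        LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'L', NB, tile(A, k, k), N);

        /* TRSM: panel update, A[i][k] := A[i][k] * L_kk^{-T}
         * (reads A[k][k], read-writes A[i][k]). */
        for (int i = k + 1; i < T; i++)
            cblas_dtrsm(CblasColMajor, CblasRight, CblasLower,
                        CblasTrans, CblasNonUnit, NB, NB,
                        1.0, tile(A, k, k), N, tile(A, i, k), N);

        /* SYRK/GEMM: trailing-submatrix update
         * (reads the panel tiles, read-writes A[i][j]). */
        for (int i = k + 1; i < T; i++) {
            cblas_dsyrk(CblasColMajor, CblasLower, CblasNoTrans, NB, NB,
                        -1.0, tile(A, i, k), N, 1.0, tile(A, i, i), N);
            for (int j = k + 1; j < i; j++)
                cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans,
                            NB, NB, NB,
                            -1.0, tile(A, i, k), N, tile(A, j, k), N,
                            1.0, tile(A, i, j), N);
        }
    }
}
```

In a task-based version, each of these calls becomes one task submitted to the runtime with its tiles tagged as read or read-write, and the runtime infers the DAG edges and the CPU/GPU placement from that information alone.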
We consider the problem of allocating and scheduling dense linear algebra applications on fully heterogeneous...
Although the hardware has dramatically changed in the last few years, nodes of...
One-sided dense matrix factorizations are important computational kernels in many scientific...
We address some key issues in designing dense linear algebra (DLA) algorithms that are common for bo...
We propose two high-level application programming interfaces (APIs) to use a graphics processing uni...
If multicore is a disruptive technology, try to imagine hybrid multicore systems enhanced with accel...
In a previous PPoPP paper we showed how the FLAME methodology, combined with the SuperMatrix runtim...
Enabling large scale use of GPU-based architectures for high performance computational science depen...
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
We present DPLASMA, a new project related to PLASMA, which operates in the distributed memory regime...
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on ...
Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. ...