We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access-time (NUMA) shared-memory machines. We show how to optimize thread and data placement in order to achieve performance gains of up to 50% compared to state-of-the-art libraries such as PLASMA or MKL.
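The abstract above names the tiled Cholesky factorization without spelling it out. As a rough illustration only, here is a minimal sequential sketch of the standard right-looking tile algorithm, which is the POTRF/TRSM/SYRK/GEMM kernel sequence used by tile libraries such as PLASMA. It assumes NumPy, a symmetric positive-definite input, and a matrix order that is a multiple of the tile size `nb`. The function name `tiled_cholesky` is hypothetical. This is not the authors' code, and it does no thread or NUMA placement, which is the part the paper actually optimizes.

```python
import numpy as np


def tiled_cholesky(a: np.ndarray, nb: int) -> np.ndarray:
    """Hypothetical sketch: right-looking tiled Cholesky, returns lower L with a = L @ L.T.

    Assumes a is symmetric positive definite and its order is a multiple of nb.
    """
    n = a.shape[0]
    if n % nb != 0:
        raise ValueError("matrix order must be a multiple of the tile size")
    t = n // nb  # number of tiles per dimension
    # Work on a copy so the caller's matrix is left untouched.
    w = np.array(a, dtype=float, copy=True)

    def tile(i: int, j: int) -> np.ndarray:
        # A view (not a copy) of tile (i, j); writes go straight into w.
        return w[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb]

    for k in range(t):
        # POTRF: factor the diagonal tile in place (NumPy reads only its lower triangle).
        tile(k, k)[:] = np.linalg.cholesky(tile(k, k))
        lkk = tile(k, k)
        for i in range(k + 1, t):
            # TRSM: A[i,k] <- A[i,k] @ inv(L[k,k]).T, computed as solve(L[k,k], A[i,k].T).T
            tile(i, k)[:] = np.linalg.solve(lkk, tile(i, k).T).T
        for i in range(k + 1, t):
            # SYRK: update the diagonal tile of the trailing submatrix.
            tile(i, i)[:] -= tile(i, k) @ tile(i, k).T
            for j in range(k + 1, i):
                # GEMM: update an off-diagonal tile below the diagonal.
                tile(i, j)[:] -= tile(i, k) @ tile(j, k).T
    # Only the lower triangle holds L; the upper tiles were never updated, so zero them.
    return np.tril(w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.standard_normal((8, 8))
    a = m @ m.T + 8 * np.eye(8)  # symmetric positive definite test matrix
    l = tiled_cholesky(a, nb=2)
    print(np.allclose(l @ l.T, a))
```

Each kernel call in the loop body is a natural unit of work for a task-based runtime. The placement question the abstract raises is, roughly, which thread runs each such task and on which NUMA node's memory each tile resides.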
Dynamic task-parallel programming models are popular on shared-memory systems,...
This note calls into question a claim one sometimes hears about the time it takes to compute...
This paper is submitted for review to the Parallel Computing special issue for HCW and HeteroPar 16 ...
We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access-...
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear sy...
Nowadays, NUMA architectures are common in compute-intensive systems. Achievin...
Due to the advent of multicore architectures and massive parallelism, the tile...
Several fine-grained parallel algorithms were developed and compared to compute the Cholesky factori...
This paper introduces two novel algorithms for thread migrations, named CIMAR (Core-aware Interchang...
We employ the dynamic runtime system OmpSs to decrease the overhead of data motion in the now ubiqui...
Linear systems, and their solution, are an important tool in many areas of science. The solving o...
We consider the problem of allocating and scheduling dense linear application ...
A Choleski method is described and used to solve linear systems of equations that arise in large sca...
The sparse matrix-vector product is a widespread operation amongst the scientific computing communit...
While the growing number of cores per chip allows researchers to solve larger scientific and enginee...