© The Author 2015. This paper is published with open access at SuperFri.org.
We employ the dynamic runtime system OmpSs to decrease the overhead of data motion in the now-ubiquitous, highly concurrent non-uniform memory access (NUMA) environment of multicore processors. Two dense numerical linear algebra algorithms, Cholesky factorization and symmetric matrix inversion, serve as representative benchmarks. Work stealing takes place within a novel NUMA-aware scheduling policy designed to reduce data movement between NUMA nodes. The overall approach achieves separation of concerns: it hides the complexity of the hardware from end users, so that high productivity can be achieved. Performance results on a large NUMA system outperform the ...
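The abstract describes work stealing performed inside a NUMA-aware scheduling policy. Below is a minimal, single-threaded sketch of one such policy. It assumes that each task carries a known "home" NUMA node where its data lives. Under this policy, a worker first drains its own queue, then steals from sibling workers on the same node, and only crosses to a remote node when its own node has no work left. The names `NumaAwareScheduler`, `submit`, `next_task` and `home_node` are hypothetical illustrations. They are not the OmpSs runtime's API, and the paper's actual policy may differ in its details.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    home_node: int  # assumption: NUMA node that owns the task's input data


@dataclass
class Worker:
    wid: int
    node: int
    queue: deque = field(default_factory=deque)


class NumaAwareScheduler:
    """Hypothetical NUMA-aware work-stealing policy (illustrative, not OmpSs)."""

    def __init__(self, cores_per_node, num_nodes):
        # Workers are numbered node by node, so workers 0..k-1 sit on node 0, etc.
        self.workers = [
            Worker(wid=n * cores_per_node + c, node=n)
            for n in range(num_nodes)
            for c in range(cores_per_node)
        ]
        self._rr = {n: 0 for n in range(num_nodes)}  # round-robin cursor per node

    def submit(self, task):
        # Place the task on a worker attached to the node that owns its data.
        local = [w for w in self.workers if w.node == task.home_node]
        target = local[self._rr[task.home_node] % len(local)]
        self._rr[task.home_node] += 1
        target.queue.append(task)

    def next_task(self, worker):
        # 1) Own queue first, newest task (LIFO end).
        if worker.queue:
            return worker.queue.pop(), "local"
        # 2) Steal the oldest task from a sibling on the same NUMA node.
        for victim in self.workers:
            if victim is not worker and victim.node == worker.node and victim.queue:
                return victim.queue.popleft(), "same-node steal"
        # 3) Only then steal across the interconnect from a remote node.
        for victim in self.workers:
            if victim.node != worker.node and victim.queue:
                return victim.queue.popleft(), "remote steal"
        return None, "idle"


if __name__ == "__main__":
    sched = NumaAwareScheduler(cores_per_node=2, num_nodes=2)
    sched.submit(Task("tile A", home_node=0))  # lands on worker 0 (node 0)
    sched.submit(Task("tile B", home_node=1))  # lands on worker 2 (node 1)
    idle = sched.workers[1]                    # worker 1 sits on node 0
    for _ in range(3):
        task, how = sched.next_task(idle)
        print(task.name if task else None, how)
    # tile A same-node steal
    # tile B remote steal
    # None idle
```

In this sketch, the owner takes its newest task while thieves take the oldest one. This split is the usual work-stealing convention. A real runtime would also need thread pinning and concurrent (locked or lock-free) deques, which the sketch deliberately omits.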
Dynamic task-parallel programming models are popular on shared-memory systems,...
Matrix computations lie at the heart of most scientific computational tasks. The solution of linear ...
8th Workshop on Applications for Multi-Core Architectures. In this paper, we ana...
While the growing number of cores per chip allows researchers to solve larger scientific and enginee...
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processo...
We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access-...
Over the past few years, parallel sparse direct solvers made significant progr...
The sparse matrix-vector product is a widespread operation amongst the scientific computing communit...
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using Ope...
We discuss the high-performance parallel implementation and execution of dense linear algebra matrix...
Nowadays, the evolution of High Performance Computing follows the needs of numerical simulations. Thes...
In this paper, we describe a framework for loop transformations and code generation for NUMA (non-unifo...
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused o...
We present the use of a hybrid static/dynamic scheduling strategy of the task dependency graph for d...