Nowadays, NUMA architectures are common in compute-intensive systems. Achieving high performance for multi-threaded applications requires both a careful placement of threads on computing units and a careful allocation of data in memory. Finding such a placement is a hard problem, because performance depends on complex interactions across several layers of the memory hierarchy. In this paper we propose a black-box approach to decide whether an application's execution time can be affected by the placement of its threads and data and, if so, to choose the best placement strategy. We show that it is possible to reach near-optimal placement policy selection. Furthermore, solutions work across several recent ...
This paper introduces a learning-based framework for dynamic placement of threads of parallel applic...
Nowadays, on hierarchical shared memory multiprocessors with Non-Uniform Memory Access (NUMA), the n...
It is well known that the placement of threads and memory plays a crucial role for performance on NU...
Multicore multiprocessors use Non Uniform Memory Architecture (NUMA) to improve their scalability. ...
Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability....
Exploiting the full computational power of current hierarchical multiprocessor...
This paper introduces two novel algorithms for thread migrations, named CIMAR (Core-aware Interchang...
Our work addresses the problem of placement of threads, or virtual cores, onto physical cores in a m...
The problem of placement of threads, or virtual cores, on physical cores in a multicore system has b...
Funding: This work has been supported by the European Union grant EU H2020-ICT-2014-1 project RePhra...
Dynamic task-parallel programming models are popular on shared-memory systems,...
While virtualization only introduces a small overhead on machines with few cor...
The recent addition of data dependencies to the OpenMP 4.0 standard provides t...
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear sy...