The complexity of shared memory systems is becoming more relevant as the number of memory domains increases, with access latencies and bandwidths that depend on the proximity between the cores and the devices holding the data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist of migrating threads, memory pages, or both, and are typically applied by the system software. We propose techniques at the runtime system level to reduce NUMA effects on parallel applications. We leverage runtime system metadata expressed as a task dependency graph. Our approach, based on graph partitioning methods, is able to provide parallel performance improvements of 1.12X on average with respect to t...
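The abstract only names the technique; as a rough illustration of the general idea, the sketch below (a minimal, hypothetical Python example, not the runtime implementation described in the work) assigns the tasks of a dependency graph to NUMA domains with a simple greedy locality heuristic, so that each task tends to be co-located with the tasks whose output data it consumes. Task names and the two-domain setup are assumptions made for the example.

from collections import defaultdict

def topo_order(deps):
    # Topological order of the dependency DAG (Kahn's algorithm); deps maps
    # each task to the set of tasks it depends on.
    pending = {t: set(p) for t, p in deps.items()}
    ready = [t for t, p in pending.items() if not p]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for u, preds in pending.items():
            if t in preds:
                preds.discard(t)
                if not preds and u not in order and u not in ready:
                    ready.append(u)
    return order

def partition_task_graph(deps, num_domains):
    # Greedily place each task in the NUMA domain where most of its
    # predecessors already live, breaking ties towards the least loaded
    # domain; returns a {task: domain} mapping.
    load = [0] * num_domains
    placement = {}
    for task in topo_order(deps):
        votes = defaultdict(int)
        for pred in deps[task]:
            votes[placement[pred]] += 1
        if votes:
            domain = max(votes, key=lambda d: (votes[d], -load[d]))
        else:
            domain = min(range(num_domains), key=lambda d: load[d])
        placement[task] = domain
        load[domain] += 1
    return placement

# Toy dependency graph (hypothetical task names) mapped onto 2 NUMA domains.
deps = {
    "a0": set(), "a1": set(),
    "b0": {"a0"}, "b1": {"a0", "a1"}, "b2": {"a1"},
    "c0": {"b0", "b1"}, "c1": {"b1", "b2"},
}
print(partition_task_graph(deps, num_domains=2))

An actual runtime would feed such a placement to a NUMA-aware scheduler, for instance by binding worker threads and first-touch allocations to the chosen domain; the work itself applies proper graph partitioning methods rather than this greedy heuristic.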
Nowadays, the evolution of High Performance Computing follows the needs of numerical simulations. Thes...
The task-based approach is a parallelization paradigm in which an algorithm is...
In modern parallel architectures, memory accesses represent a common bottlenec...
Current high performance computing architectures are composed of large shared memory NUMA nodes, amo...
Shared memory systems are becoming increasingly complex as they typically integrate several storage ...
Dynamic task-parallel programming models are popular on shared-memory systems,...
Processors with multiple sockets or chiplets are becoming increasingly common. These kinds of processo...
The importance of high-performance graph processing to solve big data problems targeting high-impact...
The recent addition of data dependencies to the OpenMP 4.0 standard provides t...
Over the past few years, parallel sparse direct solvers have made significant ...
Modern architectures have multiple processors, each of which contains multiple cores, connected to d...
Exploiting the full computational power of current hierarchical multiprocessor...
Embedded manycore architectures are often organized as fabrics of tightly-coupled shared memory clus...