Current high-performance computing architectures are composed of large shared-memory NUMA nodes, among other components. These nodes are becoming increasingly complex: they contain several NUMA domains, and memory access latency depends on which core issues the access. In this work, we propose techniques to mitigate the negative impact of NUMA effects on the performance of parallel applications. We leverage runtime system metadata expressed as a task dependency graph, in which nodes are sequential pieces of code and edges are control or data dependencies between them, and apply graph partitioning techniques to reduce data transfers. With our proposals, we are able to improve the execution time of OpenMP parall...
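To make the idea of partitioning a task dependency graph across NUMA domains concrete, here is a minimal sketch. It is not the method of the work summarized above, which does not name its partitioning algorithm here; it is a simple greedy heuristic chosen for illustration. The function names `partition_tasks` and `cut_weight`, the per-edge byte weights, and the per-domain `capacity` limit are all assumptions introduced for this example.

```python
from collections import defaultdict


def partition_tasks(edges, num_domains, capacity):
    """Greedily map tasks to NUMA domains (illustrative heuristic only).

    edges: dict mapping (producer, consumer) -> bytes shared by the two tasks.
    capacity: maximum number of tasks allowed per domain (hypothetical limit).
    Returns a dict mapping each task that appears in an edge to a domain index.
    """
    # Build an undirected weighted adjacency map from the dependency edges.
    neighbors = defaultdict(dict)
    for (u, v), weight in edges.items():
        neighbors[u][v] = neighbors[u].get(v, 0) + weight
        neighbors[v][u] = neighbors[v].get(u, 0) + weight

    # Place the tasks with the most total traffic first (ties broken by name).
    order = sorted(neighbors, key=lambda t: (-sum(neighbors[t].values()), t))

    assignment = {}
    load = [0] * num_domains
    for task in order:
        # Bytes this task shares with tasks already placed in each domain.
        affinity = [0] * num_domains
        for other, weight in neighbors[task].items():
            if other in assignment:
                affinity[assignment[other]] += weight

        candidates = [d for d in range(num_domains) if load[d] < capacity]
        if not candidates:
            raise ValueError("capacity too small for the number of tasks")

        # Prefer the highest affinity, then the least loaded domain,
        # then the lowest domain index.
        best = max(candidates, key=lambda d: (affinity[d], -load[d], -d))
        assignment[task] = best
        load[best] += 1
    return assignment


def cut_weight(edges, assignment):
    """Total bytes on edges whose endpoints sit in different domains."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])


# Two heavily connected chains joined by one light dependency.
example_edges = {
    ("a", "b"): 100, ("b", "c"): 100,
    ("c", "d"): 1,
    ("d", "e"): 100, ("e", "f"): 100,
}
example_assignment = partition_tasks(example_edges, num_domains=2, capacity=3)
print(example_assignment, cut_weight(example_edges, example_assignment))
```

In this toy graph the heuristic keeps each chain inside one domain, so only the light edge between `c` and `d` crosses domains. A production runtime would more likely rely on a dedicated graph partitioner and on real measurements of data sizes; the greedy pass here only illustrates the objective of minimizing cross-domain traffic under a balance constraint.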
Graph-structured data can be found in nearly every aspect of today's world, be it road networks, soc...
Nowadays the evolution of High Performance Computing follows the needs of numerical simulations. Thes...
We present a joint scheduling and memory allocation algorithm for efficient ex...
Current high performance computing architectures are composed of large shared memory NUMA nodes, amo...
The complexity of shared memory systems is becoming more relevant as the number of memory domains in...
Shared memory systems are becoming increasingly complex as they typically integrate several storage ...
The importance of high-performance graph processing to solve big data problems targeting high-impact...
The recent addition of data dependencies to the OpenMP 4.0 standard provides t...
Dynamic task-parallel programming models are popular on shared-memory systems,...
Exploiting the full computational power of current hierarchical multiprocessor...
The task-based approach is a parallelization paradigm in which an algorithm is...
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processo...
Graph partitioning and repartitioning have been studied for several decades. Yet, they are receiving...
Pattern matching on large graphs is the foundation for a variety of application domains. The continu...