The complexity of shared memory systems is becoming more relevant as the number of memory domains increases, with access latencies and bandwidths that depend on the proximity between the cores and the devices holding the data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist of migrating threads, memory pages, or both, and are typically applied by the system software. We propose techniques at the runtime system level to reduce NUMA effects on parallel applications. We leverage runtime system metadata expressed as a task dependency graph. Our approach, based on graph partitioning methods, provides parallel performance improvements of 1.12X on average with respect to th...
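As an illustration of the kind of graph-partitioning step this abstract describes, the sketch below greedily assigns the tasks of a small dependency graph to NUMA domains so that tasks exchanging the most data tend to share a domain. The task names, edge weights, domain count, and the greedy heuristic are all hypothetical stand-ins, not the paper's actual runtime or partitioner.

```python
# Illustrative sketch only: partition a toy task dependency graph across
# NUMA domains by neighbour affinity, breaking ties towards the least-loaded
# domain. All names and weights below are invented for the example.
from collections import defaultdict

# (producer, consumer, bytes moved through the dependency)
edges = [
    ("init_A", "factor_A", 8_000_000),
    ("init_B", "factor_B", 8_000_000),
    ("factor_A", "solve", 4_000_000),
    ("factor_B", "solve", 4_000_000),
]
num_domains = 2  # e.g. two NUMA nodes

# Build an undirected weighted adjacency view of the task graph.
adj = defaultdict(dict)
for u, v, w in edges:
    adj[u][v] = adj[u].get(v, 0) + w
    adj[v][u] = adj[v].get(u, 0) + w

part = {}                  # task -> NUMA domain
load = [0] * num_domains   # rough per-domain task count

for t in sorted(adj):      # deterministic visit order
    # Score each domain by the weight of already-placed neighbours it holds.
    affinity = [0] * num_domains
    for n, w in adj[t].items():
        if n in part:
            affinity[part[n]] += w
    best = max(range(num_domains), key=lambda d: (affinity[d], -load[d]))
    part[t] = best
    load[best] += 1

print(part)  # producers end up co-located with their consumers
```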
Scientific workflows are frequently modeled as Directed Acyclic Graphs (DAGs) ...
Graph-structured data can be found in nearly every aspect of today's world, be it road networks, soc...
Over the past few years, parallel sparse direct solvers made significant progr...
Shared memory systems are becoming increasingly complex as they typically integrate several storage ...
Current high performance computing architectures are composed of large shared memory NUMA nodes, amo...
The importance of high-performance graph processing to solve big data problems targeting high-impact...
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processo...
Pattern matching on large graphs is the foundation for a variety of application domains. The continu...
Dynamic task-parallel programming models are popular on shared-memory systems,...
Main-memory column-stores are called to efficiently use modern non-uniform memory access (NUMA) arc...
Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high perfor...
Due to their excellent price-performance ratio, clusters built from commodity nodes have become broa...
Embedded manycore architectures are often organized as fabrics of tightly-coupled shared memory clus...