Shared memory systems are becoming increasingly complex as they typically integrate several storage devices. This results in access latencies and bandwidths that vary with the proximity between the cores issuing memory accesses and the devices holding the requested data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist of migrating threads, memory pages, or both, and are generally applied by the system software. We propose techniques at the runtime system level to further mitigate the impact of NUMA effects on parallel applications' performance. We leverage runtime system metadata expressed as a task dependency graph, whose nodes are pieces of serial code ...
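As a purely illustrative, hypothetical sketch of how task-dependency metadata can be combined with explicit NUMA placement (it is not the runtime proposed in the abstract above), the C code below allocates a buffer on a chosen NUMA node with libnuma and annotates producer/consumer tasks with OpenMP depend clauses, giving a NUMA-aware scheduler the information needed to run each task near its data. The use of libnuma, node 0, and OpenMP array-section dependences are assumptions made only for this example.

/* Hypothetical sketch: combining task-dependence metadata with NUMA placement.
 * Assumes Linux with libnuma (link with -lnuma) and an OpenMP 4.5+ compiler.
 * This is NOT the runtime described in the abstract; it only illustrates how
 * dependence edges name the data each task touches. */
#include <numa.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return EXIT_FAILURE;
    }

    /* Place the block on NUMA node 0 so tasks touching it can be
     * scheduled near that node (assumed policy, for illustration). */
    double *a = numa_alloc_onnode(N * sizeof(double), 0);
    if (!a) return EXIT_FAILURE;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer task: writes the block whose home node is known. */
        #pragma omp task depend(out: a[0:N])
        for (size_t i = 0; i < N; i++) a[i] = (double)i;

        /* Consumer task: the dependence edge records which data it reads,
         * which a NUMA-aware scheduler could exploit for placement. */
        #pragma omp task depend(in: a[0:N])
        {
            double sum = 0.0;
            for (size_t i = 0; i < N; i++) sum += a[i];
            printf("sum = %f\n", sum);
        }
    }

    numa_free(a, N * sizeof(double));
    return EXIT_SUCCESS;
}

Building such a sketch would need an OpenMP-capable compiler and libnuma (e.g., gcc -fopenmp ... -lnuma); the point is only that dependence annotations expose which data each task accesses, which is the kind of metadata a NUMA-aware runtime can exploit.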
Non-uniform memory access (NUMA) architectures are modern shared-memory, multi-core machines offerin...
The parallelism in shared-memory systems has increased significantly with the ...
The need to achieve higher performance through greater degrees of parallelism necessitates distribut...
The complexity of shared memory systems is becoming more relevant as the number of memory domains in...
Current high performance computing architectures are composed of large shared memory NUMA nodes, amo...
The importance of high-performance graph processing to solve big data problems targeting high-impact...
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processo...
Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high perfor...
Embedded manycore architectures are often organized as fabrics of tightly-coupled shared memory clus...
Memory access latency is hence non-uniform, because it depends on where the request ori...
Dynamic task-parallel programming models are popular on shared-memory systems,...
Due to their excellent price-performance ratio, clusters built from commodity nodes have become broa...
Recent technological trends have aided the design and development of large-scale heterogeneous syste...
Nonuniform memory access time (referred to as NUMA) is an important feature in the design of large s...