Current multi-socket systems have complex memory hierarchies with significant Non-Uniform Memory Access (NUMA) effects: memory performance depends on the location of the data and the thread. This complexity means that thread and data mappings have a significant impact on performance. However, it is hard to find efficient data mappings and thread configurations due to the complex interactions between applications and systems. In this paper, we explore the combined search space of thread mappings, data mappings, number of NUMA nodes, and degree-of-parallelism, per application phase, and across multiple systems. We show that there are significant performance benefits from optimizing this wide range of parameters together. However, such an optim...
Current and future architectures rely on thread-level parallelism to sustain p...
With the rise of multi-socket multi-core CPUs a lot of effort is being put into how to best exploit...
Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high perfor...
Multicore multiprocessors use Non Uniform Memory Architecture (NUMA) to improve their scalability. ...
The parallelism in shared-memory systems has increased significantly with the ...
Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability....
Non Uniform Memory Access (NUMA) architectures are nowadays common for running...
Nowadays, NUMA architectures are common in compute-intensive systems. Achievin...
A common approach to improve memory access in NUMA machines exploits operating system (OS) page prot...
The problem of placement of threads, or virtual cores, on physical cores in a multicore system has b...
This paper introduces two novel algorithms for thread migrations, named CIMAR (Core-aware Interchang...
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processo...