tures are ubiquitous in HPC systems. NUMA, along with other factors including socket layout, data placement, and memory contention, significantly increases the search space for finding an optimal mapping of applications to NUMA systems. This search space may be intractable for online optimization and challenging for efficient offline search. This paper presents DyNUMA, a framework for dynamic optimization of programs on NUMA architectures. DyNUMA uses simple, memory-centric performance and energy models with non-linear terms to capture the complex and interacting effects of system layout, program concurrency, data placement, and memory controller contention. DyNUMA leverages an artificial neural network (ANN) with input, output, and intermediate ...
Modern multicore systems are based on a Non-Uniform Memory Access (NUMA) design. In a NUMA system, c...
The spread of deep learning on embedded devices has prompted the development of numerous methods to ...
In this paper, we analyse performance and energy consumption of five OpenMP ru...
Current multi-socket systems have complex memory hierarchies with significant Non-Uniform Memory Acc...
Multicore multiprocessors use Non Uniform Memory Architecture (NUMA) to improve their scalability. ...
Non Uniform Memory Access (NUMA) architectures are nowadays common for running...
HPC systems expose configuration options that help users optimize their applications' execution. Que...
Hardware transactional memory (HTM) is supported by widely-used commodity processors. While the effe...
Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability....
Both NUMA thread/data placement and hardware prefetcher configuration have significant impacts on HP...
Hardware transactional memory (HTM) is widely supported by commodity processors. While the effective...
Both NUMA thread/data placement and hardware prefetcher configuration have significant impacts on HP...
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processo...
The problem of placement of threads, or virtual cores, on physical cores in a multicore system has b...
Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high perfor...