Parallel scientific programs executing in a NUMA environment face the problem of placing their data in the system's physically separate memories so as to minimise the latency of the accesses that the program's threads make to this data. Motivated by the poor performance that results when this data is badly placed, this thesis describes a technique by which the partition of a parallel program's workload created by a load-balancing routine can be used to identify the affinities of the program's threads for regions of its address space. (EThOS: Electronic Theses Online Service, United Kingdom.)
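To make the idea in the abstract above concrete, here is a minimal sketch. It rests on assumptions that do not come from the thesis. The program's data is a single 1-D array. A load balancer hands each thread a contiguous half-open index range. A page's "affinity" is the thread that owns the most elements on it. The function names `page_affinities` and `page_homes` and the majority-vote rule are hypothetical illustrations, not the thesis's actual method.

```python
from collections import Counter


def page_affinities(partition, elem_size, page_size):
    """Assign each page of a 1-D array to the thread that owns most of it.

    Assumption: partition[t] = (start, end) is the half-open index range
    that a (hypothetical) load balancer assigned to thread t.
    """
    counts = {}  # page index -> Counter mapping thread id -> elements owned
    for thread, (start, end) in enumerate(partition):
        for i in range(start, end):
            page = (i * elem_size) // page_size
            counts.setdefault(page, Counter())[thread] += 1
    # most_common(1) yields [(thread, count)]; on a tie, the thread counted
    # first (the lower-numbered one here) wins.
    return {page: c.most_common(1)[0][0] for page, c in sorted(counts.items())}


def page_homes(affinity, thread_node):
    """Map each page to the NUMA node of the thread it has affinity for."""
    return {page: thread_node[t] for page, t in affinity.items()}


if __name__ == "__main__":
    # 1024 doubles (8 bytes) on 4096-byte pages -> 2 pages of 512 elements.
    part = [(0, 300), (300, 600), (600, 900), (900, 1024)]
    aff = page_affinities(part, elem_size=8, page_size=4096)
    print(aff)                                        # {0: 0, 1: 2}
    print(page_homes(aff, {0: 0, 1: 0, 2: 1, 3: 1}))  # {0: 0, 1: 1}
```

The majority vote is only the simplest way to turn a workload partition into a per-page affinity. Once each page has a home node, the pages could be migrated to it, for example with Linux's `move_pages(2)`. That step is outside this sketch.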
Dynamic task-parallel programming models are popular on shared-memory systems,...
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processo...
The multicore era has initiated a move to ubiquitous parallelization of software. In the process, co...
The problem of placement of threads, or virtual cores, on physical cores in a multicore system has b...
This paper introduces a learning-based framework for dynamic placement of threads of parallel applic...
Multicore multiprocessors use Non Uniform Memory Architecture (NUMA) to improve their scalability. ...
Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability....
Nowadays, NUMA architectures are common in compute-intensive systems. Achievin...
This paper introduces a resource allocation framework specifically tailored for addressing the probl...
Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for h...
Nonuniform memory access time (referred to as NUMA) is an important feature in the design of large s...
Nowadays, on Multi-core Multiprocessors with Hierarchical Memory (Non-Uniform ...
An important aspect of workload characterization is understanding memory system performance...
Given the wide-scale adoption of multi-cores in mainstream computing, parallel programs rarely exec...