In a non-uniform memory access (NUMA) machine, the placement of software threads on hardware cores can have a significant effect on the performance of concurrent applications. Detecting the best possible placement for each application is a necessity for thread scheduling. Yet, due to the difficulty of this problem, operating-system schedulers do not try to understand the needs of applications and instead rely on (non-portable) scheduling heuristics. In this paper, we introduce thread-placement learning (TPLE), a technique for understanding the placement requirements of applications. TPLE uses machine learning and performance counters to choose between different placement policies. To feed the machine learning model, TPLE requires a s...
The emergence of multicore and manycore processors is set to change the parallel computing world. Ap...
Large, high frequency single-core chip designs are increasingly being replaced with larger chip mult...
This paper describes a method to improve the cache locality of sequential programs by scheduling fin...
This paper introduces a learning-based framework for dynamic placement of threads of parallel applic...
Nowadays, NUMA architectures are common in compute-intensive systems. Achievin...
This paper introduces a reinforcement-learning based resource allocation framework for dynamic place...
Thread mapping has been extensively used as a technique to efficiently exploit...
Thread mapping has been extensively used as a technique to efficiently exploit memory hiera...
Funding: This work has been partially supported by the European Union grant EU H2020-ICT-2014-1 proj...
This paper introduces a resource allocation framework specifically tailored for addressing the probl...
There is a clear trend in current processor design towards the combination of several threa...
In a modern chip-multiprocessor system, memory is a shared resource among multiple concurrently exec...
capable of executing instructions from multiple threads in the same cycle. SMT in fact was introduce...
The introduction of multicore/multithreaded processors, comprised of a large number of hardware cont...