Multi-socket multi-core architectures with shared caches in each socket have become mainstream when a single multi-core chip cannot provide enough computing capacity for high-performance computing. However, traditional task-stealing schedulers tend to pollute the shared cache and incur severe cache misses due to their randomness in stealing. To address the problem, this paper proposes a Cache Aware Task-Stealing (CATS) scheduler, which uses the shared cache efficiently with an online profiling method and schedules tasks with shared data to the same socket. CATS adopts an online DAG partitioner based on the profiling information to ensure tasks with shared data can efficiently utilize the shared cache. One outstanding novelty of CATS is t...
Multi-core architectures are well suited to fulfill the increasing performance ...
Task-based dataflow programming models and runtimes emerge as promising candidates for programming ...
Effective cache utilization is critical to performance in chip-multiprocessor systems (CMP). Modern ...
Since different companies are introducing new capabilities and features on their products, the dema...
Computational task DAGs are executed on parallel computers by a task scheduling algorithm. Intellige...
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good per...
In a multicore system, effective management of shared last level cache (LLC), such as hardware/softw...
This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlle...
The fork-join paradigm of concurrent expression has gained popularity in conjunction with work-steal...
With the recent advent of many-core architectures such as chip multiprocessors...
Growing processing demand on multi-tasking real-time systems can be met by employing scalable multi-...
The On Chip NUMA Architectures (OCNA) introduce a new challenge namely memory-latency to th...
Manycore processors, with tens to hundreds of tiny cores but no hardware-based cache coherence, can ...
Directed acyclic graph (DAG)-aware task scheduling algorithms have been studied extensively in recen...
As the trend of more cores sharing common resources on a single die and more systems crammed into en...