Dynamic task graph schedulers automatically balance work across processor cores by scheduling tasks among available threads while preserving dependences. In this paper, we design NabbitC, a provably efficient dynamic task graph scheduler that accounts for data locality on NUMA systems. NabbitC allows users to assign a color to each task representing the location (e.g., a processor core) that has the most efficient access to data needed during that node’s execution. NabbitC then automatically adjusts the scheduling so as to preferentially execute each node at the location that matches its color—leading to better locality because the node is likely to make local rather than remote accesses. At the same time, NabbitC tries to optimize load bal...
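The coloring idea in the abstract above can be illustrated with a minimal Python sketch. This is not NabbitC's actual algorithm or API: the function name `color_aware_schedule`, the synchronous-round simulation, and the "steal from the fullest queue" rule are all assumptions made for illustration. Each task carries a color (here, a worker id); a worker first takes a ready task of its own color and only takes a differently colored task when it would otherwise sit idle.

```python
from collections import deque


def color_aware_schedule(tasks, deps, colors, num_workers):
    """Simulate color-preferring scheduling in synchronous rounds.

    tasks:  list of task names
    deps:   dict mapping a task to the set of tasks it depends on
    colors: dict mapping a task to its preferred worker id (its "color")
    Returns (order, local), where order is a list of (task, worker) pairs
    and local counts tasks that ran on the worker matching their color.
    Hypothetical sketch; not the scheduler described in the paper.
    """
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    succs = {t: [] for t in tasks}
    for t, preds in remaining.items():
        for p in preds:
            succs[p].append(t)

    # One ready queue per color; tasks with no predecessors start ready.
    ready = [deque() for _ in range(num_workers)]
    for t in tasks:
        if not remaining[t]:
            ready[colors[t]].append(t)

    order, local, done = [], 0, 0
    while done < len(tasks):
        picked, idle = [], []
        # Pass 1: every worker prefers a ready task of its own color.
        for w in range(num_workers):
            if ready[w]:
                picked.append((w, ready[w].popleft()))
            else:
                idle.append(w)
        # Pass 2: idle workers take work from the fullest queue (assumed
        # policy), trading locality for load balance.
        for w in idle:
            victim = max(range(num_workers), key=lambda v: len(ready[v]))
            if ready[victim]:
                picked.append((w, ready[victim].pop()))
        if not picked:
            raise ValueError("no ready task: dependence graph has a cycle")
        # Finish the round and release successors whose dependences are met.
        for w, t in picked:
            order.append((t, w))
            local += colors[t] == w
            done += 1
            for s in succs[t]:
                remaining[s].discard(t)
                if not remaining[s]:
                    ready[colors[s]].append(s)
    return order, local


if __name__ == "__main__":
    deps = {"c": {"a", "b"}, "d": {"c"}}
    colors = {"a": 0, "b": 1, "c": 0, "d": 1}
    print(color_aware_schedule(["a", "b", "c", "d"], deps, colors, 2))
```

In this toy model, a graph whose colors are spread evenly runs every task locally, while a graph whose tasks all share one color forces the other worker to take remote work rather than idle. That is the locality versus load-balance trade-off the abstract refers to.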
Task graphs are used for scheduling tasks on parallel processors when the tasks have dependencies. I...
It is now widely recognized that increased levels of parallelism are a necessary condition for improv...
As we increase the number of cores on a processor die, the on-chip cache hierarchies that support th...
Nowadays shared memory HPC platforms expose a large number of cores organized ...
The task parallel programming model allows programmers to express concurrency at a high level of abs...
Graph processing is increasingly bottlenecked by main memory accesses. On-chip caches are of little ...
Future multi- and many-core processors are likely to have tens of cores arranged in a tiled archite...
In this paper we concentrate on a crucial parameter for efficiency in Big Data and HPC applications:...
We investigate efficient execution of computations, modeled as Directed Acycli...
In systems with a complex many-core cache hierarchy, exploiting data locality can significantly reduce...
Modern computer architectures expose an increasing number of parallel features supported by complex ...
Application performance can degrade significantly due to node-local load imbalances during applicati...
We present a joint scheduling and memory allocation algorithm for efficient ex...
Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can...
The era of manycore computing will bring new fundamental challenges that the techniques designed for...