Abstract—Implementing locality-aware scheduling algorithms using fine-grain programming models may generate scheduling overheads due to the potentially elevated number of tasks. To reduce this overhead, while at the same time increasing data locality in multithreaded applications, this paper proposes a new technique named Locality-Driven Code Scheduling (LDCS). LDCS uses the data dependency graph of an application to identify the tasks writing to a common chunk of data and groups them into a single coarse-grain construct called a super-task. LDCS uses fine-grain synchronization to start the execution of a super-task, but relaxes the constraints of classical macro-dataflow models by signaling a super-task in the middle of its execution to f...
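The grouping step described in the abstract above can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the task representation and function name are hypothetical, assuming each task declares the data chunk it writes to.

```python
# Hypothetical sketch of LDCS-style grouping: tasks that write to the
# same chunk of data are collected into one coarse-grain "super-task".
# The (task_id, written_chunk) representation is an illustrative
# assumption; the paper derives this from the data dependency graph.
from collections import defaultdict

def build_super_tasks(tasks):
    """Group tasks by the data chunk they write.

    `tasks` is an iterable of (task_id, written_chunk) pairs.
    Returns a mapping from chunk to the list of task ids forming
    one super-task for that chunk.
    """
    groups = defaultdict(list)
    for task_id, chunk in tasks:
        groups[chunk].append(task_id)
    return dict(groups)

tasks = [
    ("t0", "chunkA"),
    ("t1", "chunkA"),
    ("t2", "chunkB"),
]
print(build_super_tasks(tasks))
# → {'chunkA': ['t0', 't1'], 'chunkB': ['t2']}
```

In the paper's model, fine-grain synchronization would then decide when each super-task may start; the sketch covers only the clustering by written chunk.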
Lazy scheduling is a runtime scheduler for task-parallel codes that effectively coarsens parallelism...
Massively parallel processing devices, like Graphics Processing Units (GPUs), have the ability to ac...
Clustering is a common technique to deal with wire delays. Fully-distributed architectures, where th...
In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce...
As chip multiprocessors proliferate, programming support for these devices is likely to receive a lo...
Recent trends have made it clear that processor makers are committed to the multi-core chip design...
Lightweight threads have become a common abstraction in the field of programming languages and opera...
It is often assumed that computational load balance cannot be achieved in parallel and distributed s...
As we increase the number of cores on a processor die, the on-chip cache hierarchies that support th...
Modern computer architectures expose an increasing number of parallel features supported by complex ...
Efficiently scheduling application concurrency to system level resources is one of the main challeng...
Part 4: Applications of Parallel and Distributed Computing. Ordinary programs co...
The task parallel programming model allows programmers to express concurrency at a high level of abs...
Computing systems have undergone a fundamental transformation from single core devices to devices wi...
Making computer systems more energy efficient while obtaining the maximum performance possible is ke...