It is often assumed that computational load balance cannot be achieved in parallel and distributed systems without a priori domain knowledge, including precedence constraints and locality information. Hence, in distributed memory architectures, locality maintenance and load balancing are seen as user-level activities involving compiler and runtime system support in software. Most efforts on locality-conscious data remapping for load balancing require the availability of a global data dependency graph. All such software schemes need an explicit phase for executing the remapping system, during which application execution is halted. These schemes view load balancing and locality maintenance as top-down problems in the sense that the us...
The development of efficient parallel out-of-core applications is often tedious, because ...
This paper describes a technique for improving the data reference locality of parallel programs usi...
Recently, multi-core chips have become omnipresent in computer systems ranging from high-end server...
Traditionally, in distributed memory architectures, locality maintenance and load balancing are seen...
We articulate the need for managing (data) locality automatically rather than leaving it to the prog...
In systems with a complex many-core cache hierarchy, exploiting data locality can significantly reduce...
Modern computer architectures expose an increasing number of parallel features supported by complex ...
This paper presents a simple load balancing algorithm and its probabilistic analysis. Unlike most of...
The task parallel programming model allows programmers to express concurrency at a high level of abs...
We define a set of overhead functions that capture the salient artifacts representing the interactio...
Lightweight threads have become a common abstraction in the field of programming languages and opera...
The emergence of multi-core systems opens new opportunities for thread-level parallelism an...
Implementing locality-aware scheduling algorithms using fine-programming models may genera...
Task parallelism as employed by the OpenMP task construct or some Intel Threading Building Blocks (T...