We articulate the need for managing (data) locality automatically rather than leaving it to the programmer, especially in parallel programming systems. To this end, we propose techniques for tightly coupling the computation (including the thread scheduler) and the memory manager so that data and computation can be placed close together in hardware. Such tight coupling of computation and memory management is in sharp contrast with the prevailing practice of considering each in isolation. For example, memory-management techniques usually abstract the computation as an unknown "mutator", which is treated as a "black box". As an example of the approach, in this paper we consider a specific class of parallel computations, nested-parallel computatio...
The evolution of computing technology towards the ultimate physical limits makes communication the d...
We present an efficient memory management scheme for concurrent programming languages where ...
This paper presents a scheme to manage heap data in the local memory present in each core of a limit...
An important feature of functional programs is that they are parallel by defau...
It is often assumed that computational load balance cannot be achieved in parallel and distributed s...
The task parallel programming model allows programmers to express concurrency at a high level of abs...
This work explores the tradeoffs of the memory system of a new massively parallel multiprocessor in ...
On recent high-performance multiprocessors, there is a potential conflict between the goals of achie...
Improving program locality has become increasingly important on modern computer systems. An effectiv...
This paper describes a technique for improving the data reference locality of parallel programs usi...
In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce...
Lightweight threads have become a common abstraction in the field of programming languages and opera...
Development of scalable application codes requires an understanding and exploitation of the locality...
The development of efficient parallel out-of-core applications is often tedious, because ...