It is difficult to achieve high performance while programming in the large. In particular, maintaining locality hinders portability and modularity. Existing methodologies are not sufficient: explicit communication and coding for locality require the programmer to violate encapsulation and compositionality of software modules, while automated compiler analysis remains unreliable. This thesis presents a performance model that makes thread and object locality explicit. Zones form a runtime hierarchy that reflects the intended clustering of threads and objects, which are dynamically mapped onto hardware units such as processor clusters, pages, or cache lines. This conceptual indirection allows programmers to reason in the abstract about locali...
A memory abstraction is an abstraction layer between the program execution and the memory that provi...
Numerical software for sequential or parallel machines with memory hierarchies can benefit from loca...
Over the past decades, core speeds have been improving at a much higher rate than memory bandwidth. ...
The cost of data movement has always been an important concern in high performance computing (HPC) s...
The evolution of computing technology towards the ultimate physical limits makes communication the d...
In memory hierarchies, programs can be sped up by increasing their degree of locality. This paper...
Recently, multi-core chips have become omnipresent in computer systems ranging from high-end server...
This work explores the tradeoffs of the memory system of a new massively parallel multiprocessor in ...
Development of scalable application codes requires an understanding and exploitation of the locality...
This thesis covers the design and implementation of Legion, a new programming model and runtime syst...
The widening gap between processor speed and main memory speed has generated interest in compiletime...
A desirable concurrency semantics to provide for programs is region serializability. This strong sem...
Modern computing platforms are increasingly complex, with multiple cores, shared caches, a...
Locality is a universal behavior of all computational processes: They tend to refer repeat...
This paper describes a technique for improving the data reference locality of parallel programs usi...