The exploitation of locality of reference in shared-memory multiprocessors is one of the most important problems in parallel processing today. Locality can be managed at several levels: the hardware, the operating system, the compiler's runtime environment, and the user level. In this paper we investigate the problem of exploiting locality at the operating system level and its interactions with the compiler and the architecture. Our main conclusion, based on trace-driven simulations of real applications, is that the exploitation of locality is effective only if all three levels cooperate: the compiler should perform sophisticated data alignment, the operating system should perform on-line caching and page replication, and the architecture should provide simple b...
grantor: University of Toronto. This dissertation proposes and evaluates compiler techniques...
The performance of cache memories relies on the locality exhibited by programs. Traditionally this l...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
Improving program locality has become increasingly important on modern computer systems. An effectiv...
The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programm...
Data locality is a well-recognized requirement for the development of any parallel application, but ...
Data locality is central to modern computer designs. The widening gap between processor speed and me...
We have developed compiler algorithms that analyze coarse-grained, explicitly parallel programs and ...
Over the past decades, core speeds have been improving at a much higher rate than memory bandwidth. ...
This work explores the tradeoffs of the memory system of a new massively parallel multiprocessor in ...
On recent high-performance multiprocessors, there is a potential conflict between the goals of achie...
We propose a synthetic address trace generation model which combines the accuracy advantage of trace-...
Applications often under-utilize cache space and there are no software locality optimization techniq...
We present a unified approach to locality optimization that employs both data and control transforma...