The memory system remains a major performance bottleneck in modern and future architectures. In this dissertation, we propose a hardware/software cooperative approach and demonstrate its effectiveness. This approach combines the global yet imperfect view of the compiler with the timely yet narrow-scope context of the hardware. It relies on a light-weight extension to the instruction set architecture to convey compile-time knowledge (hints) to the hardware. The hardware then uses these hints to make better decisions. Our work shows that a cooperative hardware/software approach to (1) cache replacement, (2) prefetching, and (3) their combination eliminates or tolerates much of the memory performance bottleneck. (1) Our work enhances cache rep...
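To make the cooperative approach concrete, below is a minimal sketch, not the dissertation's actual ISA extension or compiler pass, of how compile-time locality knowledge might be attached to individual memory references. The EVICT_ME_HINT macro is a hypothetical stand-in for a hint-carrying instruction and is stubbed out as a no-op so the sketch compiles anywhere; __builtin_prefetch is the real GCC/Clang builtin, used here in place of a compiler-inserted prefetch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the ISA hint described above. On a cooperative
 * hardware/software system this would lower to a tagged load/store that the
 * replacement logic can consult; here it is a no-op so the example builds. */
#define EVICT_ME_HINT(addr) ((void)(addr))

/* Streaming kernel: each element of 'src' is used exactly once, so a
 * compiler locality analysis (or, in this sketch, the programmer) can tag
 * those accesses as having no temporal reuse and prefetch a few iterations
 * ahead to overlap the miss latency with computation. */
void scale(double *dst, const double *src, size_t n, double k) {
    for (size_t i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(&src[i + 8], /*rw=*/0, /*locality=*/0);
        dst[i] = k * src[i];
        EVICT_ME_HINT(&src[i]);   /* single-use line: good eviction victim */
    }
}
```

In a cooperative system, the replacement policy would prefer hint-tagged lines as victims, keeping reusable data resident, while the prefetches tolerate the remaining misses, corresponding to points (1) and (2) above.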