This work addresses the problem of the increasing performance disparity between the microprocessor and memory subsystem. Current L1 caches fabricated in deep submicron processes must either shrink to maintain timing, or suffer higher latencies, exacerbating the problem. We introduce a new classification for the behavior of memory traffic, which we refer to as target behavior. Classification of the target behavior falls into two categories: Uni-Targeted Instructions (UTI) and Multi-Targeted Instructions (MTI). On average, 30% of all dynamic memory LD/ST operations come from execution of UTIs, yet only a few hundred static instructions are actually UTIs. This makes isolation of the UTI targets an avenue for optimization. The addition of a sma...
The number of processor cores and on-chip cache size have been increasing on chip multiprocessors (CM...
We introduce the Execution Migration Machine (EM²), a novel data-centric multicore memory system arc...
Caches mitigate the long memory latency that limits the performance of modern processors. However, c...
Memory latency has become an important performance bottleneck in current microprocessors. This probl...
Although microprocessor performance continues to increase at a rapid pace, the growin...
Distinguishing transient blocks from frequently used blocks enables servicing references to transien...
With off-chip memory access taking hundreds of processor cycles, getting data to the processor in a tim...
This dissertation analyzes a way to improve cache performance via active management of a target cach...
One of the major limiters to computer system performance has been the access to main memory, wh...
High instruction fetch bandwidth is essential for high performance in today’s wide-issue out-of-order...
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
The “Memory Wall”, the vast gulf between processor execution speed and memory latency, has led to th...
In future multi-cores, large amounts of delay and power will be spent accessing data...
High instruction fetch bandwidth is essential for high performance in today’s wide-issue out-of-orde...