Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via register sharing or L0 caches). These techniques provide a range of tradeoffs between latency, reuse, and overhead. In this work, we present a pipeline prefetching technique that achieves state-of-the-art performance and data reuse without additional data storage, data movement, or validation overheads by adding address tags to the register file. Our addition of register file tags allows us to forward (reuse) load data from the register file with no additional data movement, keep the data alive in the regis...
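The register-file-tagging idea above can be illustrated with a minimal behavioral sketch. All names here are hypothetical, not from the paper: each register carries an optional address tag recorded at load writeback; a later load to the same address hits the tag and reuses the register's data with no extra data storage or movement, and a store invalidates any stale tagged copy.

```python
class TaggedRegisterFile:
    """Sketch of a register file augmented with per-register address tags."""

    def __init__(self, num_regs=32):
        self.values = [0] * num_regs
        self.tags = [None] * num_regs  # memory address whose data the register holds

    def write(self, reg, value, addr=None):
        """Writeback: record the value and, for loads, the source address."""
        self.values[reg] = value
        self.tags[reg] = addr

    def lookup(self, addr):
        """Return (reg, value) if some register already holds data for addr."""
        for r, tag in enumerate(self.tags):
            if tag == addr:
                return r, self.values[r]
        return None

    def invalidate(self, addr):
        """A store to addr makes any tagged register copy stale."""
        for r, tag in enumerate(self.tags):
            if tag == addr:
                self.tags[r] = None


memory = {0x100: 42}
rf = TaggedRegisterFile()

# First load of 0x100: no tag matches, so the load goes to memory.
assert rf.lookup(0x100) is None
rf.write(3, memory[0x100], addr=0x100)

# Second load of 0x100: the tag hits, and the data is forwarded (reused)
# straight from register r3 without touching the memory hierarchy.
assert rf.lookup(0x100) == (3, 42)

# A store to 0x100 invalidates the tagged copy, preserving correctness.
rf.invalidate(0x100)
assert rf.lookup(0x100) is None
```

In hardware the tag search would be an associative (CAM-style) lookup rather than a loop, and the validation/keep-alive policies the abstract alludes to are not modeled here.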
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit so...
As data prefetching is used in embedded processors, it is crucial to reduce the wasted energy for im...
Processor performance has increased far faster than memories have been able to keep up with, forcing...
Ever increasing memory latencies and deeper pipelines push memory farther from the processor. Prefet...
An effective method for reducing the effect of load latency in modern processors is data prefetching...
The large latency of memory accesses in modern computer systems is a key obstacle to achieving high ...
A well known performance bottleneck in computer architecture is the so-called memory wall. This term...
Energy efficiency is becoming a major constraint in processor designs. Every component of the proces...
Recent technological advances are such that the gap between processor cycle times and memory cycle t...
To improve the performance and energy-efficiency of in-order processors, this paper proposes a novel...
CPU speeds double approximately every eighteen months, while main memory speeds double only about ev...
In this paper, we present our design of a high performance prefetcher, which exploits various locali...
Prefetching has emerged as one of the most successful techniques to bridge the gap between modern pr...
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especia...