While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel hardware technique, address-value delta (AVD) prediction. An AVD predictor keeps track of the address (pointer) load instructions for which the arithmetic difference (i.e., delta) between the effective address and the data value is stable. If such a load instruction incurs a long-latency cache miss during runahead execution, its data value is predicted by subtracting the stable delta from its effective address. This prediction enables the pre-execution of dependent instructions, including load instructions that incur long-latenc...
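The prediction rule in the abstract (remember a stable delta = effective address − data value per load, then predict value = effective address − delta on a runahead miss) can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not the authors' hardware design. The table is keyed by load PC. "Stable" is approximated by a saturating confidence counter. The parameters `conf_threshold`, `conf_max`, and `max_delta` are hypothetical; `max_delta` is an assumed filter meant to keep only pointer-like loads.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AVDEntry:
    delta: int           # last observed (effective address - data value)
    confidence: int = 0  # saturating count of consecutive repeats of delta


class AVDPredictor:
    """Minimal AVD predictor sketch keyed by load PC (assumed organization)."""

    def __init__(self, conf_threshold: int = 2, conf_max: int = 3,
                 max_delta: int = 64 * 1024):
        self.table: Dict[int, AVDEntry] = {}
        self.conf_threshold = conf_threshold  # hypothetical: confidence needed to predict
        self.conf_max = conf_max              # hypothetical: counter ceiling
        self.max_delta = max_delta            # hypothetical: pointer-load filter

    def train(self, pc: int, effective_address: int, value: int) -> None:
        """Update the entry for a load whose data value is known."""
        delta = effective_address - value
        if abs(delta) > self.max_delta:
            # Assumption: large deltas are unlikely to come from pointer loads.
            self.table.pop(pc, None)
            return
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = AVDEntry(delta)
        elif entry.delta == delta:
            entry.confidence = min(entry.confidence + 1, self.conf_max)
        else:
            # Delta changed: remember the new one and start over.
            entry.delta = delta
            entry.confidence = 0

    def predict(self, pc: int, effective_address: int) -> Optional[int]:
        """Predict the value of a load that missed during runahead, or None."""
        entry = self.table.get(pc)
        if entry is None or entry.confidence < self.conf_threshold:
            return None
        return effective_address - entry.delta


if __name__ == "__main__":
    pred = AVDPredictor()
    LOAD_PC = 0x400123  # hypothetical PC of a `node = node->next` load
    node = 0x10000
    # Nodes laid out 48 bytes apart with `next` at offset 8, so delta = -40.
    for _ in range(4):
        addr, nxt = node + 8, node + 48
        pred.train(LOAD_PC, addr, nxt)
        node = nxt
    print(hex(pred.predict(LOAD_PC, node + 8)))  # prints 0x100f0
```

In the usage example the predicted value equals the address of the next node in the list. That is what lets runahead execution issue the dependent load early instead of stalling on the miss.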
With the increasing performance gap between the processor and the memory, the importance of caches i...
Prior work in hardware prefetching has focused mostly on either predicting regular streams with unif...
Efficient data supply to the processor is one of the keys to achieving high performance. However, ...
High-performance processors tolerate latency using out-of-order execution. Unfortunately, today...
Hard-to-predict branches that depend on long-latency cache misses have been recognized as a major perf...
To improve the performance and energy-efficiency of in-order processors, this paper proposes a novel...
The ever-increasing computational power of contemporary microprocessors reduces the execution time s...
Value prediction improves instruction-level parallelism in superscalar processors by breaking true d...
One major restriction to the performance of out-of-order superscalar processors is the latency of lo...
For many programs, especially integer codes, untolerated load instruction latencies account for a si...
In this correspondence, we propose design techniques that may significantly simplify the cache acces...
Processor performance is directly impacted by the latency of the memory system. As processor core cy...
Value prediction attempts to eliminate true data dependencies by dynamically predicting the outcome ...