While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel technique, address-value delta (AVD) prediction. An AVD predictor keeps track of the address (pointer) load instructions for which the arithmetic difference (i.e., delta) between the effective address and the data value is stable. If such a load instruction incurs a long-latency cache miss during runahead execution, its data value is predicted by subtracting the stable delta from its effective address. This prediction enables the pre-execution of dependent instructions, including load instructions that incur long-latency cache m...
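The mechanism described above can be sketched in a few lines of Python. This is only an illustrative model, not the paper's hardware design: the class name `AVDPredictor`, the per-PC dictionary used as the table, the confidence threshold `CONF_THRESHOLD`, and the bound `MAX_AVD` are all assumptions made for the example. The only parts taken from the abstract are the delta definition (effective address minus data value) and the prediction rule (effective address minus the stable delta).

```python
from dataclasses import dataclass

# Assumed tuning parameters; the abstract does not specify them.
CONF_THRESHOLD = 2      # delta must repeat this many times before it counts as stable
MAX_AVD = 64 * 1024     # deltas larger than this are treated as not worth tracking


@dataclass
class Entry:
    avd: int = 0        # last observed address-value delta
    conf: int = 0       # how many times in a row that delta has repeated


class AVDPredictor:
    """Toy address-value delta predictor, indexed by load PC (hypothetical layout)."""

    def __init__(self):
        self.table = {}  # load PC -> Entry; unbounded here, unlike a real table

    def train(self, pc, addr, value):
        """Record the outcome of a load whose data value is known."""
        avd = addr - value                   # delta = effective address - data value
        if abs(avd) > MAX_AVD:
            self.table.pop(pc, None)         # not a pointer-like load: stop tracking
            return
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = Entry(avd, 0)
        elif entry.avd == avd:
            entry.conf = min(entry.conf + 1, CONF_THRESHOLD)
        else:
            entry.avd, entry.conf = avd, 0   # delta changed: restart confidence

    def predict(self, pc, addr):
        """Return a predicted value for a missing load, or None if not confident."""
        entry = self.table.get(pc)
        if entry is not None and entry.conf >= CONF_THRESHOLD:
            return addr - entry.avd          # value = effective address - stable delta
        return None


# Example: a list whose 32-byte nodes are laid out back to back, with the
# `next` pointer stored at offset 8 of each node (layout assumed for illustration).
predictor = AVDPredictor()
LOAD_PC = 0x400
for node in (0x1000, 0x1020, 0x1040):
    predictor.train(LOAD_PC, addr=node + 8, value=node + 32)

predicted_next = predictor.predict(LOAD_PC, addr=0x1060 + 8)
```

In this example every `next` load sees the same delta, so once the confidence counter saturates the predictor can supply the address of the following node without waiting for the miss to return, which is what lets a dependent load start early.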
Value prediction attempts to eliminate true-data dependencies by dynamically predicting the outcome ...
In this correspondence, we propose design techniques that may significantly simplify the cache acces...
The execution time of programs that have large working sets is substantially increased by the overhe...
Hard-to-predict branches depending on long-latency cache-misses have been recognized as a major perf...
For many programs, especially integer codes, untolerated load instruction latencies account for a si...
High-performance processors tolerate latency using out-of-order execution. Unfortunately, today...
Recent works have proposed the use of prediction techniques to execute speculatively true...
To improve the performance and energy-efficiency of in-order processors, this paper proposes a novel...
Value prediction improves instruction level parallelism in superscalar processors by breaking true d...
Data speculation refers to the execution of an instruction before some logically preceding instruc...
One major restriction to the performance of out-of-order superscalar processors is the latency of lo...
An increasing cache latency in next-generation processors incurs profound performance impa...
With the increasing performance gap between the processor and the memory, the importance of caches i...
Mitigating the effect of the large latency of load instructions is one of the challenges of micro-proces...