Two orthogonal hardware techniques, table-based address prediction and early address calculation, for reducing the latency of load instructions have been recently proposed. The key idea behind both of these techniques is to speculatively perform loads early in the processor pipeline using predicted values for the loads' addresses. These techniques have required either a large hardware table or complex register bypass logic to be implemented in order to accurately predict the important loads in the presence of a large number of less-important loads. This paper proposes a compiler-directed approach that allows a streamlined version of both of these techniques to be effectively used together. The compiler provides directives to indicate whic...
Achieving low load-to-use latency with low energy and storage overheads is critical for performance....
Value prediction improves instruction level parallelism in superscalar processors by breaking true d...
Keywords: memory disambiguation, load-forwarding, speculation. The superscalar processor must issue instruction...
One major restriction to the performance of out-of-order superscalar processors is the latency of lo...
For many programs, especially integer codes, untolerated load instruction latencies account for a si...
Mitigating the effect of the large latency of load instructions is one of the challenges of micro-proces...
Data speculation refers to the execution of an instruction before some logically preceding instruc...
Register promotion is an optimization that allocates a value to a register for a region of its lifet...
Processor performance is directly impacted by the latency of the memory system. As processor core cy...
The pervasive use of pointers with complicated patterns in C programs often constrains compiler alia...
An increasing cache latency in next-generation processors incurs profound performance impa...
In this correspondence, we propose design techniques that may significantly simplify the cache acces...
An increasing cache latency in future processors incurs profound performance impacts in spite of adv...
While runahead execution is effective at parallelizing independent long-latency cache misses, it is ...