Untolerated load instruction latencies often have a significant impact on overall program performance. As one means of mitigating this effect, we present an aggressive hardware-based mechanism that provides effective support for reducing the latency of load instructions. Through the judicious use of instruction predecode, base register caching, and fast address calculation, it becomes possible to complete load instructions up to two cycles earlier than traditional pipeline designs. For a pipeline with one cycle data cache access, this results in what we term a zero-cycle load. A zero-cycle load produces a result prior to reaching the execute stage of the pipeline, allowing subsequent dependent instructions to issue unfettered by load dep...
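The cycle arithmetic behind the term "zero-cycle load" can be sketched with a toy latency model. This is only an illustration, not the mechanism described in the abstract: the function name `load_latency`, its parameters, and the assumption that a traditional pipeline spends one cycle on address generation before the cache access are all hypothetical choices made for this sketch.

```python
def load_latency(agen_cycles: int, cache_cycles: int, cycles_saved: int) -> int:
    """Cycles a dependent instruction waits on a load in this toy model.

    agen_cycles:  cycles spent computing the effective address
                  (assumed to be 1 in a traditional pipeline).
    cache_cycles: cycles spent accessing the data cache.
    cycles_saved: cycles by which the load completes early
                  (the abstract states "up to two").
    """
    if min(agen_cycles, cache_cycles, cycles_saved) < 0:
        raise ValueError("cycle counts must be non-negative")
    baseline = agen_cycles + cache_cycles
    # A load cannot complete before it starts, so clamp at zero.
    return max(0, baseline - cycles_saved)


# Traditional pipeline under the stated assumptions: 1 + 1 = 2 cycles.
baseline = load_latency(agen_cycles=1, cache_cycles=1, cycles_saved=0)

# Completing the load two cycles earlier leaves no latency to hide.
zero_cycle = load_latency(agen_cycles=1, cache_cycles=1, cycles_saved=2)

print(baseline, zero_cycle)
```

Under these assumptions, a one-cycle data cache combined with a two-cycle saving yields a latency of zero, which matches the abstract's use of the term. With a slower cache, the same saving would leave some latency exposed.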
This paper proposes a method of buffering instructions by software-based prefetching. The method all...
A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is...
Processor design techniques, such as pipelining, superscalar, and VLIW, have dramatically decreased ...
An increasing cache latency in next-generation processors incurs profound performance impa...
The speed gap between processor and memory continues to limit performance. To address this problem, ...
The considerable gap between processor and DRAM speed and the power losses in the cache hierarchy ca...
The speed gap between processor and memory continues to limit performance. To address this problem, ...
Around 2003, newly activated power constraints caused single-thread performance growth to slow drama...
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit so...
Accommodating the uncertain latency of load instructions is one of the most vexing problems in in-or...
The speed gap between processor and memory continues to limit performance. To address this problem, ...
Pipelined microprocessors allow the simultaneous execution of several machine instructions at a time...
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit so...
Execution efficiency of memory instructions remains critically important. To this end, a plethora of...