For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the design and evaluation of a fast address generation mechanism capable of eliminating the delays caused by effective address calculation for many loads and stores. Our approach works by predicting early in the pipeline (part of) the effective address of a memory access and using this predicted address to speculatively access the data cache. If the prediction is correct, the cache access is overlapped with non-speculative effective address calculation. Otherwise, the cache is accessed again in the following cycle, this time using the correct effective address. The impact on...
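The predict-then-fall-back flow described in this abstract can be sketched as a toy simulation. The table-based last-address-plus-stride predictor below, its size, and the single-cycle mispredict penalty are illustrative assumptions for exposition, not the paper's actual hardware design.

```python
# Sketch of early address prediction: a table indexed by the load's PC
# guesses the effective address before base + offset is computed. A correct
# guess lets the cache access overlap address calculation; a wrong guess
# costs one extra cycle to re-access with the real address.
# Table size, update rule, and penalties are illustrative assumptions.

class AddressPredictor:
    def __init__(self, entries=256):
        self.entries = entries
        self.table = {}  # table index -> (last_address, stride)

    def predict(self, pc):
        entry = self.table.get(pc % self.entries)
        if entry is None:
            return None  # no history for this load yet
        last, stride = entry
        return last + stride  # last-address + stride guess

    def update(self, pc, actual):
        idx = pc % self.entries
        last, _ = self.table.get(idx, (actual, 0))
        self.table[idx] = (actual, actual - last)


def access_cycles(pred, pc, actual, base_latency=1, mispredict_penalty=1):
    """Cycles for one load: overlapped when the predicted address is right."""
    guess = pred.predict(pc)
    pred.update(pc, actual)
    if guess == actual:
        return base_latency  # speculative cache access was correct
    return base_latency + mispredict_penalty  # re-access with real address
```

For a load walking an array with a fixed stride, the predictor warms up after two accesses and the remaining loads hit in the overlapped (single-cycle) path:

```python
p = AddressPredictor()
[access_cycles(p, 0x40, a) for a in (100, 104, 108, 112)]  # -> [2, 2, 1, 1]
```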
The execution time of programs that have large working sets is substantially increased by the overhe...
Processor performance is directly impacted by the latency of the memory system. As processor core cy...
Hard-to-predict branches depending on long-latency cache-misses have been recognized as a major perf...
In this correspondence, we propose design techniques that may significantly simplify the cache acces...
Two orthogonal hardware techniques, table-based address prediction and early address calculation, fo...
Abstract—An increasing cache latency in next-generation processors incurs profound performance impa...
Data speculation refers to the execution of an instruction before some logically preceding instruc...
While runahead execution is effective at parallelizing independent long-latency cache misses, it is ...
Grantor: University of Toronto. The latency of accessing instructions and data from the memo...
An increasing cache latency in future processors incurs profound performance impacts in spite of adv...
With the increasing performance gap between the processor and the memory, the importance of caches i...
L1 data caches in high-performance processors continue to grow in set associativity. Higher associat...
Conventional cache prefetching approaches can be either hardware-based, generally by using a one-blo...