One major limit on the performance of out-of-order superscalar processors is load latency. We propose to tolerate this latency by executing loads earlier, using a mechanism we call address prediction. The address of the data to be loaded is predicted at dispatch, using the address of the load instruction. Loads are thus executed earlier and, provided the prediction is correct, their results can be used as soon as the address is verified. Address prediction resembles prefetching but yields a higher gain, because the value is fetched directly into the processor rather than only into the L1 cache.
Keywords: Architecture, out-of-order superscalar processors, data preloading
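As an illustration of the idea of predicting a load's data address from the address of the load instruction, here is a minimal Python sketch. It is not the mechanism proposed in the paper: the abstract does not say how the prediction is formed, so the stride-based scheme, the table size, the confidence threshold, and the names `AddressPredictor`, `predict`, and `update` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    last_addr: int = 0   # last data address seen for this load
    stride: int = 0      # difference between the two most recent addresses
    confidence: int = 0  # saturating counter in the range 0..3


class AddressPredictor:
    """Hypothetical table indexed by the load's instruction address (PC).

    The stride scheme, table size, and threshold are assumptions,
    not details taken from the abstract.
    """

    def __init__(self, size=1024, threshold=2):
        self.size = size
        self.threshold = threshold
        self.table = {}

    def _index(self, pc):
        # Drop the low bits (assumes 4-byte instructions) and wrap to table size.
        return (pc >> 2) % self.size

    def predict(self, pc):
        """At dispatch: return a predicted data address, or None if not confident."""
        entry = self.table.get(self._index(pc))
        if entry is None or entry.confidence < self.threshold:
            return None
        return entry.last_addr + entry.stride

    def update(self, pc, actual_addr):
        """After the real address is computed: train the entry.

        Returns True if last_addr + stride matched the actual address.
        """
        idx = self._index(pc)
        entry = self.table.get(idx)
        if entry is None:
            self.table[idx] = Entry(last_addr=actual_addr)
            return False
        correct = (entry.last_addr + entry.stride) == actual_addr
        if correct:
            entry.confidence = min(entry.confidence + 1, 3)
        else:
            entry.stride = actual_addr - entry.last_addr
            entry.confidence = 0
        entry.last_addr = actual_addr
        return correct


if __name__ == "__main__":
    predictor = AddressPredictor()
    # A load at PC 0x400 walking an array of 8-byte elements.
    for addr in (100, 108, 116, 124):
        predictor.update(0x400, addr)
    print(predictor.predict(0x400))
```

In this sketch, `predict` plays the role of the dispatch-time lookup and `update` plays the role of the later verification step: a speculative load issued with the predicted address would have its result released only when `update` confirms the address.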
The execution time of programs that have large working sets is substantially increased by the overhe...
Conventional cache prefetching approaches can be either hardware-based, generally by using a one-blo...
By exploiting fine grain parallelism, superscalar processors can potentially increase the performanc...
Two orthogonal hardware techniques, table-based address prediction and early address calculation, fo...
Mitigating the effect of the large latency of load instructions is one of the challenges of micro-proces...
Value prediction improves instruction level parallelism in superscalar processors by breaking true d...
Data speculation refers to the execution of an instruction before some logically preceding instruc...
While runahead execution is effective at parallelizing independent long-latency cache misses, it is ...
Processor performance is directly impacted by the latency of the memory system. As processor core cy...
In this correspondence, we propose design techniques that may significantly simplify the cache acces...
Efficient data supply to the processor is one of the keys to achieving high performance. However, ...
For many programs, especially integer codes, untolerated load instruction latencies account for a si...
Long-latency load requests continue to limit the performance of high-performan...
Even in the multicore era, making single cores faster is paramount to achieve high-performance comp...