The ever-increasing computational power of contemporary microprocessors significantly reduces the execution time spent on arithmetic computations (i.e., computations not involving slow memory operations such as cache misses). For memory-intensive workloads, it therefore becomes more important to overlap multiple cache misses with one another than to overlap slow memory operations with other computations. In this paper, we propose a novel technique to parallelize sequential cache misses, thereby increasing memory-level parallelism (MLP). Our idea is based on value prediction, which was originally proposed as an instruction-level parallelism (ILP) optimization to break true data dependencies. In this paper, we advocate value prediction in its capability ...
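To make the idea concrete, the following is a minimal sketch of a last-value/stride value predictor, the classic mechanism behind breaking load-to-load dependency chains: if the value of a missing load can be predicted, a dependent load can issue speculatively and its miss overlaps with the first one. The table size, indexing, and confidence policy here are illustrative assumptions, not details taken from any of the cited papers.

```python
class StrideValuePredictor:
    """Illustrative stride value predictor (hypothetical parameters)."""

    def __init__(self, entries=256):
        self.entries = entries
        # Each entry maps a table index to (last_value, stride, confidence).
        self.table = {}

    def predict(self, pc):
        """Return a predicted value for the load at `pc`, or None if not confident."""
        entry = self.table.get(pc % self.entries)
        if entry and entry[2] >= 2:  # only predict above a confidence threshold
            last, stride, _ = entry
            return last + stride
        return None

    def update(self, pc, actual):
        """Train the predictor with the load's actual (committed) value."""
        idx = pc % self.entries
        entry = self.table.get(idx)
        if entry is None:
            self.table[idx] = (actual, 0, 0)
            return
        last, stride, conf = entry
        new_stride = actual - last
        # Reward a repeating stride; reset confidence on a mispredicted stride.
        conf = min(conf + 1, 3) if new_stride == stride else 0
        self.table[idx] = (actual, new_stride, conf)
```

For example, after observing a load at one PC produce 100, 108, 116, 124 (a constant stride of 8), the predictor becomes confident and supplies 132 as the next value, letting a dependent access proceed without waiting for the miss to resolve.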
Performance loss due to long-latency memory accesses can be reduced by servicing multiple memory acc...
Even in the multicore era, making single cores faster is paramount to achieve high-performance comp...
While runahead execution is effective at parallelizing independent long-latency cache misses, it is ...
Recent trends regarding general purpose microprocessors have focused on Thread-Level Parallelism (TL...
This paper demonstrates how to utilize the inherent error resilience of a wide range of applications...
Value prediction improves instruction level parallelism in superscalar processors by breaking true d...
Efficient data supply to the processor is one of the keys to achieving high performance. However, ...
Instruction Level Parallelism (ILP) is one of the key issues in boosting the performance of future gene...
Value prediction breaks data dependencies in a program thereby creating instruction level parallelis...
Value prediction attempts to eliminate true-data dependencies by dynamically predicting the outcome ...
Modern superscalar processors often suffer long stalls due to load misses in on-chip L2 caches. To a...
Cache performance analysis is becoming increasingly important in microprocessor design. This work ex...
This paper presents an experimental and analytical study of value prediction and its impact on specu...