The performance potential of a value reuse mechanism depends on its reuse detection time, the number of reuse opportunities, and the amount of work saved by skipping each reuse unit. Larger instruction groups typically offer fewer reuse opportunities than smaller groups, but each successful reuse detection saves more work, so it is important to find the balance point that yields the largest overall performance gain. We propose a new mechanism called sub-block reuse to balance the reuse granularity against the number of reuse opportunities. Our simulation results show that sub-block reuse with compiler assistance has a substantial and consistent potential to improve the performance of superscalar processors, with speedups ...
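The trade-off described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the paper's simulator or evaluation method: the class `ReuseConfig`, the function `net_cycles_saved`, and every number in `configs` are hypothetical and chosen purely to show how detection cost, reuse opportunities, and work saved per reuse interact.

```python
from dataclasses import dataclass


@dataclass
class ReuseConfig:
    """Hypothetical description of one reuse granularity (illustrative only)."""
    name: str
    attempts: int         # reuse-detection lookups performed
    hits: int             # lookups that found a reusable result
    cycles_saved: float   # cycles skipped per successful reuse
    detect_cycles: float  # cycles spent on each detection lookup


def net_cycles_saved(cfg: ReuseConfig) -> float:
    """Work skipped by successful reuses minus the cost of all lookups."""
    return cfg.hits * cfg.cycles_saved - cfg.attempts * cfg.detect_cycles


# Made-up numbers, NOT simulation results: small units are looked up and
# hit often but save little; large units save a lot but rarely hit.
configs = [
    ReuseConfig("instruction", attempts=1000, hits=400, cycles_saved=1.0, detect_cycles=0.5),
    ReuseConfig("sub-block", attempts=300, hits=150, cycles_saved=4.0, detect_cycles=0.5),
    ReuseConfig("basic-block", attempts=100, hits=20, cycles_saved=10.0, detect_cycles=0.5),
]

# Pick the granularity with the largest net saving under this toy model.
best = max(configs, key=net_cycles_saved)
```

With these invented inputs the intermediate granularity comes out ahead, but that follows from the chosen numbers rather than from any measured behavior; real values would have to come from simulation.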
Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by e...
Modern CPUs rely on expensive branch predictors to speed up execution. Predictions nevertheless impl...
Contemporary superscalar processors employ large instruction windows to tolerate long latency (mainly...
The fact that instructions in programs often produce repetitive results has motivated researchers to...
Superscalar microprocessors currently power the majority of computing machines. These processors ar...
Modern superscalar processors use advanced features like dynamic scheduling and speculative executio...
The foremost goal of superscalar processor design is to increase performance through the exploitatio...
Value locality is the phenomenon that a small number of values occur repeatedly in the same register...
Superscalar architectural techniques increase instruction throughput by increasing resources and usi...
This paper presents a study of the performance limits of data value reuse. Two types of data value r...
The main aim of this short paper is to investigate multiple-instruction-issue in a high-performance ...
We present a technique for ameliorating the detrimental impact of the true data dependencies that ul...
Superscalar and superpipelining techniques increase the overlap between the instructions in a pipeli...
Superscalar processors contain large, complex structures to hold data and instructions as they wait ...