Trace-level reuse is based on the observation that some traces (dynamic sequences of instructions) are frequently repeated during the execution of a program and that, in many cases, the instructions that make up such traces have the same source operand values. Executing such a trace again necessarily produces the same outcome, so its execution can be skipped if the processor records the outcome of previous executions. This paper presents an analysis of the performance potential of trace-level reuse and discusses a preliminary realistic implementation. Like instruction-level reuse, trace-level reuse can improve performance by decreasing resource contention and the latency of some instructions. However, we show that trace-level reuse is ...
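The reuse idea described in this abstract can be illustrated with a small software model. The sketch below is not the implementation discussed in the paper; it is a minimal illustration under stated assumptions: a hypothetical two-opcode toy ISA, straight-line traces identified only by their starting PC, and a table that keeps a single entry per PC. The names `TraceReuseTable`, `ReuseEntry`, `live_ins`, and `execute` are invented for this example.

```python
from dataclasses import dataclass


# Toy ISA (assumption for illustration only):
#   ("add",  dst, src_a, src_b)  -> regs[dst] = regs[src_a] + regs[src_b]
#   ("addi", dst, src_a, imm)    -> regs[dst] = regs[src_a] + imm
def execute(instr, regs):
    op, dst, a, b = instr
    if op == "add":
        regs[dst] = regs[a] + regs[b]
    elif op == "addi":
        regs[dst] = regs[a] + b
    else:
        raise ValueError(f"unknown opcode: {op}")


def live_ins(trace):
    """Return the registers a trace reads before writing them (its source operands)."""
    written, ins = set(), []
    for op, dst, a, b in trace:
        sources = [a, b] if op == "add" else [a]
        for reg in sources:
            if reg not in written and reg not in ins:
                ins.append(reg)
        written.add(dst)
    return ins


@dataclass
class ReuseEntry:
    inputs: dict   # live-in register -> value seen when the trace was recorded
    outputs: dict  # written register -> value after the trace completed


class TraceReuseTable:
    """Hypothetical reuse table: one recorded outcome per trace start PC."""

    def __init__(self):
        self.entries = {}
        self.reused = 0
        self.executed = 0

    def run(self, pc, trace, regs):
        entry = self.entries.get(pc)
        # Reuse test: every live-in register holds the recorded value.
        if entry is not None and all(regs[r] == v for r, v in entry.inputs.items()):
            regs.update(entry.outputs)  # apply the recorded outcome, skip execution
            self.reused += 1
            return
        # Miss: execute the trace normally and record its inputs and outputs.
        inputs = {r: regs[r] for r in live_ins(trace)}
        for instr in trace:
            execute(instr, regs)
        outputs = {dst: regs[dst] for _, dst, _, _ in trace}
        self.entries[pc] = ReuseEntry(inputs, outputs)
        self.executed += 1
```

In this model a whole trace is skipped with one table lookup and one comparison of its live-in values, which is the source of the latency and resource-contention savings the abstract mentions. A hardware design would also have to handle memory operands, branches inside traces, and limited table capacity, none of which this sketch attempts.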
Abstract. Memory traces record the addresses touched by a program during its execution, enabling man...
As the instruction issue width of superscalar processors increases, instruction fetch bandwidth req...
As the issue width of superscalar processors is increased, instruction fetch bandwidth requirements ...
This paper presents a study of the performance limits of data value reuse. Two types of data value r...
Processors that can simultaneously execute multiple paths of execution will only exacerbate the fetc...
Superscalar microprocessors currently power the majority of computing machines. These processors ar...
In trace processors, a sequential program is partitioned at run time into "traces." A tra...
The fact that instructions in programs often produce repetitive results has motivated researchers to...
In high-performance processors, increasing the number of instructions fetched and executed in parall...
Instruction Reuse is a microarchitectural technique that exploits dynamic instruction repetition to ...
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetch...
The objective of this paper is to improve the use of the hardware resources of the trace cache mecha...