The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for feature sizes of 0.8 µm, 0.35 µm, and 0.18 µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future. A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues, and issues in...
In the problem size-ensemble size plane, fixed-sized and scaled-sized paradigms have been the subset...
Superscalar and VLIW processors can both execute multiple instructions each cycle. Each employs a di...
Contemporary superscalar processors employ a large instruction window to tolerate long latency (mainly...
To characterize future performance limitations of superscalar processors, the delays of key pipeline...
The advance of integration allows implementation of very wide issue superscalar processors on a sing...
The main aim of this short paper is to investigate multiple-instruction-issue in a high-performance ...
The poor scalability of existing superscalar processors has been of great concern to the computer en...
A mechanistic model for out-of-order superscalar processors is developed and then applied to the stu...
We present a simple technique for instruction-level parallelism and analyze its performance impact. ...
High-performance superscalar microarchitectures exploit instruction-level parallelism (ILP) to impro...
LaZy Superscalar is a processor architecture which delays the execution of fetched instructions unti...
Current superscalar processors feature 64-bit datapaths to execute the program instructions, regardl...
In modern superscalar processors, the complex instruction scheduler could form the critical path of ...
A large instruction window is a key requirement to exploit greater instruction-level parallelism in ...
RingScalar is a complexity-effective microarchitecture for out-of-order superscalar processors, that...