LaZy Superscalar is a processor architecture that delays the execution of fetched instructions until their results are needed by other instructions. This approach eliminates dead instructions and provides the necessary means to fuse dependent instructions across multiple control dependencies by explicitly tracking control and data dependencies through a matrix-based scheduler. We present this novel redesign of the scheduling, recovery and commit mechanisms and evaluate the performance of the proposed architecture. Our simulations using the SPEC 2006 benchmark suite indicate that LaZy Superscalar can achieve significant speed-ups while providing respectable power savings compared to a conventional superscalar processor.
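The core idea of the abstract, executing an instruction only when another instruction needs its result, can be illustrated with a small software model. The sketch below is a toy under stated assumptions, not the paper's matrix-based scheduler: the names `Instr`, `LazyCore`, `fetch`, `demand` and `run_demo` are hypothetical, the three-operation instruction set is invented for illustration, and control dependencies, instruction fusion, recovery and commit are not modelled.

```python
from dataclasses import dataclass
import operator

# Hypothetical two-operand ALU operations for this toy model.
OPS = {"add": operator.add, "mul": operator.mul}


@dataclass
class Instr:
    dest: str     # destination register name
    op: str       # "li" (load immediate), "add" or "mul"
    srcs: tuple   # source register names, or a single immediate for "li"


class LazyCore:
    """Toy demand-driven core: fetch records instructions, demand executes them."""

    def __init__(self):
        self.producer = {}  # register name -> node of its latest producer
        self.fetched = 0
        self.executed = 0

    def fetch(self, instr):
        # Bind each source to its current producer at fetch time, so a later
        # overwrite of that register cannot change what this instruction reads.
        self.fetched += 1
        deps = () if instr.op == "li" else tuple(self.producer[s] for s in instr.srcs)
        self.producer[instr.dest] = {"instr": instr, "deps": deps,
                                     "value": None, "done": False}

    def demand(self, reg):
        # A consumer needs the value of reg: only now is anything executed.
        return self._force(self.producer[reg])

    def _force(self, node):
        if not node["done"]:
            instr = node["instr"]
            if instr.op == "li":
                node["value"] = instr.srcs[0]
            else:
                a, b = (self._force(dep) for dep in node["deps"])
                node["value"] = OPS[instr.op](a, b)
            node["done"] = True
            self.executed += 1
        return node["value"]


def run_demo():
    core = LazyCore()
    program = [
        Instr("r1", "li", (2,)),
        Instr("r2", "li", (3,)),
        Instr("r3", "mul", ("r1", "r2")),  # dead: r3 is overwritten before any use
        Instr("r3", "add", ("r1", "r2")),
    ]
    for instr in program:
        core.fetch(instr)
    result = core.demand("r3")
    return result, core.fetched, core.executed
```

In `run_demo`, four instructions are fetched but only three execute: the `mul` is never demanded because its destination is overwritten first, so the model skips it as a dead instruction. A hardware design has to make the same decision without recursion and in the presence of branches, which is where explicit dependency tracking in the scheduler comes in.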
Modern superscalar architectures with dynamic scheduling and register renaming capabilities have int...
Superscalar architecture resulting in aggressive performance is a proven architecture for general pu...
High performance superscalar microarchitectures exploit instruction-level parallelism (ILP) to impro...
© 2015 ACM. LaZy Superscalar is a processor architecture which delays the execution of fetched instr...
It is increasingly accepted that superscalar processors can only achieve their full performance pote...
One of the main obstacles to exploiting the fine-grained parallelism that is available in general-pu...
We present a technique for ameliorating the detrimental impact of the true data dependencies that ul...
The foremost goal of superscalar processor design is to increase performance through the exploitatio...
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic su...
If a high-performance superscalar processor is to realise its full potential, the compiler must re-o...
Super-scalar processors can execute multiple instructions out-of-order per cycle and speculatively ...
Modern superscalar processors use wide instruction issue widths and out-of-order execution in order ...
The main aim of this short paper is to investigate multiple-instruction-issue in a high-performance ...
While delayed branch mechanisms were popular with the designers of RISC processors, most superscalar...
Lazy scheduling is a runtime scheduler for task-parallel codes that effectively coarsens parallelism...