We present a simple technique for instruction-level parallelism and analyze its performance impact. Our processor architecture economically encodes two instructions, one ALU and one load/store, into a single 32-bit instruction word. Using an existing RISC processor design as a starting point, we detail the instruction set, the pipeline design, and scheduling techniques. Implementation should require little or no additional hardware over the scalar processor, and may require less. Simulation results show up to 13% improvement in program execution time.

Corresponding author. The work described in this paper was performed at the University of Western Ontario. † Michael Bennett is a professor in the Computer Science Department, University of...
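The abstract does not spell out the field layout of the paired word, so the following C fragment is only a minimal sketch of how such an encoding could be unpacked. It assumes a hypothetical split into a 16-bit ALU half and a 16-bit load/store half; every field name and width here (alu_op, mem_off, decode_pair, and so on) is an illustrative assumption, not the paper's actual instruction format.

/*
 * Sketch: unpack a hypothetical 32-bit word that pairs one ALU
 * instruction (bits 31..16) with one load/store instruction
 * (bits 15..0). Field widths are assumptions for illustration.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    unsigned alu_op;    /* ALU operation code           */
    unsigned alu_rd;    /* ALU destination register     */
    unsigned alu_rs1;   /* ALU first source register    */
    unsigned alu_rs2;   /* ALU second source register   */
    unsigned mem_op;    /* load/store operation code    */
    unsigned mem_rd;    /* data register for the access */
    unsigned mem_rb;    /* base address register        */
    unsigned mem_off;   /* small address offset         */
} paired_insn_t;

/* Decode one packed word into its two half-instructions. */
static paired_insn_t decode_pair(uint32_t word)
{
    paired_insn_t p;
    p.alu_op  = (word >> 28) & 0xF;   /* bits 31..28 */
    p.alu_rd  = (word >> 24) & 0xF;   /* bits 27..24 */
    p.alu_rs1 = (word >> 20) & 0xF;   /* bits 23..20 */
    p.alu_rs2 = (word >> 16) & 0xF;   /* bits 19..16 */
    p.mem_op  = (word >> 14) & 0x3;   /* bits 15..14 */
    p.mem_rd  = (word >> 10) & 0xF;   /* bits 13..10 */
    p.mem_rb  = (word >>  6) & 0xF;   /* bits  9..6  */
    p.mem_off =  word        & 0x3F;  /* bits  5..0  */
    return p;
}

int main(void)
{
    /* ADD r1, r2, r3 paired with LW r4, 8(r5), under the assumed layout. */
    uint32_t word = (0x1u << 28) | (0x1u << 24) | (0x2u << 20) | (0x3u << 16)
                  | (0x0u << 14) | (0x4u << 10) | (0x5u <<  6) | 8u;
    paired_insn_t p = decode_pair(word);
    printf("ALU: op=%u rd=%u rs1=%u rs2=%u\n", p.alu_op, p.alu_rd, p.alu_rs1, p.alu_rs2);
    printf("MEM: op=%u rd=%u rb=%u off=%u\n", p.mem_op, p.mem_rd, p.mem_rb, p.mem_off);
    return 0;
}

Under this kind of layout the two halves can be routed directly to the ALU and load/store pipelines in the same cycle, which is consistent with the paper's claim that little additional decode hardware is needed beyond the scalar design.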
dataflow processors, superscalar processors, instruction scheduling, trace scheduling, software pipe...
We present a technique for ameliorating the detrimental impact of the true data dependencies that ul...
has emphasized instruction-level parallelism, which improves performance by increasing the number of...
The main aim of this short paper is to investigate multiple-instruction-issue in a high-performance ...
Superscalar architectural techniques increase instruction throughput by increasing resources and usi...
A great deal of the current research into computer architecture is directed at Multiple Instruction ...
If a high-performance superscalar processor is to realise its full potential, the compiler must re-o...
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster ...
This paper describes a novel processor architecture, called hyperscalar processor architecture, whic...
This dissertation demonstrates that through the careful application of hardware and software techniq...
Superscalar and VLIW processors can both execute multiple instructions each cycle. Each employs a di...
DS is a new microarchitecture that combines decoupled (DAE) and superscalar techniques to exploit in...
The increasing density of VLSI circuits has motivated research into ways to utilize large area budge...
To maximize the performance of wide-issue superscalar out-of-order microprocessors, the issue stage ...
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities...