CMOS technology scaling poses challenges in designing dynamically scheduled cores that can sustain both high instruction-level parallelism and aggressive clock frequencies. In this paper, we present a new architecture that maps compiler-scheduled blocks onto a two-dimensional grid of ALUs. For the mapped window of execution, instructions execute in a dataflow-like manner, with each ALU forwarding its result along short wires to the consumers of the result. We describe our studies of program behavior and a preliminary evaluation that show that this architecture has the potential for both high clock speeds and high ILP, and may offer the best of both the VLIW and dynamic superscalar architectures.
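The dataflow-like execution described above can be illustrated with a small sketch. This is not the paper's implementation: the names `Node`, `hop_delay`, and `run_block`, the one-cycle ALU latency, and the one-cycle-per-hop forwarding cost (Manhattan distance between ALUs) are all assumptions chosen for illustration. Each instruction of a block is statically placed at a grid position and fires once all of its operands have arrived.

```python
from dataclasses import dataclass
import operator

# Hypothetical opcode table for the sketch.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

@dataclass
class Node:
    op: str      # opcode name, a key of OPS
    pos: tuple   # (row, col) of the ALU this instruction is mapped to
    srcs: list   # operands: ("in", name) for block inputs, ("node", id) for producers

def hop_delay(a, b):
    """Assumed forwarding cost: one cycle per grid hop (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def run_block(nodes, inputs):
    """Execute one mapped block in dataflow order.

    Returns (values, done): each node's result and the cycle it completes.
    Block inputs are assumed to be available at cycle 0.
    """
    values, done = {}, {}
    pending = dict(nodes)
    while pending:
        fired = False
        for nid, node in list(pending.items()):
            args, arrive, ready = [], 0, True
            for kind, ref in node.srcs:
                if kind == "in":
                    args.append(inputs[ref])
                elif ref in values:
                    args.append(values[ref])
                    # The operand arrives after the producer finishes and
                    # the result travels across the grid to this ALU.
                    arrive = max(arrive, done[ref] + hop_delay(nodes[ref].pos, node.pos))
                else:
                    ready = False
                    break
            if ready:
                values[nid] = OPS[node.op](*args)
                done[nid] = arrive + 1  # assumed one-cycle ALU latency
                del pending[nid]
                fired = True
        if not fired:
            raise ValueError("cyclic or unresolved dependence in block")
    return values, done

# Block computing (a + b) * (a - b), placed on a 2x2 grid.
block = {
    "n0": Node("add", (0, 0), [("in", "a"), ("in", "b")]),
    "n1": Node("sub", (0, 1), [("in", "a"), ("in", "b")]),
    "n2": Node("mul", (1, 0), [("node", "n0"), ("node", "n1")]),
}
values, done = run_block(block, {"a": 5, "b": 3})
```

Under these assumptions, moving a consumer closer to its producers lowers its completion cycle, which is the intuition behind forwarding results along short wires.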
Modern processors employ a large amount of hardware to dynamically detect parallelism in single-thre...
Transport-triggered architectures are a new class of architectures that provide more scheduling fr...
Increasing on-chip wire delay along with the distributed nature of processing elements, ma...
Advances in IC technology increase the integration density for higher clock rates and provide more o...
Modern superscalar processors use wide instruction issue widths and out-of-order execution in order ...
Very long instruction word (VLIW) machines potentially provide the most direct way to exploit Instru...
Advances in VLSI technology will enable chips with over a billion transistors within the next decade...
Static multi-issue machines, such as traditional Very Long Instruction Word (VLIW) architectures, ...
dataflow processors, superscalar processors, instruction scheduling, trace scheduling, software pipe...
To achieve performance, Explicitly Parallel Instruction Computing (EPIC) systems take the responsibi...
This work examines the interaction of compiler scheduling techniques with processor features such as...
We present a simple technique for instruction-level parallelism and analyze its performance impact. ...
Superscalar architectural techniques increase instruction throughput by increasing resources and usi...
A common approach to enhance the performance of processors is to increase the number of function uni...
The foremost goal of superscalar processor design is to increase performance through the exploitatio...