As technology trends yield shorter cycle times and larger, wider datapaths in architectures for multimedia systems, global broadcast networks for operand communication are becoming a major bottleneck in processor performance. New low-latency operand transport techniques are needed. This paper proposes and evaluates mechanisms that cost less than traditional bypass networks by exploiting regular operand distribution patterns in multimedia applications. To reduce the latency associated with operand movement within a datapath, our mechanism, called dynamic instruction clustering, groups chains of dependent instructions within a basic block at runtime, identifies intermediate value transportation, and schedules it on networked ALUs which are connected by...
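The grouping step described in this abstract can be illustrated with a minimal sketch. It is not the paper's mechanism, whose details are cut off above. It only assumes a greedy rule: an instruction joins a chain when one of its source operands was produced by that chain's most recent instruction, so the intermediate value could pass directly from one ALU to the next. The names `Instr` and `cluster_chains` and the register names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one destination register, some sources."""
    dest: str
    srcs: Tuple[str, ...]


def cluster_chains(block: List[Instr]) -> List[List[int]]:
    """Greedily group a basic block into chains of dependent instructions.

    Assumed rule (not taken from the paper): an instruction extends a chain
    only if one of its sources was written by that chain's current tail.
    Returns a list of chains, each a list of instruction indices.
    """
    chains: List[List[int]] = []
    producer: Dict[str, int] = {}  # register -> index of its latest writer
    chain_of: Dict[int, int] = {}  # instruction index -> chain id

    for i, ins in enumerate(block):
        target = None
        for src in ins.srcs:
            p = producer.get(src)
            if p is not None and chains[chain_of[p]][-1] == p:
                target = chain_of[p]  # producer is a chain tail: extend it
                break
        if target is None:
            chains.append([i])        # no usable tail: start a new chain
            target = len(chains) - 1
        else:
            chains[target].append(i)
        chain_of[i] = target
        producer[ins.dest] = i
    return chains


example_block = [
    Instr("r1", ("a", "b")),    # 0
    Instr("r2", ("r1", "c")),   # 1: consumes r1 from 0
    Instr("r3", ("d", "e")),    # 2: independent of 0 and 1
    Instr("r4", ("r2", "r3")),  # 3: consumes r2 from 1
    Instr("r5", ("r3", "f")),   # 4: consumes r3 from 2
]
print(cluster_chains(example_block))  # [[0, 1, 3], [2, 4]]
```

In this example, instruction 3 also reads `r3`, which is produced in the other chain. That value would still need to be transported between chains, and the sketch does not model that cost.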
Bypass delays are expected to grow beyond 1ns as technology scales. These delays necessitate pipelin...
Modern multicomputer interconnection networks offer the delivery of messages with very low latency. ...
Instruction Reuse is a microarchitectural technique that exploits dynamic instruction repetition to ...
Multimedia applications pose new challenges to computer architecture. Their tremendous communicatio...
The traditional VLIW (very long instruction word) architecture with a single register file does not ...
Modern processors rely heavily on broadcast networks to bypass instruction results to depend...
Recent works [14] show that delays introduced in the issue and bypass logic will become critical for...
As transistor feature sizes decrease exponentially, the critical problem in massively parallel archi...
To maximize the performance of wide-issue superscalar out-of-order microprocessors, the issue stage ...
Instruction reuse is a microarchitectural technique that improves the execution time of a program by...
Clustering is a technique to decentralize the design of future wide issue VLIW cores and enable them...
An important problem in instruction level parallel (ILP) machines is how to handle the many data tra...
Increasing bandwidth and decreasing latency are two orthogonal techniques for improving program...
Clustering is a common technique to overcome the wire delay problem incurred by the evolution of tec...