In this article, we present an approach for improving the performance of sequences of dependent instructions. We observe that many sequences of instructions can be interpreted as functions. Unlike sequences of instructions, functions can be translated into very fast but exponentially costly two-level combinational circuits. We present an approach that exploits this principle, speeds up programs thanks to circuit-level parallelism/redundancy, but avoids the exponential costs. We analyze the potential of this approach, and then we propose an implementation that consists of a superscalar processor with a large specific functional unit associated with specific back-end transformations. The performance of the SpecInt2000 benchmarks and selecte...
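The observation that a chain of dependent instructions can be read as a single function, at exponential cost, can be illustrated with a minimal sketch. The three-operation sequence, the 4-bit operand width, and the names `dependent_sequence` and `tabulate` are assumptions chosen for illustration and do not come from the article; a lookup table stands in for the two-level circuit, since both are defined by the same truth table.

```python
from itertools import product

BITS = 4                  # assumed operand width, for illustration only
MASK = (1 << BITS) - 1

def dependent_sequence(a: int, b: int) -> int:
    """Three dependent operations: each step needs the previous result."""
    t = (a + b) & MASK                         # add
    t = (t ^ a) & MASK                         # xor, depends on the add
    t = ((t << 1) | (t >> (BITS - 1))) & MASK  # rotate left by 1, depends on the xor
    return t

def tabulate(fn, bits: int) -> dict:
    """Enumerate every input pair; the table has 2**(2*bits) entries."""
    return {(a, b): fn(a, b) for a, b in product(range(1 << bits), repeat=2)}

# One lookup replaces the three-step dependency chain.
table = tabulate(dependent_sequence, BITS)
print(len(table))        # 256 entries for two 4-bit inputs
print(table[(3, 5)])     # same value as dependent_sequence(3, 5)
```

The table grows as 2^(2n) for two n-bit operands, which is the exponential cost the abstract refers to; the single lookup reflects the latency benefit of collapsing the chain.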
To provide high performance at practical power levels, tomorrow’s chips will have to consist primari...
A traditional extensible processor with customized circuits achieves high performance at the cost of...
In trace processors, a sequential program is partitioned at run time into "traces." A tra...
The end of Dennard scaling leads to new research directions that try to cope with the utilization wa...
This dissertation demonstrates that through the careful application of hardware and software techniq...
We propose a Domain-Specific Architecture for elementary function computation to improve throughput ...
In code generation, instruction selection chooses processor instructions to implement a program unde...
has emphasized instruction-level parallelism, which improves performance by increasing the number of...
We present a technique for ameliorating the detrimental impact of the true data dependencies that ul...
A common approach to enhance the performance of processors is to increase the number of function uni...
Original article can be found at: http://www.sciencedirect.com/science/journal/13837621 Copyright El...
Many of the current applications used in battery powered devices are from digital signal processing,...
Performance bounds represent the best achievable performance that can be delivered by target microar...
We advocate using performance bounds to guide code optimizations. Accurate performance bounds establ...
Hardware specialization has received renewed interest recently as chips are hitting power limits. Ch...