Loops are the main time-consuming part of programs based on floating-point computations. The performance of these loops is limited either by recurrences in the computation or by the resources offered by the architecture. Several general-purpose superscalar microprocessors have been implemented with fused multiply-add floating-point units, which reduce the latency of the combined operation and the number of resources used. This paper analyses the influence of these two factors on the instruction-level parallelism exploitable from loops executed on a broad set of future aggressive processor configurations. The estimation of implementation costs (area and cycle time) enables a fair comparison of these configurations in terms of final performance...
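The two limits named in the abstract, recurrences and resources, can be illustrated with the standard lower bound on the initiation interval of a software-pipelined loop. The sketch below is not taken from the paper: the function names, the unit counts, and the latencies (3 cycles for a separate multiply or add, 4 cycles for a fused multiply-add) are hypothetical values chosen only for illustration. It models the linear recurrence x[i] = a*x[i-1] + b, where both the multiply and the add lie on the recurrence cycle.

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    """Resource bound: every unit class must issue all of its operations
    once per iteration, so the busiest class sets the bound."""
    return max(ceil(op_counts[k] / unit_counts[k]) for k in op_counts)

def rec_mii(cycles):
    """Recurrence bound: each dependence cycle is given as
    (total latency in cycles, iteration distance)."""
    return max((ceil(lat / dist) for lat, dist in cycles), default=0)

def mii(op_counts, unit_counts, cycles):
    """Minimum initiation interval: the larger of the two bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))

# Hypothetical machine: one floating-point unit, one memory port.
units = {"fp": 1, "mem": 1}

# Loop body for x[i] = a*x[i-1] + b, with one store per iteration.
# Assumed latencies: multiply 3, add 3, fused multiply-add 4.
unfused = mii({"fp": 2, "mem": 1}, units, [(3 + 3, 1)])
fused = mii({"fp": 1, "mem": 1}, units, [(4, 1)])

print("cycles per iteration without FMA:", unfused)
print("cycles per iteration with FMA:", fused)
```

Under these assumed numbers the loop is recurrence-bound in both cases, and fusing the two operations shortens the dependence cycle while also halving the floating-point issue slots needed. With other latencies or other loops the balance can differ; for example, in a plain sum reduction only the add sits on the recurrence, so a fused unit with a longer latency than a standalone add would lengthen the cycle.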
Floating-point computer arithmetic units are used for modern-day computers for 2D/3D graphic and...
This dissertation demonstrates that through the careful application of hardware and software techniq...
Most general purpose processors (GPP) and application specific processors (ASP) use the floating...
Abstract—Multiply-add operations form a crucial part of many digital signal processing and control e...
Abstract: Floating-point unit is an integral part of any modern microprocessor. The fused multiply ...
Current high-performance floating-point microprocessors try to maximize the exploitable parallelism ...
Architectural resources and program recurrences are the main limitations to the amount of Instruction...
Abstract — Architectural resources and program recurrences are the main limitations to the amount of...
dataflow processors, superscalar processors, instruction scheduling, trace scheduling, software pipe...
Many scenarios demand a high processing power often combined with a limited energy budget. A way to ...
Embedded systems require maximum performance from a processor within significant constraints in powe...
The increasing density of VLSI circuits has motivated research into ways to utilize large area budge...
The inherent instruction-level parallelism (ILP) of current applications (especially those based on f...
In this paper, a high speed Arithmetic synthesizable Fused Multiply Add Unit (FMA) is modeled capabl...