Programs that are not loop-intensive and that have small basic blocks present a challenge to architectures that rely on instruction-level parallelism to deliver high performance. This paper presents a heuristic for moving operations across basic block boundaries based on profile information gathered from previous executions of a program. Architectural support is required for executing these operations speculatively. Performance improvements on a heapsort routine are discussed for a Very Long Instruction Word (VLIW) machine model, using two different sets of operation latencies. Experiments were performed both at the assembly-code level and on the intermediate representation produced by the compiler. The results show a speedup of about two over ...
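The abstract does not spell out the heuristic itself. As a hedged illustration of the general idea of using profile counts to decide which basic blocks should be scheduled together, here is a minimal sketch of greedy trace selection in the spirit of trace scheduling. It starts from the most frequently executed block and follows the most frequently taken successor edge. The function `select_trace`, the dictionary formats, and the block names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def select_trace(block_counts, edge_counts):
    """Greedily build one trace from profile data (hypothetical sketch).

    block_counts: {block: execution count}
    edge_counts:  {(src, dst): number of times edge src->dst was taken}
    """
    # Group outgoing edges by their source block.
    succs = defaultdict(list)
    for (src, dst), count in edge_counts.items():
        succs[src].append((count, dst))

    # Seed the trace with the most frequently executed block.
    seed = max(block_counts, key=block_counts.get)
    trace = [seed]
    visited = {seed}
    current = seed
    while succs[current]:
        # Follow the most frequently taken outgoing edge.
        count, nxt = max(succs[current], key=lambda edge: edge[0])
        # Stop at a block already on the trace (e.g. a loop back edge)
        # or at an edge that was never taken in the profile.
        if nxt in visited or count == 0:
            break
        trace.append(nxt)
        visited.add(nxt)
        current = nxt
    return trace


# Hypothetical profile: a loop whose "then" branch is taken 90 times out of 100.
block_counts = {"entry": 1, "loop": 100, "then": 90, "else": 10, "join": 100, "exit": 1}
edge_counts = {
    ("entry", "loop"): 1,
    ("loop", "then"): 90,
    ("loop", "else"): 10,
    ("then", "join"): 90,
    ("else", "join"): 10,
    ("join", "loop"): 99,
    ("join", "exit"): 1,
}
print(select_trace(block_counts, edge_counts))  # ['loop', 'then', 'join']
```

The selected trace runs `loop`, `then`, `join`. A scheduler working on that trace might hoist operations from `then` above the branch. Those operations would then execute even on the 10% of iterations that take `else`, which is where speculative-execution support of the kind the abstract mentions becomes necessary.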
We advocate using performance bounds to guide code optimizations. Accurate performance bounds establ...
This dissertation demonstrates that through the careful application of hardware and software techniq...
We analyse the capacity of different running models to benefit from the Instruction-Level Pa...
The quality of synthesis results for most high-level synthesis approaches is strongly affected by t...
High-performance computer architectures increasingly use compile-time instruction scheduling to reor...
Keywords: instruction-level parallelism, compilers, VLIW, superscalar, code generation. Trace Scheduling-2 is a...
In the high-level synthesis of ASICs or in the code generation for ASIPs, the presence of conditiona...
To achieve high performance on processors featuring ILP, most compilers locally apply a set of heuri...
In code generation, instruction selection chooses processor instructions to implement a program unde...
In this paper, we emphasize the practicality of lazy code motion by giving explicit directions for its ...
Performance bounds represent the best achievable performance that can be delivered by target microar...
The increasing density of VLSI circuits has motivated research into ways to utilize large area budge...
Developing efficient programs for many of the current parallel computers is not easy due to the arch...
Code scheduling to exploit instruction-level parallelism (ILP) is a critical problem in compiler opt...
Code motion is well known as a powerful technique for the optimization of sequential programs. It im...