Dynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-predict branch instructions. A recently proposed dynamic predication architecture, the diverge-merge processor (DMP), provides large performance improvements by dynamically predicating a large set of complex control-flow graphs that result in branch mispredictions. DMP requires significant support from a profiling compiler to determine which branch instructions and control-flow structures can be dynamically predicated. However, previous work on dynamic predication did not extensively examine the tradeoffs involved in profiling and code generation for dynamic predication architectures. This paper describes compiler support for obtaining high perfo...
Static analysis requires the full knowledge of the overall program structure. The structure of a pro...
Pipeline stalls due to branches represent one of the most significant impediments to realizing the p...
Profile-based optimizations can be used for instruction scheduling, loop scheduling, data p...
This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-...
Even after decades of research in branch prediction, branch predictors still remain imperfect, w...
Conditional branches are expensive. Branches require a significant percentage of execution cycles si...
Data-parallel architectures must provide efficient support for complex control-flow constru...
Irregular control-flow structures like deeply nested conditional branches are common in real-world s...
High performance microprocessors have relied on accurate branch predictors to maintain high instruct...
The paper describes the design and implementation of an adaptive recompilation framework for Rotor, ...
Dynamic compilers perform a wealth of optimizations to improve the performance of the generated mach...
The increasing popularity of Graphics Processing Units (GPUs) has brought ren...
The energy costs of data movement are limiting the performance scaling of future generations of high...
Traditional compilers rely on static information about programs to perform optimizations. While such...