We present a methodology for precision tuning of full applications. Precision-tuning techniques must select a search space composed of either variables or instructions and provide a scalable search strategy. In full-application settings, one cannot assume compiler support for practical reasons, so enabling code refactoring becomes an additional important challenge. We argue for an instruction-based search space and show: 1) how to exploit dynamic program information based on call stacks; and 2) how to exploit the iterative nature of scientific codes, combined with temporal locality. We applied the methodology to tune the implementation of scientific codes written in a combination of Python, CUDA, C++ and Fortran, tuning calls to ...
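To make the idea of an instruction-based search space keyed by call stacks more concrete, here is a minimal sketch in Python. It is not the authors' tool: the `Tuner` wrapper, the `tune` greedy search, the toy `program`, and the tolerance are all hypothetical names and choices made for illustration. Each addition is identified by the innermost frames of its call stack, and the search lowers one such call site at a time to single precision, keeping the change only if the result stays within the tolerance.

```python
import struct
import traceback


def to_single(x):
    """Round a Python float (IEEE-754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


class Tuner:
    """Hypothetical wrapper: picks a precision per call-stack key for each addition."""

    def __init__(self, depth=2):
        self.depth = depth    # number of caller frames used as the key (assumption)
        self.lowered = set()  # call-stack keys executed in single precision
        self.seen = []        # keys in order of first appearance

    def add(self, a, b):
        stack = traceback.extract_stack()
        # Drop this frame; keep the innermost `depth` callers as (function, line) pairs.
        key = tuple((f.name, f.lineno) for f in stack[-(self.depth + 1):-1])
        if key not in self.seen:
            self.seen.append(key)
        if key in self.lowered:
            return to_single(to_single(a) + to_single(b))
        return a + b


def tune(program, tol):
    """Greedy search: lower one key at a time, keep it if the error stays within tol."""
    tuner = Tuner()
    reference = program(tuner)  # the all-double run also discovers the keys
    for key in list(tuner.seen):
        tuner.lowered.add(key)
        if abs(program(tuner) - reference) > tol:
            tuner.lowered.discard(key)  # too inaccurate: restore double precision
    return tuner


def accumulate(tuner, n):
    total = 1.0
    for _ in range(n):
        total = tuner.add(total, 1e-8)  # increment is absorbed in single precision
    return total


def scale_sum(tuner):
    return tuner.add(0.5, 0.25)  # exactly representable in both precisions


def program(tuner):
    return accumulate(tuner, 1000) + scale_sum(tuner)


if __name__ == "__main__":
    result = tune(program, tol=1e-9)
    for key in result.seen:
        mode = "single" if key in result.lowered else "double"
        print(mode, key)
```

In this toy program the accumulation loop must stay in double precision, while the addition in `scale_sum` can be lowered. A realistic search would need a scalable strategy rather than this one-at-a-time greedy pass, and would intercept real instructions or library calls rather than a wrapper method.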
Abstract. We present a dynamic method for tuning algorithmic parameters of parallel scientific program...
The enormous and growing complexity of today's high-end systems has increased the alread...
Achieving peak performance from the computational kernels that dominate application performance oft...
While tremendously useful, automated techniques for tuning the precision of floating-point programs ...
Given the variety of numerical errors that can occur, floating-point programs are difficult to write...
For scientific array-based programs, optimization for a particular target platform is a hard problem...
In memory hierarchies, programs can be speeded up by increasing their degree of locality. This paper...
The aggressive optimization of floating-point computations is an important problem in high-performan...
Search schemes constitute a flexible and generic framework to describe how all approximate occurrenc...
Value locality is the phenomenon that a small number of values occur repeatedly in the same register...
Approximating ideal program outputs is a common technique for solving computationally difficult prob...
It has long been known that the quality of the code produced by an optimizing compiler is dependent ...
Abstract. Real-time heuristic search algorithms are useful when the amount of time or memory resourc...
This article aims at making iterative optimization practical and usable by speeding up the evaluatio...
Abstract. Tuning stochastic local search algorithms for tackling large instances is difficult due to...