Abstract—Autotuning systems intelligently navigate a search space of possible implementations of a computation to find the implementation(s) that best meet a specific optimization criterion, usually performance. This paper describes Nitro, a programmer-directed autotuning framework that facilitates tuning of code variants, or alternative implementations of the same computation. Nitro provides a library interface that permits programmers to express code variants along with meta-information that aids the system in selecting among the set of variants at run time. Machine learning is employed to build a model through training on this meta-information, so that when a new input is presented, Nitro can consult the model to select the appropriate v...
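The variant-selection idea described in this abstract can be illustrated with a minimal sketch. This is not Nitro's interface: the class `VariantTuner`, the feature function, the modeled cost function, and the use of a 1-nearest-neighbor lookup as the learned model are all assumptions made for illustration. The sketch registers two sorting variants, labels each training input with its cheapest variant, and then picks a variant for a new input from the features of that input.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]


def insertion_sort(xs: Sequence[int]) -> List[int]:
    """Variant 1: simple quadratic sort, cheap for very small inputs."""
    out = list(xs)
    for i in range(1, len(out)):
        key, j = out[i], i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


def builtin_sort(xs: Sequence[int]) -> List[int]:
    """Variant 2: Python's built-in sort."""
    return sorted(xs)


@dataclass
class VariantTuner:
    """Hypothetical tuner: maps input features to the best-known variant."""
    variants: Dict[str, Callable[[Sequence[int]], List[int]]]
    feature_fn: Callable[[Sequence[int]], Features]
    training: List[Tuple[Features, str]] = field(default_factory=list)

    def train(self, inputs, cost_fn) -> None:
        # Label each training input with the variant of lowest cost.
        for x in inputs:
            best = min(self.variants, key=lambda name: cost_fn(name, x))
            self.training.append((self.feature_fn(x), best))

    def select(self, x: Sequence[int]) -> str:
        # 1-nearest-neighbor lookup in feature space (stand-in for the model).
        f = self.feature_fn(x)
        _, label = min(self.training, key=lambda rec: math.dist(rec[0], f))
        return label

    def __call__(self, x: Sequence[int]) -> List[int]:
        return self.variants[self.select(x)](x)


def modeled_cost(name: str, x: Sequence[int]) -> float:
    # Assumed analytic cost model, used instead of wall-clock timing so the
    # example is deterministic; a real tuner would measure each variant.
    n = len(x)
    return float(n * n) if name == "insertion" else 100.0 + n * math.log2(n)


tuner = VariantTuner(
    variants={"insertion": insertion_sort, "builtin": builtin_sort},
    feature_fn=lambda x: (float(len(x)),),  # meta-information: input size only
)
tuner.train([list(range(n, 0, -1)) for n in (4, 8, 512, 2048)], modeled_cost)
```

Calling `tuner(data)` dispatches to whichever variant the nearest training example preferred. Swapping the nearest-neighbor lookup for another classifier, or the cost model for real timings, changes only `select` and `cost_fn`.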
Achieving peak performance from the computational kernels that dominate application performance oft...
Recent years have witnessed phenomenal growth in the application and capabilities of Graphical Proc...
The focus of this work is the automatic performance tuning of stencil computations on Graphics Proce...
Emerging trends such as growing architectural diversity and increased emphasis on energy...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
We have developed several autotuning benchmarks in CUDA that take into account performance-relevant ...
Efficient large-scale scientific computing requires efficient code, yet optimizing code to render it...
We present a novel strategy for automatic performance tuning of GPU programs. The strategy combines ...
Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and ...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Writing high performance GPGPU code is often difficult and time-consuming, potentially requiring lab...
In high-performance computing, excellent node-level performance is required for the efficient use of...
The end of Moore's Law and the breakdown of Dennard's scaling mean that increasing hardware ...
Empirical performance optimization of computer codes using autotuners has received significa...