We have developed several autotuning benchmarks in CUDA that take into account performance-relevant source-code parameters and reach near-peak performance on various GPU architectures. We used them during the development and evaluation of a novel search method for tuning spaces. With our framework, the Kernel Tuning Toolkit, freely available on GitHub, we measured computation times and hardware performance counters on several GPUs for the complete tuning spaces of five benchmarks. These data, which we provide here, might benefit research on search algorithms for the tuning spaces of GPU codes, or research on the relationship between applied code optimizations, hardware performance counters, and GPU kernel performance. Moreover, we provide the scrip...
Autotuning systems intelligently navigate a search space of possible implementations of a c...
GPUs have been used for years in compute-intensive applications. Their massive parallel processing c...
An autotuner takes a parameterized code as input and tries to optimize the code by finding the best ...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Autotuning, the practice of automatic tuning of applications to provide perfor...
Recent years have witnessed phenomenal growth in the application and capabilities of Graphical Proc...
High-performance computing is increasingly being done on parallel machines like GPUs. In my work, I ...
We present a novel strategy for automatic performance tuning of GPU programs. The strategy combines ...
Optimal performance is an important goal in compute-intensive applications. For GPU applications, th...
2012-05-02: Graphics Processing Units (GPUs) have evolved into devices with teraflop-level performance p...
Writing high-performance GPGPU code is often difficult and time-consuming, potentially requiring lab...
Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and ...