Autotuning, the practice of automatically tuning applications to provide performance portability, has received increased attention in the research community, especially in high-performance computing. Ensuring high performance on a variety of hardware usually requires modifications to the code, often via different values of a selected set of parameters, such as tiling size, loop unrolling factor, or data layout. However, the search space of all possible combinations of these parameters can be large, which can result in cases where the benefits of autotuning are outweighed by its cost, especially with dynamic tuning. Therefore, estimating the tuning time in advance or shortening the tuning time is very important in dynamic tu...
This report documents the program and the outcomes of Dagstuhl Seminar 13401 "Automatic Application ...
GPUs have been used for years in compute intensive applications. Their massive parallel processing c...
In high-performance computing, excellent node-level performance is required for the efficient use of...
We have developed several autotuning benchmarks in CUDA that take into account performance-relevant ...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
We present a novel strategy for automatic performance tuning of GPU programs. The strategy combines ...
Recent years have witnessed phenomenal growth in the application and capabilities of Graphical Proc...
Autotuning systems intelligently navigate a search space of possible implementations of a c...
Over the last several decades we have witnessed tremendous change in the landscape of computer archi...
Achieving peak performance from library subroutines usually requires extensive, machine-dependent tu...
This paper presents an automated performance tuning solution, which partitions a program into a numb...
High-performance computing is increasingly being done on parallel machines like GPUs. In my work, I ...
Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and ...