Recent years have witnessed phenomenal growth in the application and capabilities of Graphics Processing Units (GPUs), owing to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU program (kernel) is challenging, and generally only certain specific kernel configurations lead to significant increases in performance. Auto-tuning is the process of automatically optimizing software for highly efficient execution on a target hardware platform. Auto-tuning is particularly useful for GPU programming, as a single kernel requires re-tuning after code changes, for different input data, and for different architectures. However, the discrete and non-convex nature of the search space creates a c...
Finding optimal parameter configurations for tunable GPU kernels is a non-trivial exercise for large...
We present a novel strategy for automatic performance tuning of GPU programs. The strategy combines ...
Many computationally intensive algorithms benefit from the wide parallelism of...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and ...
High-performance computing is increasingly being done on parallel machines like GPUs. In my work, I ...
We have developed several autotuning benchmarks in CUDA that take into account performance-relevant ...
Graphics Processing Units (GPUs) have revolutionized the computing landscape over the past decade. H...
Writing high-performance GPGPU code is often difficult and time-consuming, potentially requiring lab...
In this paper, we develop an approach to GPU kernel optimization by focusing o...
In high-performance computing, excellent node-level performance is required for the efficient use of...
This article proposes an online auto-tuning approach for computing kernels. Di...