This article proposes an online auto-tuning approach for computing kernels. Unlike existing online auto-tuners, which regenerate code through long compilation chains from source to binary code, our approach deploys auto-tuning directly at the level of machine code generation. This allows auto-tuning to pay off even in very short-running applications. As a proof of concept, the approach is demonstrated on two benchmarks that run for only hundreds of milliseconds to a few seconds. In a CPU-bound kernel, the average speedups range from 1.10 to 1.58 depending on the target micro-architecture, and reach 2.53 in the most favourable conditions (all run-time overheads included). In a memory-bo...
Over the last several decades we have witnessed tremendous change in the landscape of computer archi...
Computing systems rarely deliver best possible performance due to ever increas...
We present an auto-tuning approach to optimize application performance on emerging multicore archite...
In high-performance computing, excellent node-level performance is required for the efficient use of...
The recent transformation from an environment where gains in computational performance came from inc...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Recent years have witnessed phenomenal growth in the application and capabilities of Graphical Proc...
This paper presents an automated performance tuning solution, which partitions a program into a numb...
Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and ...
Sparse kernel performance depends on both the matrix and hardware platform. Challenges in tuning s...
The primary reason for performing compiler optimizations before running the program is that they are...
Modern high performance libraries, such as ATLAS and FFTW, and programming languages, such as PetaBr...