Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to do extensive benchmarking of core applications and library routines. Since DGEMM is one of the most used routines in compute-intensive numerical codes, it is typically highly optimized by vendors and of great interest for empirical benchmarks. In this paper we show how to build a novel tool that autotunes the benchmarking process for the Roofline model. Our approach can efficiently and reliably find optimal configurations for any target hardware. Results of our tool on a range of hardware architectures and comparisons to theoretical peak performance are included. Our tool autotunes the benchmarks for the ta...
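The Roofline model bounds a kernel's attainable performance by the lesser of the compute peak and memory bandwidth times arithmetic intensity. The minimal Python sketch below illustrates that bound for DGEMM. It is not the tool described above. The machine numbers are hypothetical, and the traffic estimate assumes perfect cache reuse, so the intensity it reports is an upper bound.

```python
# Minimal Roofline sketch; all machine numbers below are hypothetical.


def roofline_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Attainable performance (GFLOP/s) under the Roofline model.

    `intensity` is arithmetic intensity in FLOP per byte of memory traffic.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def dgemm_intensity(n: int) -> float:
    """Idealized arithmetic intensity of an n x n DGEMM (C = A @ B).

    Counts 2*n**3 floating-point operations and assumes A, B and C each
    cross the memory interface exactly once as 8-byte doubles, i.e. perfect
    cache reuse. Real traffic is higher, so this is an upper bound on intensity.
    """
    flops = 2 * n**3
    bytes_moved = 3 * n * n * 8
    return flops / bytes_moved


if __name__ == "__main__":
    peak, bw = 1000.0, 100.0  # hypothetical: 1 TFLOP/s peak, 100 GB/s bandwidth
    ridge = peak / bw  # FLOP/byte at which a kernel becomes compute-bound
    print(f"ridge point: {ridge:.1f} FLOP/byte")
    for n in (16, 64, 256, 1024):
        ai = dgemm_intensity(n)
        bound = roofline_gflops(peak, bw, ai)
        print(f"n={n:5d}  AI={ai:8.2f} FLOP/B  bound={bound:8.1f} GFLOP/s")
```

Under these assumptions the intensity simplifies to n/12 FLOP/byte. On the hypothetical machine, DGEMM is therefore compute-bound for n above 120, and measured DGEMM rates would be compared against the compute roof rather than the bandwidth roof.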
In high-performance computing, excellent node-level performance is required for the efficient use of...
This paper describes a portfolio-based approach for model checking, i.e., an approach in which sever...
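The snippet above is truncated, so the exact portfolio scheme of that paper is unknown. As a minimal sketch, assume the common parallel variant: several checking algorithms run concurrently, and the first verdict wins. The checker interface and the two toy checkers below are hypothetical, and `cancel_futures` requires Python 3.9 or later.

```python
import concurrent.futures as cf
import time
from typing import Callable, Sequence, Tuple

# A "checker" decides a property and returns True if it holds (hypothetical interface).
Checker = Callable[[], bool]


def run_portfolio(checkers: Sequence[Tuple[str, Checker]]) -> Tuple[str, bool]:
    """Run all checkers concurrently and return (name, verdict) of the first to finish."""
    pool = cf.ThreadPoolExecutor(max_workers=len(checkers))
    futures = {pool.submit(fn): name for name, fn in checkers}
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    winner = next(iter(done))
    # Python threads cannot be killed: this cancels queued work and stops
    # waiting, but checkers already running continue in the background.
    pool.shutdown(wait=False, cancel_futures=True)
    return futures[winner], winner.result()


def slow_checker() -> bool:
    time.sleep(0.5)  # stands in for an expensive algorithm
    return True


def fast_checker() -> bool:
    time.sleep(0.05)  # stands in for an algorithm well suited to this model
    return True


if __name__ == "__main__":
    print(run_portfolio([("slow", slow_checker), ("fast", fast_checker)]))
```

A production portfolio would run checkers in separate processes so that losing runs can actually be terminated.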
As computer architectures become more complex, the task of writing efficient programs to best utilize...
New algorithms are constantly developed in search of better or faster results. Many varian...
An autotuner takes a parameterized code as input and tries to optimize the code by finding the best ...
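The simplest form of this idea is exhaustive search: evaluate the parameterized code at every point of a discrete configuration space and keep the fastest. The Python sketch below is an illustration under that assumption, not any particular autotuner's API. The parameter names `tile` and `unroll` and the analytic `toy_cost` model are hypothetical. `timed` shows how a real kernel runner could be turned into a best-of-N wall-time cost.

```python
import itertools
import time
from typing import Callable, Dict, Mapping, Sequence, Tuple

Config = Dict[str, int]


def exhaustive_autotune(
    space: Mapping[str, Sequence[int]],
    cost: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Evaluate every configuration in `space` and return the cheapest one.

    `space` maps each tunable parameter to its candidate values; `cost`
    runs (or models) the parameterized code for one configuration.
    """
    names = list(space)
    best_cfg: Config = {}
    best_cost = float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        c = cost(cfg)
        if c < best_cost:  # strict '<' keeps the first of equally good configs
            best_cfg, best_cost = cfg, c
    return best_cfg, best_cost


def timed(run: Callable[[Config], None], repeats: int = 5) -> Callable[[Config], float]:
    """Wrap a kernel runner as a cost function: best-of-`repeats` wall time in seconds."""

    def cost(cfg: Config) -> float:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(cfg)
            best = min(best, time.perf_counter() - start)
        return best

    return cost


def toy_cost(cfg: Config) -> float:
    """Hypothetical analytic cost model standing in for a real timed kernel."""
    return abs(cfg["tile"] - 64) / 64 + abs(cfg["unroll"] - 4) / 4


if __name__ == "__main__":
    space = {"tile": [16, 32, 64, 128], "unroll": [1, 2, 4, 8]}
    print(exhaustive_autotune(space, toy_cost))
```

Exhaustive search is only practical for small spaces. Practical autotuners typically replace the loop with random sampling, heuristic search, or model-based search, but the configuration-in, cost-out interface stays the same.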
Achieving peak performance from the computational kernels that dominate application performance oft...
We have developed several autotuning benchmarks in CUDA that take into account performance-relevant ...
Autotuning systems intelligently navigate a search space of possible implementations of a c...
Empirical performance optimization of computer codes using autotuners has received significa...
Achieving peak performance from library subroutines usually requires extensive, machine-dependent tu...
Given the continuous growth of computing systems, the energy-efficiency requirement of the...
The enormous and growing complexity of today's high-end systems has increased the alread...
Over the last several decades we have witnessed tremendous change in the landscape of computer archi...
Achieving peak performance from the computational kernels that dominate application performance ofte...
A large amount of resources is spent writing, porting, and optimizing scientif...