Optimal performance is an important goal in compute-intensive applications. On GPUs, reaching it requires extensive experience and knowledge of both the algorithms and the underlying hardware, which makes GPU applications an ideal target for auto-tuning. We present an auto-tuner that optimizes array layouts in CUDA applications. Because the optimal configuration of a kernel can vary with the data and program parameters, we adjust array layouts adaptively at runtime and match or even exceed the performance of hand-optimized code. We automatically detect data characteristics to identify different performance scenarios without user input or additional programming. We perform an empirical analysis of the application in order to construct our...
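To make the idea of empirically selecting an array layout concrete, the following is a minimal host-side sketch in Python rather than CUDA. It is not the auto-tuner described above: the two candidate layouts (array-of-structures and structure-of-arrays), the helper names `aos_index`, `soa_index`, `convert`, and `pick_layout`, and the `measure` callback are all assumptions introduced for illustration. The sketch only shows how the same logical element maps to different flat offsets under each layout, and how a tuner might pick the layout with the lowest measured cost.

```python
from typing import Callable, Dict, List

# Hypothetical index functions: map (element, field) to a flat array offset.
IndexFn = Callable[[int, int, int, int], int]


def aos_index(elem: int, field: int, n_elems: int, n_fields: int) -> int:
    """Array-of-structures: all fields of one element are adjacent."""
    return elem * n_fields + field


def soa_index(elem: int, field: int, n_elems: int, n_fields: int) -> int:
    """Structure-of-arrays: one field of consecutive elements is adjacent."""
    return field * n_elems + elem


# Candidate layouts the sketch chooses between (an assumption, not the
# paper's actual search space).
LAYOUTS: Dict[str, IndexFn] = {"AoS": aos_index, "SoA": soa_index}


def convert(data: List, src: IndexFn, dst: IndexFn,
            n_elems: int, n_fields: int) -> List:
    """Copy a flat buffer from the src layout into the dst layout."""
    out = [None] * (n_elems * n_fields)
    for e in range(n_elems):
        for f in range(n_fields):
            out[dst(e, f, n_elems, n_fields)] = data[src(e, f, n_elems, n_fields)]
    return out


def pick_layout(measure: Callable[[str], float]) -> str:
    """Return the layout name with the lowest measured cost.

    `measure` is a caller-supplied function (hypothetical) that runs the
    kernel with the named layout and returns its cost, e.g. elapsed seconds.
    """
    return min(LAYOUTS, key=measure)
```

In a real setting, `measure` would launch the kernel with the given layout and return a timing. Passing it in as a parameter keeps the selection logic separate from the measurement and makes it easy to exercise with fixed costs.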
Abstract. Autotuning is an established technique for adjusting performance-critical parameters of a...
The increasing programmability, performance, and cost/effectiveness of GPUs have led to a widespread...
The focus of this work is the automatic performance tuning of stencil computations on Graphics Proce...
GPUs have been used for years in compute-intensive applications. Their massive parallel processing c...
The continuing evolution of Graphics Processing Units (GPU) has shown rapid performance increases ov...
Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and ...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Recent years have witnessed phenomenal growth in the application and capabilities of Graphical Proc...
We have developed several autotuning benchmarks in CUDA that take into account performance-relevant ...
High-performance computing is increasingly being done on parallel machines like GPUs. In my work, I ...
In high-performance computing, excellent node-level performance is required for the efficient use of...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Memory optimizations have become increasingly important in order to fully exploit the computational ...
2012-05-02. Graphics Processing Units (GPUs) have evolved to devices with teraflop-level performance p...