The Roofline performance model provides an intuitive approach to identify performance bottlenecks and guide performance optimization. However, the classic FLOP-centric approach is inappropriate for emerging applications that perform more integer operations than floating-point operations. In this paper, we propose an Instruction Roofline Model on NVIDIA GPUs. The Instruction Roofline incorporates instructions and memory transactions across all levels of the memory hierarchy and provides more performance insights than the FLOP-oriented Roofline Model, including instruction throughput, strided memory access patterns, bank conflicts, and thread predication. We use our Instruction Roofline methodology to analyze five proxy applications: HPGMG from AMReX...
GPUs are gaining fast adoption as high-performance computing architectures, mainly because of their ...
This article consists of a collection of slides from the authors' conference presentation. The Roofl...
The GPU has become a first-order computing platform. Nonetheless, not many performance model...
The Roofline performance model provides an intuitive approach to identify performance bottlenecks an...
The Roofline performance model provides an intuitive approach to identify performance bottlenecks an...
Performance analysis is a daunting job, especially for rapidly evolving accelerator technologies. ...
We develop a microbenchmark-based performance model for NVIDIA GeForce 200-series GPUs. Our model id...
Sparse problems arise from a variety of applications, from scientific simulations to graph analytics...
Sparse problems arise from a variety of applications, from scientific simulations to graph analytics...
Manufacturers will likely offer multiple products with differing numbers of cores to cover multiple ...
Computing systems today rely on massively parallel and heterogeneous architectures to promise very h...
High-level tools for analyzing and predicting the performance of GPU-accelerated applications are scarc...
The significant growth in computational power of modern Graphics Processing Units (GPUs) coupled wit...
The relentless demands for improvements in compute throughput and energy efficiency have driven...
We propose an easy-to-understand, visual performance model that offers insights to programmers and a...