With energy-efficient architectures, including accelerators and many-core processors, gaining traction, application developers face the challenge of optimizing their applications for multiple hardware features, including many-core parallelism, wide vector processing units, and on-chip high-bandwidth memory. In this paper, we discuss the development and use of a new application performance tool based on an extension of the classical roofline model that simultaneously profiles multiple levels of the cache-memory hierarchy. This tool gives the developer a powerful visual aid and can be used to frame the many-dimensional optimization problem in a tractable way. We show case studies of real scientific applications that have gained i...
We propose an easy-to-understand, visual performance model that offers insights to programmers and a...
"What Mathematics is to Physics, Data traversal is to High-performance computing." The world of Comp...
We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated archite...
Manufacturers will likely offer multiple products with differing numbers of cores to cover multiple ...
The Roofline model offers insight on how to improve the performance of software and hardware.
This article consists of a collection of slides from the authors' conference presentation. The Roofl...
The end of Dennard scaling signaled a shift in HPC supercomputer architectures from systems built fr...
Understanding the performance of applications on modern multi- and manycore platforms is a difficult...
The overarching goal of this thesis is to provide an algorithm-centric approach to analyzing the rel...
In this session we show, in two case studies, how the roofline feature of Intel Advisor has been uti...
The ever-growing complexity of high-performance computing systems imposes sign...
The roofline model is a popular approach to "bounds and bottleneck" performan...
To address the need of understanding and optimizing the performance of complex applications an...
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016...
The expeditious proliferation of Internet connectivity and the growing adoption of digital products ...