Node-level performance is one of the factors that can prevent applications from reaching a supercomputer's peak performance. Studying node-level performance and attributing it to the source code yields valuable insight that can be used to improve application efficiency, although such a study can be an intimidating task given the size and complexity of the applications. In this paper we present a mechanism that combines piece-wise linear regressions, coarse-grain sampling, and minimal instrumentation to detect performance phases within computation regions, even when their granularity is very fine. The mechanism then maps the performance of each phase onto the application's syntactical structure displaying ...
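To make the piece-wise linear regression idea in the abstract concrete, here is a minimal sketch. It is not the paper's implementation, which the abstract does not detail. It assumes the samples are (normalized time, normalized cumulative counter value) pairs and that a region contains exactly two phases. The names `fit_line` and `two_phase_fit` and the synthetic data are hypothetical, chosen only for illustration. Each candidate split is fitted with two independent least-squares lines, and the split with the lowest total squared error is taken as the phase boundary.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares; returns (slope, intercept, sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def two_phase_fit(
    xs: Sequence[float], ys: Sequence[float], min_points: int = 3
) -> Tuple[int, float, float]:
    """Brute-force search for one breakpoint (assumption: exactly two phases).

    Returns (index of the first sample of the second phase,
             slope of the first phase, slope of the second phase).
    """
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        slope_1, _, err_1 = fit_line(xs[:k], ys[:k])
        slope_2, _, err_2 = fit_line(xs[k:], ys[k:])
        total = err_1 + err_2
        if best is None or total < best[0]:
            best = (total, k, slope_1, slope_2)
    if best is None:
        raise ValueError("not enough samples for two segments")
    return best[1], best[2], best[3]


# Synthetic samples: the counter grows at rate 2.0 until t = 0.55,
# then at rate 0.5 (made-up numbers, not measurements).
xs = [i / 10 for i in range(11)]
ys = [2.0 * x if x <= 0.5 else 1.1 + 0.5 * (x - 0.55) for x in xs]

split, slope_a, slope_b = two_phase_fit(xs, ys)
print(f"phase boundary before sample {split}: rates {slope_a:.2f} and {slope_b:.2f}")
```

The slope of each segment approximates the counter rate within that phase, so a change in slope marks a change in behavior. A real tool would need to handle an unknown number of phases and noisy samples, which this sketch does not attempt.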
Performance prediction is necessary and crucial in order to deal with multi-dimensional performance ...
Improvements in performance and energy efficiency often require deep understanding of the complex in...
Applications may have unintended performance problems in spite of compiler optimizations, because of...
As access to supercomputing resources is becoming more and more commonplace, performance analysis to...
Modern supercomputers deliver large computational power, but it is difficult for an application to e...
Performance evaluation tools enable analysts to shed light on how applications behave both from a ge...
Computer memory hierarchy becomes increasingly powerful but also more complex to optimize. Run-time...
In a single second a modern processor can execute billions of instructions. Obtaining a bird's eye ...
Measuring the performance of parallel codes is a compromise between lots of factors. The most import...
Software performance is considered a major concern when writing efficient code. In the past, develop...
Computers perform different applications in different ways. To characterize an application performan...
Most programs are repetitive, where similar behavior can be seen at different execution times. Algo...
Most performance debugging and tuning of parallel programs is based on the "measure-modify"...
Understanding program behavior is at the foundation of computer architecture and program optimizatio...