Abstract—Applications must scale well to make efficient use of today’s class of petascale computers, which contain hundreds of thousands of processor cores. Inefficiencies that do not even appear in modest-scale executions can become major bottlenecks in large-scale executions. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of scaling problems. Load imbalance is one of the most common scaling problems. To provide actionable insight into load imbalance, we present post-mortem parallel analysis techniques for pinpointing and quantifying load imbalance in the context of call path profiles of parallel programs. We show how to identify load imbalance ...
Scientific applications will have to scale to many thousands of processor cores to reach petascale. ...
High-performance computing systems have become increasingly dynamic, complex, and unpredictable. To ...
Load balance is critical for performance in large parallel applications. An imbalance on today’s fa...
The amount of parallelism in modern supercomputers currently grows from generation to generation, an...
The amount of parallelism in modern supercomputers currently grows from generation to generation. Fu...
Applications must scale well to make efficient use of even medium-scale parallel systems. Because sc...
Driven by growing application requirements and accelerated by current trends in microprocessor desig...
Supercomputers play a key role in countless areas of science and engineering, enabling the developme...
Cutting-edge science and engineering applications require petascale computing. Petascale computing p...
We present a new technique for identifying scalability bottlenecks in executions of single-program,...
With the ubiquity of multi-core processors, software must make effective use of multiple cores to ob...
Identifying performance bottlenecks and their associated calling contexts is critical for tuning hig...
In this thesis, we studied the behavior of parallel programs to understand how to automate the task...
Scaling a parallel program to modern supercomputers is challenging due to inter-process communicatio...
Abstract. A sophisticated approach for the parallel execution of irregular applications on parallel...