Few runtime tools exist for even modestly sized computing systems with 10^3 processors, and above this scale they work poorly. We present the Stack Trace Analysis Tool (STAT) to aid in debugging extreme-scale applications. STAT can reduce the problem exploration space from thousands of processes to a few by sampling application stack traces to form process equivalence classes, groups of processes exhibiting similar behavior. In typical parallel computations, large numbers of processes exhibit a small number of different behavior classes, manifested as common patterns in their stack traces. The problem space is reduced to representatives from these common behavior classes, upon which we can use full-featured debuggers for root cause anal...
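The idea of forming process equivalence classes from sampled stack traces can be illustrated with a minimal sketch. This is not STAT's implementation: the function names `equivalence_classes` and `representatives`, the trace data, and the choice of the lowest rank as representative are all assumptions made for illustration, and the sketch groups only traces that are exactly identical.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group process ranks whose sampled stack traces are identical.

    `traces` maps a rank to its list of frames, outermost frame first.
    """
    classes = defaultdict(list)
    for rank, frames in sorted(traces.items()):
        classes[tuple(frames)].append(rank)
    return dict(classes)


def representatives(classes):
    """Pick the lowest rank of each class as the process to debug further."""
    return sorted(ranks[0] for ranks in classes.values())


# Hypothetical traces for eight ranks; the function names are made up.
sample = {rank: ["main", "solve", "MPI_Barrier"] for rank in range(8)}
sample[3] = ["main", "solve", "exchange_halo", "MPI_Recv"]
sample[5] = ["main", "solve", "exchange_halo", "MPI_Recv"]

classes = equivalence_classes(sample)
reps = representatives(classes)
```

Here eight processes collapse into two classes, so a full-featured debugger would need to attach to only two representative ranks instead of all eight.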
This thesis describes the design and implementation of an integrated debugging system for parallel p...
ABSTRACT: Tracing allows the analysis of task interactions with each other and with the operating sy...
Scaling a parallel program to modern supercomputers is challenging due to inter-process communicatio...
We present the Stack Trace Analysis Tool (STAT) to aid in debugging extreme-scale applications. STAT...
We present STATBench, an emulator of a scalable, lightweight, and effective tool to help debug extre...
Petascale systems will present several new challenges to performance and correctness tools. Such mac...
We present a scalable temporal order analysis technique that supports debugging of large scale appl...
Developing correct and efficient software for large scale systems is a challenging task. Developers ...
Petascale platforms with O(10^5) and O(10^6) processing cores are driving advancements in ...
Statistical debugging identifies program behaviors that are highly correlated with failures...
As computational systems grow more and more complex, their debugging and performance optimization be...
A powerful and widely-used method for analyzing the performance behavior of parallel programs is ev...
During software development, exceptions are by no means exceptional: Programmers repeatedly...
Multicore is here to stay. To keep up with the hardware innovation, software developers must move fro...
A powerful and widely-used method for analyzing the performance behavior of parallel programs is eve...