This paper extends results concerning the recovery of accurate parallel program traces from corrupted traces initially gathered with software monitoring. Earlier work developed an approach that functioned for idealized machines. Here, we explain how this approach can be modified to track the temporal uncertainties due to the software monitoring mechanisms, and to halt when the recovered order can no longer be guaranteed. By quantifying certain system and application parameters, we are able to compute both worst-case and average-case estimates of the number of trace events that our modified trace-recovery algorithm will be able to recover before it can no longer guarantee the recovered order of events. We also report the values of these key ...
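The halting idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each traced event carries a hypothetical uncertainty interval (`earliest`, `latest`) on its true time, that events on the same process are already ordered by program order, and that the recovered order stops being guaranteed as soon as two events from different processes have overlapping intervals. The names `Event` and `recover_prefix` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    proc: int        # process that logged the event (assumed field)
    earliest: float  # assumed lower bound on the true event time
    latest: float    # assumed upper bound on the true event time


def recover_prefix(events):
    """Return the longest prefix of events whose global order is unambiguous.

    Sketch only: recovery halts at the first event whose uncertainty
    interval overlaps that of a later event from a different process.
    """
    ordered = sorted(events, key=lambda e: (e.earliest, e.latest))
    recovered = []
    for i, ev in enumerate(ordered):
        for other in ordered[i + 1:]:
            if other.earliest > ev.latest:
                break  # sorted by earliest, so nothing later can overlap ev
            if other.proc != ev.proc:
                return recovered  # order can no longer be guaranteed: halt
        recovered.append(ev)
    return recovered


trace = [
    Event(0, 0.0, 1.0),
    Event(1, 2.0, 3.0),
    Event(0, 4.0, 6.0),
    Event(1, 5.0, 7.0),  # overlaps the previous event from process 0
    Event(0, 8.0, 9.0),
]
prefix = recover_prefix(trace)
```

In this toy trace the first two events are recovered and the procedure halts at the third, whose interval overlaps an event from another process. Wider intervals (larger monitoring uncertainty) shorten the recoverable prefix, which is the quantity the abstract's worst-case and average-case estimates concern.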
It is easy to find errors and inefficient parts of a sequential program by using a standard debugge...
The need for increased computing capability and more diverse hardware with its ever more complex topo...
We have studied two related issues in the design, execution and debugging of shared memory parallel ...
Execution monitoring plays a central role in most software development tools for parallel and distri...
Tracing parallel programs to observe their performance introduces intrusion as the result...
//TRACE is a new approach for extracting and replaying traces of parallel applications to recreate ...
A powerful and widely-used method for analyzing the performance behavior of parallel programs is eve...
This paper presents a generic approach for deriving detectably recoverable implementations of many w...
A powerful and widely-used method for analyzing the performance behavior of parallel programs is eve...
Event tracing of applications under dynamic execution is crucial for performance modeling, optimizat...
This paper describes techniques which automatically detect data races in parallel programs by analyz...
A common debugging strategy involves re-executing a program (on a given input) over and over, each t...
The need for increased computing capability and more diverse hardware with its ever more complex topo...
This thesis focuses on the notion of representative quality of software-generated traces of messag...
A powerful and widely-used method for analyzing the performance behavior of parallel programs is ev...