Replay of parallel execution is required by HPC debuggers and resilience mechanisms. To date, there is no deterministic replay solution for one-sided communication. The essential problem is that the readers of updated data have no information about which remote threads produced the updates, so conventional happens-before-based order-tracking techniques are difficult to apply at scale. This paper presents SReplay, the first software tool for sub-group deterministic record and replay for one-sided communication. SReplay allows the user to specify and record the execution of a set of threads of interest (a sub-group), and then deterministically replays the execution of the sub-group on a local machine without starting the r...
ATHAPASCAN-0 programs are executed by a network of communicating threads evolving dynamically. Withi...
In the area of debugging parallel executions, record and replay is a technique that allows determini...
The processor industry is at an inflection point. In the past, performance was the driving force beh...
Message Passing Interface (MPI) is a widely used standard for managing coarse-grained concu...
The ability to reproduce a parallel execution is desirable for debugging and program reliability pur...
Clusters of shared-memory symmetric multiprocessors are increasingly used for high performance...
The debugging cycle is the most common methodology for finding and correcting errors in sequential p...
Record and deterministic Replay (RnR) is a primitive with many proposed applications in computer sys...
While a lot of work has been focused on design and programming of shared memory multi-core architect...
Shared-memory parallel programs are inherently nondeterministic, making it difficult to diagnose rar...
Significant time is spent by companies trying to reproduce and fix bugs. BugNet is a recent architec...
Ability to replay a program’s execution on a multi-processor system can significantly help parallel ...
Recent research in deterministic record-replay seeks to ease debugging, security, and fault tolerance...