ABSTRACT Message Passing Interface (MPI) is a widely used standard for managing coarse-grained concurrency on distributed computers. Debugging parallel MPI applications, however, has always been particularly challenging because of their high degree of concurrent execution and non-deterministic behavior. Deterministic replay is a potentially powerful technique for addressing these challenges, and existing MPI replay tools adopt either a data-replay or an order-replay approach. Unfortunately, each approach has its tradeoffs. Data-replay produces large logs because it records every communication message. Order-replay produces small logs but requires all processes to be replayed together. We believe that these drawbacks are the prima...
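The tradeoff between the two approaches can be illustrated with a toy model. The sketch below is not taken from any MPI replay tool and uses no MPI library: a shuffled list stands in for the non-deterministic arrival order of a wildcard receive, and all names (`run_receiver`, `record`, `replay_from_data`, `replay_from_order`) are hypothetical. It only shows that a data log lets the receiver be replayed alone at the cost of storing every payload, while an order log stores sender ids only but needs the senders to regenerate their payloads.

```python
import random


def run_receiver(payloads, arrival_order):
    """Fold payloads in arrival order; the result depends on that order."""
    acc = 0
    for sender in arrival_order:
        acc = acc * 31 + sum(payloads[sender])  # order-sensitive reduction
    return acc


def record(payloads, rng):
    """Run once with a random arrival order and produce both kinds of log."""
    order = sorted(payloads)
    rng.shuffle(order)  # assumption: stands in for wildcard-receive nondeterminism
    data_log = [(s, payloads[s]) for s in order]  # data-replay: full message contents
    order_log = list(order)                       # order-replay: sender ids only
    return run_receiver(payloads, order), data_log, order_log


def replay_from_data(data_log):
    """Replay the receiver alone; every payload comes from the log."""
    payloads = dict(data_log)
    return run_receiver(payloads, [s for s, _ in data_log])


def replay_from_order(order_log, regenerated_payloads):
    """Replay with the logged order; senders must re-run to supply payloads."""
    return run_receiver(regenerated_payloads, order_log)


# Four hypothetical sender ranks, each sending a 1 KiB message.
payloads = {rank: bytes([rank + 1]) * 1024 for rank in range(4)}
result, data_log, order_log = record(payloads, random.Random(0))

data_log_bytes = sum(len(p) for _, p in data_log)
print("data log payload bytes:", data_log_bytes)
print("order log entries:", len(order_log))
print("data replay matches:", replay_from_data(data_log) == result)
print("order replay matches:", replay_from_order(order_log, payloads) == result)
```

In this model the data log grows with message size while the order log grows only with message count, which mirrors the log-size contrast described in the abstract.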
Shared-memory parallel programs are inherently nondeterministic, making it difficult to diagnose rar...
As reported by many recent studies, the mean time between failures of future...
Clusters of shared-memory symmetric multiprocessors are increasingly used for high performance...
Record and deterministic Replay (RnR) is a primitive with many proposed applications in computer sys...
Replay of parallel execution is required by HPC debuggers and resilience mechanisms. Up-to-date, the...
Application record and replay is the ability to record application execution and replay it at a late...
In this paper we present an execution replay system for Athapascan, an MPI-based multi-threaded runt...
In the area of debugging parallel executions, record and replay is a technique that allows determini...
The ability to reproduce a parallel execution is desirable for debugging and program reliability pur...
While a lot of work has been focused on design and programming of shared memory multi-core architect...
The Parallel Debugging Tool (PDT) of the Annai programming environment is developed within the Joint...
Debugging of concurrent systems is a tedious and error-prone activity. A main issue is that there is...
Replication has recently gained attention in the context of fault tolerance fo...
The debugging cycle is the most common methodology for finding and correcting errors in sequential p...
Accepted at EUC'2014. This work presents a debugging methodology for MPSoC based o...