Checkpointing protocols usually rely on the construction of consistent global states, from which the application can restart after a failure. This paper proposes a new characterization of, and technique for building, a recoverable state, with the aim of relaxing the constraints and reducing the overhead. Promised consistency is proposed as such a recovery condition on a global state. A key idea is to use promised events: placeholders that force any restart to reach an actual global state of the first execution. A preliminary contribution is a formal treatment of potential causality, studying its impact on recoverability and determinism.
This article proposes an original approach that applies the Rollback-Dependency Trackability (RDT) p...
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault ...
Fault-tolerance protocols play an important role in today's long-runtime scienti...
The reliability of concurrent and distributed systems often depends on some well-known techniques fo...
Checkpointing in a distributed system is essential for recovery to a globally consistent state after...
We consider the problem of bringing a distributed system to a consistent state after transient fail...
This survey covers rollback-recovery techniques that do not require special language constructs. In ...
This paper analyzes the relationship between a distributed checkpoint/rollback...
This paper considers the problem of constructing the maximum and the minimum consistent global check...
A transaction-consistent global checkpoint of a database records a state of the database which refle...
In a distributed system using rollback recovery, information saved on stable storage d...
Rollback is a fundamental technique for ensuring reliability of systems, allow...
The problem of rollback-recovery in message-passing systems has undergone extensi...
Processor failures in post-petascale parallel computing platforms are common o...