Checkpointing and rollback recovery are techniques that can provide efficient recovery from transient process failures. In a message-passing system, the rollback of a message sender may cause the rollback of the corresponding receiver, and the system needs to roll back to a consistent set of checkpoints called the recovery line. If the processes are allowed to take uncoordinated checkpoints, this rollback propagation may result in the domino effect, which prevents recovery line progression. Traditionally, only obsolete checkpoints before the global recovery line can be discarded, and the necessary and sufficient condition for identifying all garbage checkpoints has remained an open problem. In this thesis, we derive a necessary and suffi...
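The rollback propagation described above can be illustrated with a minimal Python sketch. It is not the algorithm derived in the thesis; it is the textbook fixed-point iteration, written under stated assumptions: every process restarts from its latest checkpoint, checkpoint 0 is the initial state, and "interval i" means the events between checkpoint i and checkpoint i+1 of a process. The names `Message` and `recovery_line` are hypothetical and chosen only for this illustration.

```python
from typing import List, NamedTuple


class Message(NamedTuple):
    sender: int         # index of the sending process
    send_interval: int  # checkpoint interval in which the send occurred
    receiver: int       # index of the receiving process
    recv_interval: int  # checkpoint interval in which the receive occurred


def recovery_line(latest: List[int], messages: List[Message]) -> List[int]:
    """Return one checkpoint index per process forming a consistent set.

    Assumption: restoring checkpoint c keeps the events of intervals < c
    and undoes the events of intervals >= c.
    """
    line = list(latest)
    changed = True
    while changed:
        changed = False
        for m in messages:
            send_undone = m.send_interval >= line[m.sender]
            recv_kept = m.recv_interval < line[m.receiver]
            if send_undone and recv_kept:
                # The receive is recorded but its send is not (an orphan
                # message): roll the receiver back past the receive.
                line[m.receiver] = m.recv_interval
                changed = True
    return line


# Two processes whose messages zigzag across every checkpoint interval.
domino = [
    Message(0, 2, 1, 1),
    Message(1, 1, 0, 1),
    Message(0, 1, 1, 0),
    Message(1, 0, 0, 0),
]
print(recovery_line([2, 2], domino))  # [0, 0]
```

In this example each rollback undoes a send whose receive is still recorded on the other process, so the propagation continues until both processes are back at their initial states. That is the domino effect: the recovery line never advances past checkpoint 0. With only the first message present, the same call yields `[2, 1]`, a single propagated rollback.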
The domino effect is an important problem for checkpointing and rollback recovery in distributed...
A message is "in-transit" with respect to a global state if its sending is recorded in this glob...
This paper introduces an effective communication-induced checkpointing protocol using message loggin...
Checkpointing and rollback recovery are techniques that can provide efficient recovery from transien...
Checkpointing and rollback recovery are techniques that can provide efficient recovery from transien...
Uncoordinated checkpointing for message-passing systems allows maxi...
This survey covers rollback-recovery techniques that do not require special language constructs. In ...
The problem of rollback-recovery in message-passing systems has undergone extensi...
Backward error recovery is one of the most used schemes to ensure fault-tolerance in distributed s...
Backward error recovery is one of the most used schemes to ensure fault-tolerance in distributed s...
This paper considers the problem of constructing the maximum and the minimum consistent global check...
Communication-induced checkpointing protocols that ensure rollback-dependency trackability (RDT) gua...
Message logging and checkpointing can provide fault tolerance in distributed systems in which all pr...
... a process is logged on stable storage [5], and each process is occasionally checkpoint...