dbj@rice.edu
In a distributed system using rollback recovery, information saved on stable storage during failure-free execution allows certain states of each process to be recovered after a failure. For example, in a deterministic system using message logging and checkpointing, a process state can be recovered only if all messages received by the process since its previous checkpoint have been logged. In a nondeterministic system using checkpointing alone, a process state can be recovered only if it has been recorded in a checkpoint. Optimistic rollback recovery methods in general record this information asynchronously, assuming that a suitable recoverable system state can be constructed for use during recovery. A system state is called re...
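The recoverability condition for a single deterministic process described above can be illustrated with a minimal sketch. It is not the paper's algorithm; it assumes a hypothetical numbering in which each received message starts a new state interval, so the message that starts interval k carries receive sequence number k. The function name `latest_recoverable_interval` and its parameters are illustrative only.

```python
from typing import Iterable


def latest_recoverable_interval(checkpoint_interval: int,
                                logged_receives: Iterable[int]) -> int:
    """Return the latest state interval of one process that can be recovered.

    Assumptions (illustrative, not from the source):
    - the process is deterministic, so replaying the same messages in the
      same order from a checkpoint reproduces the same states;
    - state interval k begins with the receipt of message number k;
    - `checkpoint_interval` is the interval recorded in the latest checkpoint;
    - `logged_receives` holds the receive sequence numbers on stable storage.
    """
    logged = set(logged_receives)
    k = checkpoint_interval
    # Interval k + 1 is recoverable only if every message received since the
    # checkpoint, up to and including message k + 1, has been logged.
    while k + 1 in logged:
        k += 1
    return k


# Checkpoint at interval 3; messages 4, 5 and 7 are logged, 6 is not.
# Replay can proceed through 5 but must stop at the gap.
print(latest_recoverable_interval(3, [4, 5, 7]))

# With no logged messages (checkpointing alone), only the checkpointed
# state itself is recoverable.
print(latest_recoverable_interval(3, []))
```

The gap at message 6 is what limits recovery in the first call: message 7 is on stable storage, but the state it was received in cannot be reproduced without replaying message 6 first.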
Checkpointing and rollback recovery are techniques that can provide efficient recovery from transien...
Checkpointing and rollback recovery are techniques that can provide efficient recovery from transien...
We propose a new algorithm for recovering asynchronously from failures in a distributed computation....
Message logging and checkpointing can provide fault tolerance in distributed systems in which all pr...
In a distributed system using message logging and checkpointing to provide fault tolerance, there i...
.... Abstract a process is logged on stable storage [5], and each process is occasionally checkpoint...
Message logging and checkpointing can provide fault tolerance in distributed systems in which all p...
We consider the problem of bringing a distributed system to a consistent state after transient fail...
rollback, recovery The problem of rollback-recovery in message-passing systems has undergone extensi...
Checkpointing in a distributed system is essential for recovery to a globally consistent state after...
This survey covers rollback-recovery techniques that do not require special language constructs. In ...
In this work, we present a high performance recovery algorithm for distributed systems in which chec...
This survey covers rollback-recovery techniques that do not require special language constructs. In ...
In this paper, we present a new protocol for optimistic rollback recovery in distributed systems. Th...
This survey covers rollback-recovery techniques that do not require special language constructs. In ...