This paper considers the problem of constructing the maximum and the minimum consistent global checkpoints that contain a target set of checkpoints, and identifies it as a generic issue in recovery-related applications. We formulate the problem as a reachability analysis problem on a directed rollback-dependency graph, and develop efficient algorithms to calculate the two consistent global checkpoints for both general nondeterministic executions and piecewise deterministic executions. We also demonstrate that the approach provides a generalization of, and a unifying framework for, many existing and potential applications, including software error recovery, mobile computing recovery, parallel debugging, and output commits.
1 Introduction
A checkpoint ...
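The reachability formulation described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm. The following are all assumptions made here for illustration:

- the checkpoint and message encoding;
- the edge convention;
- the names build_rgraph, reach and max_min_cgc.

The sketch also covers only the general nondeterministic case, without message logging. An edge (u, v) reads "if the global checkpoint lies before checkpoint u, it must also lie before checkpoint v." The two searches work as follows:

- A forward search from the successors of the target checkpoints collects the checkpoints that must be excluded. The latest non-excluded checkpoint on each process gives the maximum.
- A backward search from the target collects the checkpoints that must be included. The latest of these on each process gives the minimum.

```python
from collections import deque

# Hypothetical encoding (not taken from the paper): checkpoints of process p are
# numbered 0..last[p]; interval x of p is the execution between checkpoints x-1
# and x; a message (p, x, q, y) is sent in interval x of p and received in
# interval y of q (x, y >= 1); each process's final state counts as its last checkpoint.


def build_rgraph(last, messages):
    """Return forward and backward adjacency maps of the rollback-dependency graph."""
    fwd, bwd = {}, {}

    def add_edge(u, v):
        fwd.setdefault(u, []).append(v)
        bwd.setdefault(v, []).append(u)

    for p, n in last.items():
        for k in range(n):
            add_edge((p, k), (p, k + 1))  # local edge: being before k implies being before k+1
    for p, x, q, y in messages:
        add_edge((p, x), (q, y))  # undoing p's send interval forces undoing q's receive interval
    return fwd, bwd


def reach(starts, adj):
    """Breadth-first search: every node reachable from `starts`, starts included."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def max_min_cgc(last, messages, target):
    """Return (maximum, minimum) consistent global checkpoints containing `target`,
    each a dict process -> checkpoint index, or None if no such checkpoint exists."""
    fwd, bwd = build_rgraph(last, messages)
    # (q, y) reached forward from the targets' successors: q must lie before checkpoint y.
    below = reach([(p, x + 1) for p, x in target.items() if x < last[p]], fwd)
    if any(node in below for node in target.items()):
        return None  # the target set cannot be extended to a consistent global checkpoint
    # (q, y) reached backward from the targets: q must lie at checkpoint y or later.
    at_least = reach(list(target.items()), bwd)
    maximum = {q: max(y for y in range(n + 1) if (q, y) not in below)
               for q, n in last.items()}
    minimum = {q: max([0] + [y for y in range(n + 1) if (q, y) in at_least])
               for q, n in last.items()}
    return maximum, minimum


last = {"P": 2, "Q": 2}
msgs = [("P", 1, "Q", 1)]  # P sends in interval 1, Q receives in interval 1
print(max_min_cgc(last, msgs, {"Q": 1}))          # ({'P': 2, 'Q': 1}, {'P': 1, 'Q': 1})
print(max_min_cgc(last, msgs, {"P": 0, "Q": 1}))  # None
```

Both searches are linear in the size of the graph. The target is rejected when the forward search reaches one of the target's own checkpoints. That signals that the target checkpoints cannot all belong to one consistent global checkpoint.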
A global checkpoint of a distributed computation is a set of local checkpoints (local states), one...
dbj@rice.edu
In a distributed system using rollback recovery, information saved on stable storage d...
Finding consistent global checkpoints of a distributed computation is important for analyzing, testi...
Checkpointing and rollback recovery are techniques that can provide efficient recovery from transien...
A distributed coordinated checkpointing algorithm for distributed mobile systems is presented. A con...
We consider the problem of bringing a distributed system to a consistent state after transient fail...
Checkpointing is a well-known mechanism for achieving fault tolerance. In distributed applications...
Checkpointing in a distributed system is essential for recovery to a globally consistent state after...
In this work, we present a high-performance recovery algorithm for distributed systems in which chec...
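The consistency condition that recurs in the abstracts above can be made explicit for tiny examples with a brute-force sketch. It assumes the usual definition: a global checkpoint is consistent when it records no message as received unless it also records that message as sent.

The encoding is hypothetical. Checkpoints are numbered 0..last[p], and a message (p, x, q, y) is sent in interval x of p and received in interval y of q. The names is_consistent and consistent_cuts_containing are illustrative only. The enumeration is exponential in the number of processes, so it is only a testing aid.

```python
from itertools import product

# Hypothetical encoding: checkpoints of process p are numbered 0..last[p];
# a message (p, x, q, y) is sent in interval x of p (between checkpoints x-1
# and x) and received in interval y of q.


def is_consistent(cut, messages):
    """True if no message is recorded as received in `cut` but not as sent.

    The send is recorded in checkpoint cut[p] iff x <= cut[p]; the receive is
    recorded in checkpoint cut[q] iff y <= cut[q].
    """
    return not any(x > cut[p] and y <= cut[q] for p, x, q, y in messages)


def consistent_cuts_containing(last, messages, target):
    """Yield every consistent global checkpoint that contains `target` (exhaustive search)."""
    procs = sorted(last)
    choices = [[target[p]] if p in target else range(last[p] + 1) for p in procs]
    for combo in product(*choices):
        cut = dict(zip(procs, combo))
        if is_consistent(cut, messages):
            yield cut


last = {"P": 2, "Q": 2}
msgs = [("P", 1, "Q", 1)]
print(is_consistent({"P": 0, "Q": 1}, msgs))  # False: Q records a receive that P has not sent
print(list(consistent_cuts_containing(last, msgs, {"Q": 1})))
# [{'P': 1, 'Q': 1}, {'P': 2, 'Q': 1}]
```

Consistent global checkpoints are closed under componentwise maximum and minimum. The componentwise maximum and minimum of the enumerated cuts are therefore the maximum and minimum consistent global checkpoints containing the target. Here these are {'P': 2, 'Q': 1} and {'P': 1, 'Q': 1}.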