Checkpointing is widely used in robust, fault-tolerant applications. We present an efficient incremental checkpointing mechanism that records only state changes rather than the complete state. After a checkpoint is created, state changes are logged incrementally as records in memory, which the application can use to roll back at any later point. This incremental approach allows checkpointing to be implemented with high performance: checkpoint creation and state recording each take only a small constant time, while rollback takes time linear in the number of recorded state changes, which is bounded by the number of state variables times the number of checkpoints. We implement a Java source transformer that automatically converts an exis...
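The mechanism described in the abstract above can be illustrated with a minimal Java sketch. This is not the authors' transformer or its output: the class `CheckpointedStore`, its methods, and the per-variable epoch stamp are all hypothetical names and design choices made for illustration. The sketch assumes integer-valued named variables and logs a variable's old value only on its first write after each checkpoint, which is one way to keep the log within the stated bound of state variables times checkpoints.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of incremental checkpointing with an in-memory undo log. */
final class CheckpointedStore {
    /** One logged state change: the variable and the value it held before the write. */
    private static final class Change {
        final String name;
        final Integer oldValue; // null means the variable did not exist yet
        Change(String name, Integer oldValue) { this.name = name; this.oldValue = oldValue; }
    }

    private final Map<String, Integer> state = new HashMap<>();
    private final Map<String, Integer> lastLoggedEpoch = new HashMap<>();
    private final List<Change> log = new ArrayList<>();          // undo log, oldest first
    private final List<Integer> checkpoints = new ArrayList<>(); // log position per checkpoint
    private int epoch = 0; // bumped on every checkpoint and rollback

    /** Creates a checkpoint in constant time: only the current log position is saved. */
    int createCheckpoint() {
        checkpoints.add(log.size());
        epoch++;
        return checkpoints.size() - 1;
    }

    /** Writes a variable; its old value is logged once per epoch (expected constant time). */
    void set(String name, int value) {
        if (!checkpoints.isEmpty()) {
            Integer loggedAt = lastLoggedEpoch.get(name);
            if (loggedAt == null || loggedAt != epoch) {
                log.add(new Change(name, state.get(name)));
                lastLoggedEpoch.put(name, epoch);
            }
        }
        state.put(name, value);
    }

    Integer get(String name) { return state.get(name); }

    int logSize() { return log.size(); }

    /** Undoes logged changes newest first; time is linear in the number of undone records. */
    void rollback(int checkpointId) {
        if (checkpointId < 0 || checkpointId >= checkpoints.size()) {
            throw new IllegalArgumentException("unknown checkpoint: " + checkpointId);
        }
        int target = checkpoints.get(checkpointId);
        for (int i = log.size() - 1; i >= target; i--) {
            Change c = log.remove(i);
            if (c.oldValue == null) {
                state.remove(c.name);
            } else {
                state.put(c.name, c.oldValue);
            }
        }
        // Checkpoints created after the target are discarded; the target stays valid.
        checkpoints.subList(checkpointId + 1, checkpoints.size()).clear();
        epoch++; // force fresh logging for writes that follow the rollback
    }

    public static void main(String[] args) {
        CheckpointedStore s = new CheckpointedStore();
        s.set("x", 1);
        int cp = s.createCheckpoint();
        s.set("x", 2);
        s.set("x", 3); // second write in the same epoch adds no log record
        s.set("y", 7);
        s.rollback(cp);
        System.out.println(s.get("x") + " " + s.get("y")); // prints "1 null"
    }
}
```

In this sketch a checkpoint is just a saved log position, so creating one copies no state. The epoch stamp is what prevents repeated writes to the same variable from growing the log between two checkpoints.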
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault ...
To make progress in the face of failures, long-running parallel applications need to save their ...
We propose a method to incorporate coordinated checkpointing and rollback in high performance comp...
In this paper we present a software approach, namely Fast-software-Checkpointing (FSC), to reduce th...
This paper presents a checkpointing-recovery scheme for Time Warp parallel simulation. The scheme re...
A new transparent, incremental, concurrent checkpoint mechanism for real-time and interactive applic...
We consider the problem of bringing a distributed system to a consistent state after transient fail...
Enabling the execution of Java applications on personal embedded devices could bring great benefits ...
Checkpointing is a very well known mechanism to achieve fault tolerance. In distributed applications...
Checkpointing in a distributed system is essential for recovery to a globally consistent state after...
The advent of cluster computing has resulted in a thrust towards providing software mechanisms for r...
Checkpointing support allows program execution to roll-back to an earlier program point, discarding ...
desirable features: A process can independently initiate consistent global checkpointing by saving i...
This paper describes our experience with the implementation and applications of the Unix checkpointi...