The overhead of saving checkpoints to stable storage is the dominant performance cost in checkpointing systems. In this paper, we present a complete study of compressed differences, a new algorithm for fast incremental checkpointing. Compressed differences reduce checkpointing overhead by saving only the words that have changed during the current checkpointing interval, using page protection to detect which pages have been modified. We describe two checkpointing algorithms based on compressed differences: standard compressed differences and online compressed differences. We analyze both in detail to determine the conditions under which they improve checkpointing performance. We then present results of implementing these algorith...
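The word-level differencing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions: 4096-byte pages and 4-byte words; page protection simulated with an explicit dirty set instead of real write faults; and a copy of each page from the previous checkpoint kept for comparison. The names `Memory`, `diff_page`, `checkpoint`, and `apply_delta` are hypothetical.

```python
PAGE_SIZE = 4096  # bytes per page (assumed)
WORD_SIZE = 4     # bytes per word (assumed)


class Memory:
    """Toy address space. The dirty set stands in for page protection: a real
    system would write-protect every page after a checkpoint and record a page
    as dirty when its first store faults (simulated here, not real faults)."""

    def __init__(self, num_pages: int) -> None:
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.twins = [bytes(p) for p in self.pages]  # page contents at last checkpoint
        self.dirty: set[int] = set()

    def write(self, addr: int, data: bytes) -> None:
        for off, value in enumerate(data):
            page, idx = divmod(addr + off, PAGE_SIZE)
            self.dirty.add(page)  # simulated write fault on first touch
            self.pages[page][idx] = value


def diff_page(old: bytes, new: bytes) -> list[tuple[int, bytes]]:
    """Return runs of changed words as (first_word_index, new_bytes) pairs."""
    runs: list[tuple[int, bytes]] = []
    start = None  # index of the first word in the current run, if any
    for w in range(len(new) // WORD_SIZE):
        lo = w * WORD_SIZE
        changed = old[lo:lo + WORD_SIZE] != new[lo:lo + WORD_SIZE]
        if changed and start is None:
            start = w
        elif not changed and start is not None:
            runs.append((start, bytes(new[start * WORD_SIZE:lo])))
            start = None
    if start is not None:  # run extends to the end of the page
        runs.append((start, bytes(new[start * WORD_SIZE:])))
    return runs


def checkpoint(mem: Memory) -> dict[int, list[tuple[int, bytes]]]:
    """Build an incremental checkpoint from dirty pages only, then reset tracking."""
    delta: dict[int, list[tuple[int, bytes]]] = {}
    for page in sorted(mem.dirty):
        runs = diff_page(mem.twins[page], mem.pages[page])
        if runs:  # a dirty page may have been rewritten with its old values
            delta[page] = runs
        mem.twins[page] = bytes(mem.pages[page])
    mem.dirty.clear()
    return delta


def apply_delta(pages: list[bytearray], delta: dict[int, list[tuple[int, bytes]]]) -> None:
    """Recovery: replay an incremental checkpoint on top of the previous state."""
    for page, runs in delta.items():
        for word, data in runs:
            off = word * WORD_SIZE
            pages[page][off:off + len(data)] = data


if __name__ == "__main__":
    mem = Memory(4)
    mem.write(8, b"\x01\x02\x03\x04\x05")  # touches words 2 and 3 of page 0
    mem.write(2 * PAGE_SIZE, b"\x00")      # dirties page 2 without changing it
    print(checkpoint(mem))                 # {0: [(2, b'\x01\x02\x03\x04\x05\x00\x00\x00')]}
```

In this sketch, only the changed words of dirty pages reach the checkpoint. A 5-byte write to page 0 yields one 8-byte run (two whole words). A store that rewrites page 2 with its old value produces no entry at all. Grouping adjacent changed words into runs is a design choice of this sketch; it keeps per-word offset overhead down when writes are clustered.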
Checkpointing is a pivotal technique in system research, with applications ranging from crash recove...
As modern supercomputing systems reach the peta-flop performance range, they grow in both size and c...
A new transparent, incremental, concurrent checkpoint mechanism for real-time and interactive applic...
For checkpointing to be practical, it has to introduce low overhead for the targeted application. As...
To make progress in the face of failures, long-running parallel applications need to save their ...
As modern supercomputing systems reach the peta-flop performance range, they grow in both ...
Fault-tolerant computer systems are increasingly being used in such applications as e-commerce, bank...
Checkpointing is a common technique for reducing the time to recover from faults in computer systems...
Checkpointing schemes enable fault-tolerant parallel and distributed computing by leveraging the red...
As modern supercomputing systems reach the peta-flop performance range, they grow in both size and ...
In checkpointing schemes with task duplication, checkpointing serves two purposes: detecting faults ...
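The abstract breaks off after the first purpose, detecting faults. The sketch below illustrates only that purpose. It runs two replicas of a task, compares their states at each checkpoint, and on a mismatch re-executes the interval from the last checkpoint both replicas agreed on. Several elements are assumptions for illustration, not the scheme from the cited work: the SHA-256 signatures, the retry policy, the fault-injection plan, and the names `run_duplicated` and `signature`.

```python
import hashlib
from typing import Callable

State = bytes
Step = Callable[[State, int], State]  # computes the state after interval i


def signature(state: State) -> bytes:
    # Compare compact signatures instead of full states (assumed design choice).
    return hashlib.sha256(state).digest()


def run_duplicated(step: Step, init: State, intervals: int,
                   faults: frozenset[tuple[int, int]] = frozenset()) -> tuple[State, int]:
    """Execute two replicas of a task, comparing them at every checkpoint.

    faults is a hypothetical injection plan: (interval, attempt) pairs at which
    replica B suffers a transient fault. States are assumed to be non-empty.
    Returns the final state and the number of rollbacks performed.
    """
    committed = init  # last checkpoint on which both replicas agreed
    rollbacks = 0
    i = attempt = 0
    while i < intervals:
        a = step(committed, i)  # replica A
        b = step(committed, i)  # replica B
        if (i, attempt) in faults:
            b = bytes([b[0] ^ 0xFF]) + b[1:]  # corrupt B's first byte
        if signature(a) == signature(b):
            committed = a  # states match: record them as the new checkpoint
            i += 1
            attempt = 0
        else:
            rollbacks += 1  # mismatch: fault detected, re-execute from `committed`
            attempt += 1
    return committed, rollbacks


if __name__ == "__main__":
    def append_index(state: State, i: int) -> State:
        return state + bytes([i])

    print(run_duplicated(append_index, b"\x00", 3, frozenset({(1, 0)})))
    # (b'\x00\x00\x01\x02', 1)
```

Comparing digests rather than full states keeps the comparison cheap when replicas run on different nodes. The cost is a negligible chance that a hash collision hides a fault.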
A checkpoint is defined as a designated place in a program at which normal processing is interrupted s...
This paper shows how Koo and Toueg's distributed checkpointing algorithm can be modified so as to...