Consistent checkpointing provides transparent fault tolerance for long-running distributed applications. In this paper we describe performance measurements of an implementation of consistent checkpointing. Our measurements show that consistent checkpointing performs remarkably well. We executed eight compute-intensive distributed applications on a network of 16 diskless Sun-3/60 workstations, comparing the performance without checkpointing to the performance with consistent checkpoints taken at 2-minute intervals. For six of the eight applications, the running time increased by less than 1% as a result of the checkpointing. The highest overhead measured for any of the applications was 5.8%. Incremental checkpointing and co...
To make progress in the face of failures, long-running parallel applications need to save their ...
This is a post-peer-review, pre-copyedit version of an article published in New Generation Computing...
By leveraging enormous computational capabilities, scientists today are able to ...
In this paper, we design and analyze strategies to replicate the execution of ...
Checkpointing schemes enable fault-tolerant parallel and distributed computing by leveraging the red...
desirable features: A process can independently initiate consistent global checkpointing by saving i...
Checkpoint is defined as a designated place in a program at which normal processing is interrupted s...
In checkpointing schemes with task duplication, checkpointing serves two purposes: detecting faults ...
This paper examines the performance of synchronous checkpointing in a distributed computing environm...
In this report, we consider the impact of the consistency model on checkpointing and rollback algori...
As we move to large manycores, the hardware-based global checkpointing schemes that have been propo...
The execution times of large-scale parallel applications on modern multi/many-core systems a...
Checkpointing is a common technique for reducing the time to recover from faults in computer systems...
Checkpointing is a well-known mechanism for achieving fault tolerance. In distributed applications...