Abstract: This paper studies the impact of fault-prediction techniques on checkpointing strategies. We consider fault-prediction systems that do not provide exact prediction dates but instead time intervals during which faults are predicted to strike. These intervals considerably complicate the analysis of checkpointing strategies. We propose a new approach based on two periodic modes: a regular mode outside prediction windows, and a proactive mode inside prediction windows whenever these windows are large enough. We compute the best period for any prediction-window size, thereby deriving the scheduling strategy that minimizes platform waste. In addition, the results of the analytical study are nice...
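The two-mode idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name, its parameters, the choice to take a checkpoint at the start of each sufficiently long window, and the rule for resuming the regular mode afterwards are all assumptions made for illustration. The periods and the window-size threshold are taken as inputs here, whereas the paper derives the best values analytically.

```python
def checkpoint_times(horizon, windows, regular_period, proactive_period, min_window):
    """Return checkpoint times in (0, horizon] under a two-mode schedule.

    Hypothetical illustration only. `windows` is a sorted list of
    non-overlapping (start, end) prediction windows.
    """
    times = []
    t = 0  # time of the most recent checkpoint (or of the start of execution)
    for start, end in windows:
        # Regular mode: periodic checkpoints until the window begins.
        while t + regular_period <= start:
            t += regular_period
            times.append(t)
        if end - start >= min_window:
            # Proactive mode (assumption): checkpoint when the window opens,
            # then use the shorter proactive period until the window closes.
            if t != start:
                t = start
                times.append(t)
            while t + proactive_period <= end:
                t += proactive_period
                times.append(t)
        # Windows shorter than min_window are ignored: regular mode continues.
    # Regular mode after the last window, counted from the last checkpoint.
    while t + regular_period <= horizon:
        t += regular_period
        times.append(t)
    return times


if __name__ == "__main__":
    # One long window (25, 35) and one short window (60, 62) that is ignored.
    print(checkpoint_times(100, [(25, 35), (60, 62)], 10, 3, 5))
```

In this sketch the short window triggers no change of mode, which mirrors the abstract's condition that the proactive mode is used only when the window is large enough.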
The massive scale of current and next-generation massively parallel processing (MPP) systems present...
Preventive maintenance scheduling is needed by high value manufacturing industry, and the attention ...
This work provides an optimal checkpointing strategy to protect iterative appl...
This paper deals with the impact of fault prediction techniques on checkpointi...
This paper deals with the impact of fault prediction techniques on checkpointing strategies. We exte...
This report provides an introduction to the design of scheduling algorithms to cope with faults on l...
Parallel execution time is expected to decrease as the number of processors in...
Checkpointing mechanism is used to tolerate the impact of transient faults by rollback opera...
Checkpoint prediction and intelligent management have been recently proposed for reducing the number...
The large scale of current and next-generation massively parallel processing (MPP) systems presents ...
The application of checkpointing as a fault-tolerance measure for real-time services (i.e., services...
Cooperative checkpointing uses global knowledge of the state and health of the machine to improve pe...
Since the last decade, computing systems turn to large scale parallel platforms composed of thousand...