Adaptive checkpointing is a relatively new approach that is particularly suitable for providing fault tolerance in dynamic and unstable grid environments. The approach allows checkpointing intervals to be modified periodically at run time, as additional information becomes available. This paper introduces an adaptive algorithm, named MeanFailureCP+, for checkpointing grid applications whose execution times are unknown a priori. The algorithm modifies its parameters based on dynamically collected feedback on its performance. Simulation results show that the new algorithm performs even better than adaptive approaches that make use of exact information on job execution times.
Checkpointing mechanism is used to tolerate the impact of transient faults by rollback opera...
Jobs in Grid workflows are exposed to different types of failure. It is important to develop fault t...
In grid workflow systems, to verify temporal constraints efficiently at the run-time execution stage...
As grids typically consist of autonomously managed subsystems with strongly varying resources, fault...
In grid workflow systems, a checkpoint selection strategy is responsible for selecting checkpoints f...
A grid is a distributed computational and storage environment often composed of heterogeneous autono...
In grid workflow systems, a checkpoint selection strategy is responsible for selecting checkpoints f...
Job checkpointing is one of the most commonly utilized techniques for providing fault tolerance in com...
One of the major challenges in wide use of Grid workflow systems is fault tolerance and avoidance. C...
One of the major challenges in wide use of Grid workflow systems is fault tolerance and avoidance. C...
Grid applications run in an environment that is prone to different kinds of failures. Fault tolerance i...
A computational grid environment, due to its heterogeneous, autonomous and dynamic nature, is prone t...
In grid workflow systems, temporal correctness is critical to assure the timely completion of grid w...
Frequent resource failures are a major challenge for the rapid completion of ...