As grids typically consist of autonomously managed subsystems with strongly varying resources, fault tolerance is an important aspect of application scheduling. Two well-known techniques for providing fault tolerance in grids are periodic task checkpointing and replication. Both techniques reduce the amount of work lost due to changing system availability but can introduce significant runtime overhead. This overhead depends largely on the length of the checkpointing interval and the chosen number of replicas, respectively. This paper presents a dynamic scheduling algorithm that switches between periodic checkpointing and replication to exploit the advantages of both techniques and to reduce the overhead. Furthermore, several n...
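The idea of switching between the two techniques can be illustrated with a minimal sketch. This is not the algorithm presented in the paper: the switching rule (replicate only when the idle pool can host at least two copies of every waiting task), the function name `choose_strategy`, and the parameters `idle_nodes`, `waiting_tasks`, and `max_replicas` are all assumptions made for illustration.

```python
from typing import Tuple


def choose_strategy(idle_nodes: int, waiting_tasks: int,
                    max_replicas: int = 3) -> Tuple[str, int]:
    """Pick a fault-tolerance technique for the next scheduling round.

    Hypothetical rule, not taken from the paper: replication is used only
    when spare capacity exists, since each replica occupies a node that
    could otherwise run a different task.
    """
    if waiting_tasks <= 0:
        raise ValueError("waiting_tasks must be positive")

    # Number of copies of each waiting task that the idle pool can host.
    copies = idle_nodes // waiting_tasks

    if copies >= 2:
        # Enough idle nodes: run several copies, capped at max_replicas.
        return ("replication", min(copies, max_replicas))

    # Scarce resources: run a single copy and checkpoint it periodically.
    return ("checkpointing", 1)


if __name__ == "__main__":
    print(choose_strategy(idle_nodes=10, waiting_tasks=2))
    print(choose_strategy(idle_nodes=3, waiting_tasks=2))
```

A real scheduler would presumably also weigh the checkpointing interval and observed failure rates, which this sketch leaves out.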
Fault tolerant Grid scheduling is of vital importance in the Grid computing world. Task replication ...
Scheduling is a key component for performance guarantees in the case of distributed applicat...
In this paper, we present a checkpoint-based scheme to improve the turnaround time of bag-of-tasks a...
As grids typically consist of autonomously managed subsystems with strongly varying resources, fault...
A grid is a distributed computational and storage environment often composed of heterogeneous autono...
As grids typically consist of heterogeneously managed subsystems with strongly varying resources, re...
Job checkpointing is one of the most commonly utilized techniques for providing fault tolerance in com...
In large-scale Grid computing environments, providing fault-tolerance is requi...
In this paper, we propose a scalable and fault-tolerant job scheduling framework for grid computing....
Adaptive checkpointing is a relatively new approach that is particularly suitable for providing faul...
This report provides an introduction to the design of scheduling algorithms to cope with faults on l...
Massive dynamic virtual computing systems often generate a large number of files as chec...
Scheduling jobs in distributed conditions of grid computing is nearly impossible to have a completel...
Task resubmission and checkpointing are among several popular techniques used in providing fault tolera...
Parallel execution time is expected to decrease as the number of processors in...