Jobs in Grid workflows are exposed to many types of failure, so fault-tolerance mechanisms are needed to ensure reliable execution of Grid jobs. Checkpointing is the most common method of achieving fault tolerance, but much work remains to improve its efficiency. This paper gives an overview of a solution for checkpointing parallel applications executed on multiple sites in the Grid environment. The mechanism is an improvement of the PGRADE checkpointing solution.
Grid applications run on environment that is prone to different kinds of failures. Fault tolerance i...
As high performance platforms (Clusters, Grids, etc.) continue to grow in size...
A computational grid environment, due to its heterogeneous, autonomous and dynamic nature is prone t...
The Grid environment is generic, heterogeneous, and dynamic with lots of unreliable resources making...
InteGrade is a grid middleware infrastructure that enables the use of idle computing power from user...
This paper introduces a novel approach in parallel checkpointing aimed at supporting fault-tolerance...
Abstract. With the maturity of the Grid, the community has made an important effort in developing mi...
Abstract. A grid checkpointing service providing migration and transparent fault tolerance is import...
The paper describes a parallel program checkpointing mechanism and its potential application in Grid...
Abstract- In grid computing, resources are used outside the boundary of organizations and it becomes...
One of the major challenges in wide use of Grid workflow systems is fault tolerance and avoidance. C...
Abstract. The Grid community has made an important effort in developing middleware to provide differ...
In large-scale Grid computing environments, providing fault-tolerance is requi...
The EU-funded XtreemOS project implements a grid operating system (OS) transparently exploiting dist...