The paper describes a checkpointing mechanism for parallel programs and its potential application in Grid systems to migrate applications among Grid sites. The mechanism can automatically (without user interaction) support generic PVM programs created by the PGRADE Grid programming environment. It is general enough to be used by any Grid job manager, but the current implementation is connected to Condor. As a result, the integrated Condor/PGRADE system can guarantee the execution of any PVM program in the Grid. Note that the Condor system alone can only guarantee the execution of sequential jobs. Integration of the Grid migration framework and the Mercury Grid monitor results in an observable ...
The EU-funded XtreemOS project implements a grid operating system (OS) transparently exploiting dist...
The Grid environment is generic, heterogeneous, and dynamic with lots of unreliable resources making...
This paper presents a source-level software system, PMT, which performs task migrations for long-ru...
Jobs in Grid workflows are exposed to different types of failure. It is important to develop fault t...
This paper introduces a novel approach in parallel checkpointing aimed at supporting fault-tolerance...
The need for increased computational power is growing faster than our ability to produce faster comp...
The paper describes the latest novel features of the P-GRADE (Parallel Grid Run-time and Applicatio...
A grid checkpointing service providing migration and transparent fault tolerance is import...
The emergence of grid computing has enabled the sharing of distributed computing resources among ...
“Grid” computing has emerged as an important new research field. With years of effort, Grid resear...
Optimizing a given software system to exploit the features of the underlying system has been an area...
We are currently involved in research to enable PVM to take advantage of shared networks of workstat...
We present the design and implementation of a general task monitoring and steering system for Grid a...
With the maturity of the Grid, the community has made an important effort in developing mi...
Typical computational grid users target only a single cluster and have to estimate the runtime of th...