Abstract. The Grid community has made an important effort in developing middleware to provide different functionalities, such as resource discovery, resource management, job submission or execution monitoring. As part of this effort, this paper addresses the design and implementation of a service-based architecture (CPPC-G) to manage the execution of fault-tolerant applications on Grids. The CPPC (Controller/Precompiler for Portable Checkpointing) framework is used to insert checkpoint instrumentation into the code of sequential and MPI applications. The designed services will be in charge of submission and monitoring of the execution of CPPC-instrumented applications, management of checkpoint files generated by the fault-tolerant applica...
Because of increasing hardware and software complexity, the running time of many computational scien...
Abstract. In grid computing, resources are used outside the boundary of organizations and it becomes...
We present in this paper an evaluation of fault management in the grid middlew...
Abstract. With the maturity of the Grid, the community has made an important effort in developing mi...
Jobs in Grid workflows are exposed to different types of failure. It is important to develop fault t...
compiler for Portable Checkpointing), a checkpointing tool designed for heterogeneous clusters and G...
We present in this paper a study on fault management in a grid middleware. The...
Despite the increasing popularity of shared-memory systems, there is a lack of tools for providing f...
Abstract. The GridRPC model is well suited for high performance computing on grids thanks to efficie...
InteGrade is a grid middleware infrastructure that enables the use of idle computing power from user...
Abstract. As recent research has demonstrated, it is becoming a necessity for large scale applicatio...
Because of increasing hardware and software complexity, the running time of many computational scie...
The EU-funded XtreemOS project implements a grid operating system (OS) transparently exploiting dist...
The running times of large–scale computational science and engineering parallel applications, execut...
Grid applications run on an environment that is prone to different kinds of failures. Fault tolerance i...