To execute MPI applications reliably, fault tolerance mechanisms are needed. Message logging is a well-known solution to provide fault tolerance for MPI applications. It has been proved that it can tolerate a higher failure rate than coordinated checkpointing. However, pessimistic and causal message logging can induce high overhead on failure-free execution. In this paper, we present O2P, a new optimistic message logging protocol, based on active optimistic message logging. Contrary to existing optimistic message logging protocols that save dependency information on reliable storage periodically, O2P logs dependency information as soon as possible to reduce the amount of data piggybacked on application messages. Thus it r...
Message logging is a popular technique for building systems that can tolerate process crashes and tr...
Fault tolerance in parallel systems has traditionally been achieved through a combination of redunda...
With the growing scale of HPC applications, there has been an increase in the number of interruption...
With the growing scale of high performance computing platforms, fault toleranc...
Message logging is a transparent solution to provide fault tolerance for message passing application...
Message logging is an attractive solution to provide fault tolerance for messa...
Execution of MPI applications on Clusters and Grid deployments suffers from node and network failure...
Fault tolerance in MPI becomes a main issue in the HPC community. Several appr...
MPI is one of the most adopted programming models for Large Cluste...
This paper presents an efficient scheme to implement the optimistic message logging and the asynchro...
With the growing scale of High Performance Computing applications comes an increase in the n...
Fault tolerance is becoming a major concern in HPC systems. The two traditiona...