Abstract: With the growing scale of High Performance Computing applications comes an increase in the number of interruptions caused by hardware failures. As the trend is to scale parallel executions to hundreds of thousands of processes, fault tolerance is becoming an important concern. Uncoordinated fault tolerance protocols, such as message logging, seem to be the best option, since coordinated protocols might compromise application scalability. Considering that most of the overhead during failure-free executions is caused by message logging approaches, in this paper we propose a Hybrid Message Logging protocol. It focuses on combining the fast recovery feature of pessimistic receiver-based message logging with the low protection...
To execute MPI applications reliably, fault tolerance mechanisms are needed. M...
Fault tolerance in MPI becomes a main issue in the HPC community. Several appr...
The era of petascale computing brought machines with hundreds of thousands of processors. The next g...
With the growing scale of HPC applications, there has been an increase in the number of interruption...
With the growing scale of high performance computing platforms, fault toleranc...
Message logging is a popular technique for building systems that can tolerate process crashes and tr...
Sender-based message logging supports transparent fault tolerance in distributed systems in whic...
Execution of MPI applications on Clusters and Grid deployments suffers from node and network failure...
Message logging protocols are an integral part of a technique for implementing processes that can re...
The era of petascale computing brought machines with hundreds of thousands of processors. T...
Fault tolerance can allow processes executing in a computer system to survive failures within the sy...
Message logging is an attractive solution to provide fault tolerance for messa...