This paper introduces an effective communication-induced checkpointing protocol that uses message logging to take far fewer extra checkpoints than previous protocols require. Even when the protocol decides that a process receiving a message must take a forced checkpoint, the process may skip that checkpoint if it recognizes that the state of the sender right before the receipt of the message is recoverable. Moreover, although it is combined with message logging, the protocol does not need to assume the piecewise deterministic model. The protocol maintains these features by piggybacking a one-bit variable and an n-size vec...
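The receiver-side decision described in this abstract can be illustrated with a minimal sketch. It is not the authors' protocol: the abstract does not say what triggers a forced checkpoint, so the sketch assumes a simple index-based trigger (the sender's checkpoint index is ahead of the receiver's). The names `Message`, `sender_index`, `sender_recoverable`, `baseline_requires_checkpoint`, and `must_take_forced_checkpoint` are all hypothetical. Only the one-bit variable is modeled; the second piggybacked item is truncated in the abstract and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    payload: str
    sender_index: int         # hypothetical: sender's checkpoint index (assumed trigger input)
    sender_recoverable: bool  # hypothetical name for the piggybacked one-bit variable


def baseline_requires_checkpoint(local_index: int, msg: Message) -> bool:
    """Assumed index-based trigger: fires when the sender is ahead of the receiver."""
    return msg.sender_index > local_index


def must_take_forced_checkpoint(local_index: int, msg: Message) -> bool:
    """Force a checkpoint only if the trigger fires AND the sender's
    state is not flagged as recoverable by the piggybacked bit."""
    return baseline_requires_checkpoint(local_index, msg) and not msg.sender_recoverable


if __name__ == "__main__":
    local_index = 0  # receiver's current checkpoint index (kept fixed for the demo)
    incoming = [
        Message("a", sender_index=1, sender_recoverable=False),
        Message("b", sender_index=2, sender_recoverable=True),
        Message("c", sender_index=0, sender_recoverable=False),
    ]
    for msg in incoming:
        print(msg.payload, must_take_forced_checkpoint(local_index, msg))
```

In this sketch, message `b` would trigger a forced checkpoint under the assumed baseline rule, but the flag lets the receiver skip it. How the flag is computed on the sender side is not covered by the abstract and is not modeled here.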
This paper proposes an efficient non-blocking coordinated checkpointing algorithm for distributed me...
Message logging and checkpointing can provide fault tolerance in distributed systems in which all p...
The domino effect is an important problem for checkpointing and rollback recovery in distributed...
A message is {\it in-transit} with respect to a global state if its sending is recorded in this glob...
Message logging and checkpointing can provide fault tolerance in distributed systems in which all pr...
... a process is logged on stable storage [5], and each process is occasionally checkpoint...
This paper presents a new checkpointing algorithm for systems using reliable communication channels....
Fault tolerance can allow processes executing in a computer system to survive failures within the sy...
Checkpointing and rollback recovery are techniques that can provide efficient recovery from transien...
This paper presents a new checkpointing coordination scheme which utilizes the communication pattern...
Uncoordinated checkpointing for message-passing systems allows maxi...