The main goal of this thesis is to define a rollback-recovery fault tolerance protocol for the asynchronous communicating active objects model ASP (Asynchronous Sequential Processes) and its Java implementation, ProActive. This work generalises the problem raised by the development of this protocol: we study the recovery of a distributed execution from an inconsistent global state. We then propose a checkpointing protocol, together with its implementation, that does not rely on consistent global states. We demonstrate through realistic experiments using communicating distributed applications that this solution is efficient in practice. Another more general contribution to the problem of recovering from an inconsistent global sta...
Checkpointing is a well-known mechanism for achieving fault tolerance. In distributed applications...
Distributed systems are the basis of widespread computing facilities enabling many of our daily life...
Checkpointing protocols usually rely on the constitution of consistent global states, from which the...
Advances in new technologies in the field of wireless systems and communications have given rise to ...
We propose a new algorithm for recovering asynchronously from failures in a distributed computation....
In this work, we present a high performance recovery algorithm for distributed systems in which chec...
The development of reliable distributed software is simplified by the ability to assume a fail-stop...
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault ...
In this paper, we present a new protocol for optimistic rollback recovery in distributed systems. Th...
The transactor model, an extension to the actor model, specifies an operational semantics to model ...
We consider the problem of bringing a distributed system to a consistent state after transient fail...
The move towards exascale supercomputers requires new fault tolerance solutio...
Fault-tolerance protocols play an important role in today's long-runtime scienti...
We consider the problem of developing reliable services to be deployed in partitionable asynchronous...