The inherent non-determinism of distributed programs and the presence of multiple threads of control make it difficult to write correct distributed software. Not surprisingly, distributed systems are particularly vulnerable to software faults. To build a distributed system capable of tolerating software faults, two important problems must be addressed: fault detection and fault recovery. The fault detection problem requires finding a (consistent) global state of the computation that satisfies a given predicate (e.g., a violation of mutual exclusion). To prevent a fault from causing serious damage, such as corrupting stable storage, it is essential that it be detected in a timely manner. However, we prove that detecting a predicate in ...
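A minimal sketch of what "finding a consistent global state that satisfies a predicate" means, using the mutual-exclusion example above. It assumes each process's execution is recorded as a list of events, each stamped with a vector clock and a hypothetical boolean flag saying whether the process is inside its critical section after that event. The helper names (`is_consistent`, `possibly_mutex_violation`) are illustrative. The sketch exhaustively enumerates cuts, so it is exponential in the number of processes; it is not the detection method of the work summarized above.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Hypothetical event record: (vector clock after the event,
#                             is the process in its critical section afterwards?)
Event = Tuple[Tuple[int, ...], bool]


def is_consistent(cut: Sequence[int], events: List[List[Event]]) -> bool:
    """cut[i] is the number of events of process i included in the cut."""
    for i, c_i in enumerate(cut):
        if c_i == 0:
            continue  # an initial state depends on no other event
        vc, _ = events[i][c_i - 1]
        # The last included event of P_i must not depend on any event of P_j
        # that lies outside the cut (e.g. a receive whose send is missing).
        if any(vc[j] > cut[j] for j in range(len(cut))):
            return False
    return True


def in_critical_section(i: int, c_i: int, events: List[List[Event]]) -> bool:
    # Processes start outside the critical section.
    return c_i > 0 and events[i][c_i - 1][1]


def possibly_mutex_violation(events: List[List[Event]]) -> Optional[Tuple[int, ...]]:
    """Return the first consistent cut in which two or more processes are in
    the critical section, or None if no such cut exists."""
    ranges = [range(len(evs) + 1) for evs in events]
    for cut in product(*ranges):  # exhaustive: exponential in the number of processes
        if not is_consistent(cut, events):
            continue
        in_cs = sum(in_critical_section(i, c, events) for i, c in enumerate(cut))
        if in_cs >= 2:
            return cut
    return None


# P0 enters, then leaves the critical section and sends a token to P1;
# P1 enters only after receiving the token.
safe = [
    [((1, 0), True), ((2, 0), False)],
    [((2, 1), False), ((2, 2), True)],
]
# Both processes enter without any coordination.
unsafe = [
    [((1, 0), True)],
    [((0, 1), True)],
]
print(possibly_mutex_violation(safe))    # None
print(possibly_mutex_violation(unsafe))  # (1, 1)
```

A cut is consistent when every event it contains has all of its causal predecessors in the cut as well. Checking only each process's last included event is enough, because that event's vector clock dominates those of its earlier events on the same process.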
In debugging distributed programs, a distinction is made between an observed error and the program fa...
Fault tolerance can be defined as a concept of recovery that keeps a computer system operational by ...
Proving the properties of a program which must execute on a distributed system whose nodes m...
This dissertation presents techniques for detecting and tolerating faults in distributed systems...
Observation of global properties of a distributed program is required in many applications such as d...
Debugging distributed programs is considerably more difficult than debugging sequential programs. We...
In distributed systems, if a hardware fault corrupts the state of a process, this error might propag...
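The propagation described above can only follow causal paths: a corrupted state reaches other events either on the faulty process itself or through a chain of messages. The sketch below illustrates that general principle with vector clocks; it is not necessarily the approach of the work summarized above. It assumes errors spread only through a process's own state and through messages, and the event names and `possibly_contaminated` helper are hypothetical.

```python
from typing import Dict, List, Tuple

VectorClock = Tuple[int, ...]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """Lamport's happened-before relation, tested with vector clocks."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def possibly_contaminated(fault: VectorClock, events: Dict[str, VectorClock]) -> List[str]:
    """Names of events causally after the faulty event.

    Only these events can have observed the corrupted state (assuming errors
    travel only through local state and messages). This over-approximates:
    an event on the list may or may not actually be corrupted."""
    return sorted(name for name, vc in events.items() if happened_before(fault, vc))


# Hypothetical trace: P0's first event corrupts its state and sends a message
# that P1 receives in its second event.
events = {
    "p0_e1": (1, 0),  # the faulty event
    "p0_e2": (2, 0),
    "p1_e1": (0, 1),  # before the message arrives
    "p1_e2": (1, 2),  # receives the message sent by p0_e1
}
print(possibly_contaminated(events["p0_e1"], events))  # ['p0_e2', 'p1_e2']
```

Event `p1_e1` is excluded because it is concurrent with the fault, so under the stated assumption it cannot have seen the corrupted state.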
Concurrent programs often encounter failures, such as races, owing to the presence of sync...
A general framework for the design and analysis of distributed fault-tolerant systems is proposed in...
Predicate detection is a powerful technique for verifying parallel programs. Verifying the correctness of pr...
Detecting global predicates of a distributed computation is a key problem in testing and debugging d...
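For the special case of a conjunctive predicate (a conjunction of local predicates, one per process), detection does not require enumerating every global state. The sketch below is a simplified, event-based version of the well-known candidate-elimination idea: it assumes each local predicate is recorded as the vector clocks of the events at which it holds, and looks for one such event per process with no two causally ordered. The function name and input layout are hypothetical, and this is not presented as the method of the work summarized above.

```python
from typing import List, Optional, Tuple

VectorClock = Tuple[int, ...]


def detect_conjunctive(candidates: List[List[VectorClock]]) -> Optional[List[VectorClock]]:
    """candidates[i] lists, in execution order, the vector clocks of events of
    process i at which its local predicate holds. Returns one candidate per
    process such that no two are causally ordered, or None."""
    n = len(candidates)
    idx = [0] * n
    while True:
        if any(idx[i] >= len(candidates[i]) for i in range(n)):
            return None  # some process has run out of candidates
        current = [candidates[i][idx[i]] for i in range(n)]
        eliminated = False
        for i in range(n):
            # For events on different processes i != j:
            # e_i happened before e_j  iff  e_i[i] <= e_j[i].
            if any(j != i and current[i][i] <= current[j][i] for j in range(n)):
                # current[i] precedes current[j] and hence every later candidate
                # of P_j, so it can never be part of a solution: discard it.
                idx[i] += 1
                eliminated = True
                break
        if not eliminated:
            return current  # pairwise concurrent candidates found


# P0: e1 (1,0) sends to P1, then e2 (2,0), e3 (3,0); P1: e1 (1,1) receives, e2 (1,2).
print(detect_conjunctive([[(1, 0), (3, 0)], [(1, 1), (1, 2)]]))  # [(3, 0), (1, 1)]
print(detect_conjunctive([[(1, 0)], [(1, 1)]]))                  # None
```

The elimination step is safe because a candidate that happened before P_j's current candidate also happened before all of P_j's later ones, so discarding it never loses a solution. Each iteration discards one candidate, which keeps the total work polynomial rather than exponential.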
Distributed systems are rapidly increasing in importance due to the need for scalable computatio...
Faults are commonplace and inevitable in complex applications. Hence, automated techniques are nece...
Fault tolerance is one of the most important features required by many distributed systems. We consi...