TR-COSC 06/91
A principal requirement of a safety-critical system is that it should be able to cope with errors and deficiencies in software and hardware. There are two main approaches to handling this: masking and recovery. Masking is usually achieved by replicating the hardware or software. One can either adopt strategies such as voting [Avi85] or treat part of the system as a shadow system and activate it when a fault occurs [HAH89]. Even if a subset of the components fails, the entire system can continue to function. The degree of replication depends on the criticality of the unit and the probability of failure. Such a technique cannot be adopted for large systems, as the cost would be prohibitive. Recov...
Summary. We study the problems of failure detection and consensus in asynchronous systems in which p...
We revisit the problem of detecting the termination of a distributed application in an asynchronous ...
Faults in computer control systems cause great economic losses and endanger human beings. In order t...
This paper describes how the transformational framework developed in [Liu91, LJ92] is applied to bac...
This work investigates the amount of information about failures required to simulate a synchronous d...
We propose a new algorithm for recovering asynchronously from failures in a distributed computation....
In complex concurrent critical systems, such as autonomous robots, unmanned air vehicles, and space s...
In this paper, we have addressed the complex problem of recovery for concurrent failures in distribu...
Highly automated manufacturing systems have gained industrial popularity for their ability to combin...
This paper studies the impact of omission failures on asynchronous distributed systems with crash-s...
The termination detection problem involves detecting whether an ongoing distributed computat...
In the crash-recovery failure model of asynchronous distributed systems, processes can temporarily s...
Distributed systems are the basis of widespread computing facilities enabling many of our daily life...
Gracefully recovering from software and hardware faults is important to ensuring highly reliable an...
A Thesis Submitted to the Faculty of Engineering, University of the Witwatersrand, Johannesburg in...