The Totem protocol supports the maintenance of consistency of replicated information in fault-tolerant distributed systems by providing reliable totally ordered delivery of messages. The membership algorithm of Totem maintains a consistent view of processors in a local-area network and handles all aspects of reconfiguration, including restarting of failed processors and remerging of partitioned networks. In this paper we describe a network monitor with graphical output that we have constructed for the Totem protocol development environment. The development environment executes the protocol object modules unmodified while simulating network communication, timing, and fault injection. The network monitor collects information from the processo...
Fault tolerance can allow processes executing in a computer system to survive failures within the sy...
We introduce a new algorithm for consistent failure detection in asynchronous systems. Informally, c...
We consider issues of fault tolerance for distributed computing systems at two levels of system desi...
The Totem system supports fault-tolerant applications in which distributed processes cooperate to pe...
The development of communication protocols is significantly more difficult than the de...
We present the Totem multiple-ring protocol, a novel reliable ordered multicast protocol for multipl...
Totem: A Reliable Ordered Delivery Protocol for Interconnected Local-Area Networks by Deborah A. Ag...
This paper describes the implementation of a processor-group membership protocol in an experimental r...
A monitoring approach to the problem of constructing fault-tolerant and adaptive real-time systems, ...
This paper describes a configurable membership protocol for distributed tasks in time-triggered syst...
Using Java to Present Fault Tolerance in Distributed Systems, by Saurabh Jain. Many current applicati...
Given the prevalence of powerful personal workstations connected over local area networks, it is on...
Modeling the reliability of distributed systems requires a good understanding of the reliability of ...
It is a challenge to provide detection facilities for large-scale distributed systems run...