Nixdorf Computer
The initial design for a distributed, fault-tolerant version of UNIX based on three-way atomic message transmission was presented in an earlier paper [3]. The implementation effort then moved from Auragen Systems to Nixdorf Computer, where it was completed. This paper describes the working system, now known as the TARGON/32. The original design left open questions in at least two areas: fault tolerance for server processes and recovery after a crash were briefly and inaccurately sketched, and rebackup after recovery was not discussed at all. The fundamental design involving three-way message transmission has remained unchanged. However, in addition to important changes in the implementation, server backup has been redesigned a...
For implementing fault-tolerance in multicomputer systems, backward error recovery, based on checkpo...
Software Implementation of Multi-Processor Fault Tolerance for Real-Time processing is addressed in ...
Fault-tolerance is an important requirement in distributed computing systems. However, designing ap...
A distributed system is a collection of nodes connected by a network, an ideal platform to provide h...
As human dependence on computing technology increases, so does the need for computer system dependab...
This book covers the most essential techniques for designing and building dependable distributed sys...
Sender-based message logging supports transparent fault tolerance in distributed systems in whic...
Computers are being used to achieve increasingly sophisticated control for large and complex systems...
Fault tolerance can allow processes executing in a computer system to survive failures within the sy...
Traditional reliability-related models for fault-tolerant systems are used to predict system reliabi...
In this paper, we have addressed the complex problem of recovery for concurrent failures in distribu...
This thesis addresses issues in building fault-tolerant distributed real-time systems. Such systems ...
A general framework for the design and analysis of distributed fault-tolerant systems is proposed in...
Distributed systems are the basis of widespread computing facilities enabling many of our daily life...
Traditionally, distributed systems requiring high dependability were designed using custom hardware ...