The design of survivable algorithms requires a solid foundation for executing them. Hardware techniques for fault-tolerant computing are relatively well understood; fault-tolerant operating systems and fault-tolerant applications (survivable algorithms) are, by contrast, poorly understood, and much more work is required in this field. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We first introduce our consensus-based framework for fault-tolerant system design, followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault toler...
Problems related to the design of the hardware for an integrated aircraft electronic system are cons...
Various aspects of reliable computing are formalized and quantified with emphasis on efficient fault...
Fault tolerance can be defined as a concept of recovery that keeps a computer system operational by ...
The management of redundancy in computer systems was studied and guidelines were provided for the de...
An Ultrareliable, Fault-Tolerant, Control-System (UFTCS) concept is described using a systems design...
A high-level design is presented for a reliable computing platform for real-time control application...
The central thesis of this research is toward the concept of reliability through redundancy for comp...
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2005. In...
An important consideration in the design of high performance multiprocessor systems is to ensure the...
This paper revises and introduces to the field of reconfigurable computer systems some traditional ...
The impact of a five year space mission environment on fault-tolerant parallel processor architectur...
The general inadequacy of Ada for programming systems that must survive processor loss was shown. A ...
Fault-tolerant computing began between 1965 and 1970, probably with the highly reliable ...
Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardwar...
A methodology for the design of a tightly coupled, highly reliable microprocessor based computer sys...