This dissertation addresses the modeling and design of a fault-tolerant multiprocessor system. In particular, it investigates the behavior of the system during recovery and restoration after a fault has occurred. Given that a multicomputer system is designed using the Algorithm To Architecture Mapping Model (ATAMM), and that a fault (death of a computing resource) occurs during its normal steady-state operation, a model is presented as a viable research tool for predicting the performance bounds of the system during its recovery and restoration phases. The model also makes it possible to assess the bounds of the system's performance behavior during this transient mode. These bounds include: time to recover from the fault (trec), time to ...
A supercomputer is a repairable system with large number of compute nodes interconnected to work in ...
This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of o...
Non-peer-reviewed. The use of several distinct recovery procedures is one of the techniques that can b...
Traditional reliability-related models for fault-tolerant systems are used to predict system reliabi...
Various aspects of reliable computing are formalized and quantified with emphasis on efficient fault...
It is of great importance to operate a computer system with high reliability. Several techniques to ...
Performability is an attribute of a system which combines reliability and performance. Recovery proc...
With the new generation of very fast microprocessors and support chips, it is now possible to consid...
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer S...
A fault-tolerant multiprocessor with a rollback recovery mechanism is discussed. The rollback mechan...
System reliability is an important aspect of real-time systems, because the result of a real...
Research on dependable computing is undergoing a shift from traditional fault tolerance towards tech...
The use of Commercial Off-The-Shelf (COTS) processors is increasingly attractive for the space domai...
This research addresses design of a reliable computer from unreliable device technologies. A system ...