A new method for determining an optimal retry policy and for using retry in fault characterization is presented. An optimal retry policy for a given fault characteristic, which determines the maximum allowable retry durations so as to minimize the total task completion time, is derived. Combined fault characterization and retry decision, in which the characteristics of the fault are estimated simultaneously with the determination of the optimal retry policy, is then carried out. Two solution approaches are developed, one based on point estimation and the other on Bayes sequential decision. Maximum likelihood estimators are used in the first approach, and backward induction for testing hypotheses in the second. Numerical exampl...
An important practical problem in fault diagnosis is discriminating between permanent faults and tra...
Traditional reliability-related models for fault-tolerant systems are used to predict system reliabi...
Fault-tolerance is a crucial aspect of safety-critical systems. When such systems need to ...
A general framework for the design and analysis of distributed fault-tolerant systems is proposed in...
Following an initial mapping of a problem onto a multiprocessor machine or computer network, system ...
System reliability is an important aspect of real-time systems, because the result of a real...
Various aspects of reliable computing are formalized and quantified with emphasis on efficient fault...
For the vast majority of computer systems correct operation is defined as producing the correct resu...
A powerful technique particularly appropriate for the detection of errors caused by transient faults...
As High Performance Computing (HPC) systems increase in size to fulfill computational power demand, ...
This paper presents an optimising model for integrating the traditional reliability prediction meth...
A high-level design is presented for a reliable computing platform for real-time control application...
A real-time control system is generally composed of a synergistic pair, a controlled process and a c...
Due to the critical nature of the tasks in hard real-time systems, it is essential that faults be to...
We present a formal approach to implement fault-tolerance in real-time embedded systems. The initial...