Constructing fault tolerance mechanisms for large, massively parallel multiprocessor systems requires scalable fault diagnosis that works efficiently even when the system contains several thousand processors. In this paper we present an event-driven, distributed system-level diagnosis algorithm based on a general diagnosis model that does not limit the number of simultaneously existing faults. In particular, we examine in detail the relation between error detection and fault localization, as well as two different methods for distributing diagnostic information. Furthermore, we give measurements of how our diagnosis algorithm affects application performance.
Large multiprocessor networks require system-level fault diagnosis. Researchers have established and...
A fault diagnosis model for multiprocessor computers is proposed. Under normal operating mod...
We studied adaptive system-level fault diagnosis for multiprocessor systems. Processors can test eac...
For constructing fault tolerance mechanisms in large massively parallel multiprocessor systems, a s...
In recent years, new ideas have appeared in system-level diagnosis of multiprocessor systems. In cont...
The paper presents a novel modelling technique for system-level fault diagnosis in massive parallel...
The distributed self-diagnosis of a multiprocessor/multicomputer system based on interprocessor test...
This dissertation addresses the distributed self-diagnosis of multiprocessor/multicomputer systems b...
Complex engineering systems require efficient fault diagnosis methodologies, but centralized approa...
We consider issues of fault tolerance for distributed computing systems at two levels of system desi...
Diagnosability of systems is an essential property that determines how accurate any diagnostic reaso...
A general framework for the design and analysis of distributed fault-tolerant systems is proposed in...
As today's distributed applications increase in complexity, it becomes increasingly difficult to ...
In previous work the authors proposed a distributed diagnosis approach consisting of two phases—prel...
The problem known as fault diagnosis in discrete-event systems is to determine a method for monitori...