A fault detection service for wide area distributed computations.

Stelling, P.

Publication date

June 1998

Publisher

Argonne National Laboratory

Abstract

The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and correcting faults, implementing these techniques in a particular context can be difficult. Hence, we propose a fault detection service designed to be incorporated, in a modular fashion, into distributed computing systems, tools, or applications. This service uses well-known techniques based on unreliable fault detectors to detect and report component failure, while allowing the user to trade off timeliness of reporting against false positive rates. We describe the architecture of this service, report on experimental results that quantify its cost and accuracy...
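The trade-off between timeliness and false positives can be illustrated with a minimal sketch of one common kind of unreliable fault detector. The sketch assumes a heartbeat-and-timeout scheme, which the abstract does not spell out, so it should not be read as the service's actual design. `HeartbeatDetector`, `first_suspicion`, and `timeout_ms` are hypothetical names. A component is suspected once it has been silent longer than `timeout_ms`. A short timeout reports crashes sooner but also suspects components whose heartbeats are merely delayed.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Hypothetical timeout-based unreliable fault detector (illustrative only).

    A component is suspected once no heartbeat has arrived from it for more
    than `timeout_ms`. A delayed heartbeat is indistinguishable from a crash,
    so suspicions can be wrong, which is what makes the detector "unreliable".
    """
    timeout_ms: int
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, component: str, now_ms: int) -> None:
        # Record the most recent heartbeat time for this component.
        self.last_seen[component] = now_ms

    def suspected(self, now_ms: int) -> set:
        # Suspect every component that has been silent longer than the timeout.
        return {c for c, t in self.last_seen.items() if now_ms - t > self.timeout_ms}


def first_suspicion(arrivals_ms, timeout_ms, horizon_ms):
    """Replay heartbeat arrival times in 1 ms ticks and return the first tick
    at which the component is suspected, or None if it never is by horizon_ms."""
    det = HeartbeatDetector(timeout_ms)
    pending = sorted(arrivals_ms)
    i = 0
    for now in range(horizon_ms + 1):
        # Deliver every heartbeat that has arrived by this tick.
        while i < len(pending) and pending[i] <= now:
            det.heartbeat("c", pending[i])
            i += 1
        if "c" in det.suspected(now):
            return now
    return None


if __name__ == "__main__":
    slow = [0, 1000, 2000, 3800, 4000, 5000, 6000]  # alive, one heartbeat 800 ms late
    crashed = [0, 1000, 2000]                       # stops sending after t = 2000 ms
    for timeout in (1500, 2000):
        print(timeout, first_suspicion(slow, timeout, 6000),
              first_suspicion(crashed, timeout, 6000))
    # 1500 3501 3501
    # 2000 None 4001
```

With the 1500 ms timeout, the slow but healthy component is falsely suspected at 3501 ms, and the suspicion lapses once its late heartbeat arrives. Raising the timeout to 2000 ms removes that false positive, but the crashed component is then reported 500 ms later.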
