Abstract—To facilitate proactive fault management in large-scale systems such as IBM Blue Gene/P, online failure prediction is of paramount importance. While many techniques have been presented for online failure prediction, questions arise regarding two commonly used approaches: period-based and event-driven. Which one has better accuracy? What is the best observation window (i.e., the time interval used to collect evidence before making a prediction)? How does the lead time (i.e., the time interval from the prediction to the failure occurrence) impact prediction accuracy? To answer these questions, we analyze and compare period-based and event-driven prediction approaches via a Bayesian prediction model. We evaluate these prediction appro...
Abstract The availability of software systems can be increased by preventive measures which are trig...
Failures at runtime in complex software systems are inevitable because these systems usually cont...
Analyzing, understanding and predicting failure is of paramount importance to achieve effective faul...
The growing computational and storage needs of scientific applications mandate the deployment of ext...
The demand for more computational power in science and engineering has spurred the design and deploy...
Online failure prediction is an approach that aims to increase system reliability by predicting pend...
The demands of increasingly large scientific application workflows lead to the need for more powerfu...
With ever-growing complexity and dynamicity of computer systems, proactive fault management is an ef...
The growing computational and storage needs of scientific applications mandate the deployment of ex...
In this paper, we present the Framework for building Failure Prediction Models (F²PM), a Machin...
Failure is an increasingly important issue in high performance computing and cloud systems. As la...
Failure prediction is one of the key challenges that have to be mastered for a new arena of fault to...
Master's thesis in information and communication technology IKT590 2011 – Universitetet i Agder, Grims...
Online failure prediction approaches aim to predict the manifestation of failures at runtime before ...