Using Log Information to Perform Statistical Analysis on Failures Encountered by Large-Scale HPC Deployments

Narate Taerat
Nichamon Naksinehaboon
Clayton Ch
James Elliott
Chokchai (Box) Leangsuksun
George Ostrouchov
Stephen L. Scott

Publication date

January 2015

Abstract

System- and application-level failures can be characterized by mining relevant log files and performing statistical analysis on the information they contain. The resulting data may then be used in future developments and studies on the corresponding computational architecture, in fields such as failure prediction, fault tolerance, performance modelling and power awareness. This paper provides a statistical analysis of the application- and system-level failures encountered and logged by the IBM Blue Gene/L supercomputing system over a six-month period.

