Component failure in large-scale IT installations such as cluster supercomputers or internet service providers is becoming an ever larger problem as the number of processors, memory chips and disks in a single cluster approaches a million. In this paper, we present and analyze field-gathered disk replacement data from five systems in production use at three organizations: two supercomputing sites and one internet service provider. About 70,000 disks are covered by this data, some for an entire lifetime of 5 years. All disks were high-performance enterprise disks (SCSI or FC), whose datasheet MTTF of 1,200,000 hours suggests a nominal annual failure rate of at most 0.75%. We find that in the field, annual disk replacement rates exceed 1%, wi...
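The step from a datasheet MTTF to a nominal annual failure rate can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the common approximation that the annual failure rate is the number of powered-on hours per year divided by the MTTF, with disks running around the clock. The function names are hypothetical and do not come from the paper.

```python
# Minimal sketch: nominal annual failure rate (AFR) implied by a datasheet MTTF.
# Assumption: AFR ~= hours per year / MTTF, with disks powered on 24x7.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def nominal_afr(mttf_hours: float) -> float:
    """Return the nominal annual failure rate as a fraction (0.01 == 1%)."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return HOURS_PER_YEAR / mttf_hours


def excess_factor(observed_rate: float, mttf_hours: float) -> float:
    """Return how many times larger an observed annual rate is than the nominal AFR."""
    return observed_rate / nominal_afr(mttf_hours)


if __name__ == "__main__":
    afr = nominal_afr(1_200_000)
    print(f"Nominal AFR for an MTTF of 1,200,000 hours: {afr:.2%}")
    # A 1% observed replacement rate compared with the datasheet figure.
    print(f"A 1% replacement rate is {excess_factor(0.01, 1_200_000):.2f}x nominal")
```

Under this approximation an MTTF of 1,200,000 hours gives about 0.73% per year, consistent with the "at most 0.75%" bound quoted above, so an observed replacement rate above 1% already exceeds the datasheet expectation.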
Modern day datacenters host hundreds of thousands of servers that coordinate tasks in order to deliv...
Today's most reliable data storage systems are made of redundant arrays of inexpensive disks (RAID)....
The workloads running in the modern data centers of large scale Internet service providers (such as A...
It is estimated that over 90% of all new information produced in the world is being stored on magne...
Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?
Designing highly dependable systems requires a good understanding of failure characteristics. Unfort...
While mean time to data loss (MTTDL) provides an easy way to estimate the reliability of redundant d...
© 2016 Authors. The problem of SMART-data ambiguity in different models of hard disk drives of the s...
Despite the growing popularity of Solid State Disks (SSDs) in the datacenter, little is known about ...
It has become commonplace to observe frequent multiple disk failures in big data centers in which th...
README.txt Maintenance example belonging to: The MANTIS Book: Cyber Physical System Based Proac...
A major problem in managing large-scale datacenters is diagnosing and fixing machine fail...
Mean time to failure (MTTF) is a commonly accepted metric for reliability. In this paper we present ...
Archiving and systematic backup of large digital data generates a quick demand for multi-petabyte sc...