Abstract—As we move towards data centers at the exascale, the reliability challenges of such enormous storage systems are daunting. We demonstrate that such systems will suffer substantial annual data loss if only traditional reliability mechanisms are employed. We argue that the architecture for exascale storage systems should incorporate novel mechanisms at or below the object level to address this problem. The case for this research focus is that relying solely on device-level mechanisms will not scale, and in this study we analytically evaluate how rapidly the problem manifests.
Recent advances in large-capacity, low-cost storage devices have led to active research in design of...
We present here a report produced by a workshop on ‘Addressing failures in exascale computing’ held ...
Reliability is a critical metric in the design and development of replication-based big data storage...
The introduction of Exascale storage into production systems will lead to an increase on the number ...
Reliability and availability are increasingly important in large-scale storage systems built from th...
The next generation of supercomputers will break the exascale barrier. Soon we will have systems cap...
As we look toward exascale it is clear that high-capacity HPC storage systems will incorporate the l...
The storage stack in a data center consists of all the hardware and software layers involved in proc...
With the prosperity of Big Data, the performance and robustness of storage systems have become ever ...
Modern storage systems continue to increase in scale and complexity as they attempt to meet the inc...
Failure is inevitable: disks fail, hosts crash, networks partition, applications stop. Consequently...
The current approach to resilience for large high-performance computing (HPC) machines is based on g...
Emerging Web services, such as email, photo sharing, and web site archives, must preserve large volu...