Memory hardware reliability is an indispensable part of whole-system dependability. Its importance is evidenced by a large body of prior research on the impact of memory errors on software systems. However, the absence of a solid understanding of error characteristics prevents software system researchers from making well-reasoned assumptions, and it also hinders careful evaluation of different fault tolerance design choices. In this paper, we present realistic memory hardware error traces collected over roughly nine months from production computer systems with more than 800 GB of memory. Based on the traces (including detailed information on the error addresses and patterns), we explore the implications of...
Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations,...
This report examines the state of the field of software fault tolerance. Terminology, techniques for...
With continued CMOS scaling, future shipped hardware will be increasingly vulnerable to in-the-field...
Memory hardware reliability is an indispensable part of whole-system dependability. This paper prese...
Thesis (Ph. D.)--University of Rochester. Dept. of Electrical and Computer Engineering, 2012. In moder...
This thesis addresses the problem of measuring hardware error sensitivity of computer systems. Hardw...
Computing systems use dynamic random-access memory (DRAM) as main memory. As prior works have sho...
Technology and voltage scaling is making integrated circuits increasingly susceptible to failures ca...
Several recent publications have shown that hardware faults in the memory subsystem are commonplace....
In recent years, software defects have become the dominant cause of customer outage, and improvement...
Hardware errors are projected to increase in modern computer systems due to shrinking feature sizes ...
Over three decades of continuous scaling in CMOS technology has led to tremendous improvements in pr...
The evolution of high-performance and low-cost microprocessors has led to their almost pervasive usa...
System reliability has become a main concern during the computer-based system ...
Technology scaling of integrated circuits is making transistors increasingly sensitive to process va...