Memory hardware reliability is an indispensable part of whole-system dependability. This paper presents the collection of realistic memory hardware error traces (including transient and non-transient errors) from production computer systems with more than 800 GB memory for around nine months. Detailed information on the error addresses allows us to identify patterns of single-bit, row, column, and whole-chip memory errors. Based on the collected traces, we explore the implications of different hardware ECC protection schemes so as to identify the most common error causes and approximate error rates exposed to the software level. Further, we investigate the software system susceptibility to major error causes, with the goal of validatin...
This paper presents the results of an extensive fault injection study of the impact of processor fau...
The evolution of high-performance and low-cost microprocessors has led to their almost pervasive usa...
Error Correcting Code (ECC) techniques aim at providing concurrent correction and detection of sing...
Memory hardware reliability is an indispensable part of whole-system dependability. This paper prese...
Memory hardware reliability is an indispensable part of whole-system dependability. Its importance...
Thesis (Ph.D.)--University of Rochester. Dept. of Electrical and Computer Engineering, 2012. In moder...
This thesis addresses the problem of measuring hardware error sensitivity of computer systems. Hardw...
Computing systems use dynamic random-access memory (DRAM) as main memory. As prior works have sho...
With continued CMOS scaling, future shipped hardware will be increasingly vulnerable to in-the-field...
Technology and voltage scaling is making integrated circuits increasingly susceptible to failures ca...
Several recent publications have shown that hardware faults in the memory subsystem are commonplace....
Technology scaling of integrated circuits is making transistors increasingly sensitive to process va...
Defects in semiconductor memory chips and errors of their functioning are of interest to both manufa...
Over three decades of continuous scaling in CMOS technology has led to tremendous improvements in pr...
Hardware errors are projected to increase in modern computer systems due to shrinking feature sizes ...