Abstract—Memory errors are a major source of reliability problems in current computers. Undetected errors may result in program termination, or, even worse, silent data corruption. Recent studies have shown that the frequency of permanent memory errors is an order of magnitude higher than previously assumed and regularly affects everyday operation. Often, neither additional circuitry to support hardware-based error detection nor downtime for performing hardware tests can be afforded. In the case of permanent memory errors, a system faces two challenges: detecting errors as early as possible and handling them while avoiding system downtime. To increase system reliability, we have developed RAMpage, an online memory testing infrastructure for...
System reliability is becoming a significant concern as technology continues to shrink. This is beca...
In this dissertation we address the overhead reduction of fault tolerance (FT) techniques. Due to te...
Fault-tolerance has become an essential concern for processor designers due to increasing soft-error...
Thesis (Ph. D.)--University of Rochester. Dept. of Electrical and Computer Engineering, 2012. In moder...
Today’s computers have gigabytes of main memory due to improved DRAM density. As density increases, ...
Abstract: Soft errors are emerging with the ongoing reduction of structure sizes in current and futu...
We present a software approach to hardware error injection in a running process on Linux. Then we an...
Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations,...
Computing systems use dynamic random-access memory (DRAM) as main memory. As prior works have sho...
With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has beco...
Several recent publications have shown that hardware faults in the memory subsystem are commonplace....
Persistent memory (PMEM) technologies preserve data across power cycles and provide performance comp...
Persistent memory (PM) technologies offer performance close to DRAM with persistence. Persistent mem...
Memory reliability has been a major design constraint for mission-critical and large-scale systems f...
Over three decades of continuous scaling in CMOS technology has led to tremendous improvements in pr...