We present a software approach to injecting hardware errors into a running process on Linux. We then analyze the existing hardware error reporting tools on Linux and the available error tolerance techniques. We propose a new approach to tolerating memory errors at the process level. Finally, we evaluate our proposals.
System reliability is becoming a significant concern as technology continues to shrink. This is beca...
Fault tolerance is a key requirement in several application domains of embedded processor cores. In...
Aggressive process scaling and increasing demands of performance/cost efficiency have exacerbated th...
Abstract—Memory errors are a major source of reliability problems in current computers. Undetected e...
Thesis (Ph. D.)--University of Rochester. Dept. of Electrical and Computer Engineering, 2012. In moder...
The improvement of dependability in computing systems requires the evaluation of fault tolerance mec...
Soft errors in embedded systems' memories like single-event upsets and multiple-bit upsets lead to d...
This thesis deals with techniques for designing and evaluating error detection and recovery mechanis...
Several recent publications have shown that hardware faults in the memory subsystem are commonplace....
Memory hardware reliability is an indispensable part of whole-system dependability. This paper prese...
Transient hardware faults have become one of the major concerns affecting the reliability of modern ...
Computing systems use dynamic random-access memory (DRAM) as main memory. As prior works have sho...
The evolution of high-performance and low-cost microprocessors has led to their almost pervasive usa...
Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations,...