Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded systems. However, current protection techniques are brittle, and do not allow programmers to trade off performance for SDC coverage. Further, many of them require tens of thousands of fault injection experiments, which are highly time-intensive. In this paper, we propose two empirical models, namely SDCTune and SDCAuto, to predict the SDC proneness of a program’s data. Both models are based on static and dynamic features of the program alone, and do not require fault injections to be performed. We then develop an algorithm using both models to selectively protect the most SDC-prone data in the program subject to a given performance overhead bound...
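The selection step described above, protecting the most SDC-prone data within a performance overhead bound, can be illustrated with a minimal sketch. This is not the paper's algorithm, which the truncated abstract does not spell out; it is a simple greedy, knapsack-style heuristic written under assumptions. The `Candidate` type, the `sdc_proneness` and `overhead` fields, and the function `select_for_protection` are all hypothetical names introduced for illustration, and the scores are assumed to come from a model such as SDCTune or SDCAuto.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A piece of program data that could be protected (hypothetical type)."""
    name: str
    sdc_proneness: float  # assumed: model-predicted SDC proneness score
    overhead: float       # assumed: estimated cost of protecting it, as a fraction of runtime


def select_for_protection(candidates, overhead_bound):
    """Greedily pick candidates by proneness per unit overhead until the bound is reached.

    This is an illustrative heuristic, not the selection algorithm from the paper.
    Returns the chosen candidates and the total overhead they consume.
    """
    def benefit_per_cost(c):
        # Zero-cost candidates are always worth taking first.
        return c.sdc_proneness / c.overhead if c.overhead > 0 else float("inf")

    ranked = sorted(candidates, key=benefit_per_cost, reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        # Skip anything that would push the total past the allowed overhead.
        if spent + c.overhead <= overhead_bound:
            chosen.append(c)
            spent += c.overhead
    return chosen, spent
```

Calling `select_for_protection(candidates, 0.5)` would return the subset whose combined estimated overhead stays at or below 50%. Raising or lowering the bound is what lets a programmer trade performance for SDC coverage in this sketch.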
Abstract—Increasing parallelism and transistor density, along with increasingly tighter energy and p...
119 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2008. In the end, this dissertation...
This thesis focuses on resilience for high performance applications that execute on large scale plat...
Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded syst...
According to Moore’s law, technology scaling is continuously providing smaller and faster devices. T...
Hardware errors are on the rise with reducing chip sizes, and power constraints have necessitated th...
According to Moore’s law, technology scaling is continuously providing smaller and faster devices. T...
Soft error caused by single event upset has been a severe challenge to aerospace-based computing. Si...
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems ...
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems ...
As machines increase in scale, it is predicted that failure rates of supercomputers will correspondi...
ELLIOTT III, JAMES JOHN. Resilient Iterative Linear Solvers Running Through Errors. (Under the direc...
Embedded systems’ hardware can be impacted by soft errors, which can cause data flow errors in the s...
Chip manufacturers and hyperscalers are becoming increasingly aware of the problem posed by...
Today more than ever before, academia, manufacturers, and hyperscalers acknowledge the majo...