Future multicore processors will become more susceptible to a variety of hardware failures. In particular, intermittent faults, caused in part by manufacturing process variation or in-progress wear-out, can cause bursts of frequent faults that last from several cycles to several seconds or more. Cost-effective reliability to tolerate intermittent faults will likely require, or be greatly simplified by, the ability to temporarily suspend execution on a core during periods of frequent intermittent faults. We investigate three existing techniques for adapting to the dynamically changing resource availability caused by such core suspension, and demonstrate their different system-level implications. We show that system software reconfiguratio...
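As a rough illustration of suspending a core during a burst of intermittent faults and then adapting to the reduced set of available cores, the sketch below assumes a per-core monitor that counts detected faults within a sliding window of cycles. The class `CoreMonitor`, its parameters (`window`, `threshold`, `quiet_period`), and the round-robin `assign_threads` remapping are all hypothetical. They are not one of the three techniques evaluated in the work above, and fault detection itself is assumed to happen elsewhere.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CoreMonitor:
    """Hypothetical per-core monitor: suspends the core when `threshold`
    faults fall within the last `window` cycles, and resumes it once
    `quiet_period` cycles have passed since the most recent fault."""
    window: int        # length of the sliding window, in cycles (assumed)
    threshold: int     # faults within the window that trigger suspension (assumed)
    quiet_period: int  # fault-free cycles required before resuming (assumed)
    suspended: bool = False
    _faults: deque = field(default_factory=deque, init=False)
    _last_fault: Optional[int] = field(default=None, init=False)

    def _expire(self, cycle: int) -> None:
        # Drop faults that have slid out of the window ending at `cycle`.
        while self._faults and self._faults[0] <= cycle - self.window:
            self._faults.popleft()

    def record_fault(self, cycle: int) -> None:
        # A new fault also restarts the quiet period for a suspended core.
        self._faults.append(cycle)
        self._last_fault = cycle
        self._expire(cycle)
        if len(self._faults) >= self.threshold:
            self.suspended = True

    def tick(self, cycle: int) -> None:
        # Periodic check: resume only after a long enough fault-free stretch.
        self._expire(cycle)
        if (self.suspended and self._last_fault is not None
                and cycle - self._last_fault >= self.quiet_period):
            self.suspended = False


def assign_threads(threads: List[str], monitors: List[CoreMonitor]) -> Dict[str, int]:
    """Hypothetical reconfiguration step: map threads round-robin onto the
    cores that are currently not suspended."""
    active = [i for i, m in enumerate(monitors) if not m.suspended]
    if not active:
        raise RuntimeError("all cores are suspended")
    return {t: active[k % len(active)] for k, t in enumerate(threads)}


monitors = [CoreMonitor(window=100, threshold=3, quiet_period=500) for _ in range(4)]
for c in (10, 20, 30):
    monitors[1].record_fault(c)          # burst of 3 faults within 100 cycles
print(monitors[1].suspended)             # True
print(assign_threads(["t0", "t1", "t2", "t3"], monitors))
# {'t0': 0, 't1': 2, 't2': 3, 't3': 0}
monitors[1].tick(600)                    # 570 fault-free cycles >= 500
print(monitors[1].suspended)             # False
```

Here a burst of three faults within 100 cycles suspends core 1, its threads are spread over cores 0, 2 and 3, and the core resumes after 500 fault-free cycles. A real system would likely tune these thresholds per workload and migrate thread state rather than recompute a static mapping.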
Intermittent hardware faults are hard to diagnose as they occur non-deterministically. Hardware-only...
2015-02-18. As technology scales further down in the nanometer regime, chip manufacturers are able to ...
Copyright © 2014 Chao (Saul) Wang et al. This is an open access article distributed under the Creativ...
Abstract—The frequency of hardware errors is increasing due to shrinking feature sizes, higher level...
This paper presents a dynamic scheduling solution to achieve fault tolerance in many-core architectu...
Over three decades of continuous scaling in CMOS technology has led to tremendous improvements in pr...
As semiconductor technology scales into the nanometer regime, intermittent faults have become an inc...
This paper presents a novel approach to the design of multi-/many-core systems with an adaptive leve...
Abstract: Fault-tolerance is a crucial aspect of safety critical systems. When such systems need to ...
Transient faults are emerging as a critical concern in the reliability of general-purpose microproce...
Abstract—Transient faults are emerging as a critical concern in the reliability of general-purpose m...
Part 2: Asian Conference on Availability, Reliability and Security (AsiaARES). International audience. F...
Intermittent computing is a new paradigm enabling battery-less computing devices to be powered direc...
The continued scaling of silicon fabrication technologies has enabled the integration of dozens of p...
CMOS scaling has greatly increased concerns for both lifetime reliability due to permanent faults an...